AI Voice Designer (Beta) — Design a Voice from a Text Description
Most text-to-speech tools give you a catalogue of pre-built voices and ask you to listen through dozens of options until you find one that feels right. The EasyVoice AI Voice Designer flips that process: you describe the voice you want in plain English, and the system generates three distinct candidate previews for you to evaluate. Pick your favourite, give it a name, save it, and it is available everywhere — the TTS editor, podcast dialogue mode, and the REST API — with a stable voice ID that works in your existing integrations.
Voice Designer is a Pro feature, included in the Pro plan at $9.99/mo. It is in Beta: today it maps your description onto a structured preset recipe rather than generating a new voice (see the FAQ below for details), and a fully generative upgrade is planned once the hardware requirements are met. Even in its Beta form, the feature is useful: you hear three real audio previews of a fixed test sentence, not mock clips, and the saved voice behaves identically to any standard voice across all EasyVoice surfaces.
How Voice Designer works — the four-step flow
The design flow is intentionally simple. You do not need any audio equipment, any knowledge of voice synthesis parameters, or any prior experience with TTS tools. The entire process — from description to saved, usable voice — takes about two minutes and is self-service inside your account.
1. Describe the voice you want
Open the Design a voice tab in your account. Type a free-text description: for example, “warm, female, British accent, slightly slower than normal pace” or “energetic young male, American, fast delivery, slightly higher pitch”. You can be as specific or as vague as you like: the system interprets natural language, not structured parameters. There is no required format or vocabulary.
2. Receive three candidate previews
The system maps your description to three diverse candidate recipes, each combining a base voice from EasyVoice's catalogue, a speed adjustment, and a pitch shift, and synthesises a short audio sample for each candidate on a fixed test sentence. The three candidates are intentionally distinct: if you described a calm deep male voice, you might get one candidate that leans deep-and-slow, one that is deep-and-neutral, and one that adds a slight pitch-down while keeping speed natural. This spread gives you a genuine choice rather than three near-identical results. The previews are real synthesis output, not mock clips: what you hear is what you will get when you synthesise your own text.
3. Pick your favourite, name it, and save
Listen to the three candidates, pick the one that best matches your intent, give it a display name (e.g. “Podcast Host” or “Product Demo Voice”), and click Save. The voice is stored to your account with a stable ID prefixed vd_. Pro accounts can save up to 10 designed voices at a time; you can delete and replace voices if you need to experiment further.
4. Use it everywhere in EasyVoice
Saved designed voices appear in the voice selector under a “Designed Voices” group alongside standard voices. They are available in the TTS text editor, in podcast dialogue mode (assign to any speaker role), and via the REST API: pass the vd_ voice ID to the /v1/audio/speech endpoint exactly as you would any other voice ID. The response format, streaming support, and request shape are identical to standard API calls.
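As a rough illustration of the API part of step 4, the sketch below builds (but does not send) a request for the /v1/audio/speech endpoint using a designed voice ID. Because the API is described as OpenAI-compatible, the body uses the `model`, `input`, and `voice` fields from that convention. The base URL, the model name, the bearer-token header, and the example voice ID are placeholders assumed for illustration; check the API docs for the real values.

```python
import json
import urllib.request

# Assumed placeholder host; substitute the real EasyVoice API base URL.
BASE_URL = "https://api.example.com"


def build_speech_request(api_key: str, voice_id: str, text: str) -> urllib.request.Request:
    """Build a POST request for /v1/audio/speech without sending it."""
    if not voice_id.startswith("vd_"):
        raise ValueError("designed voice IDs are prefixed with 'vd_'")
    body = {
        "model": "tts-1",   # assumed model name (OpenAI convention)
        "input": text,      # text to synthesise
        "voice": voice_id,  # the designed voice ID goes where any voice ID would
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_speech_request("YOUR_API_KEY", "vd_example123", "Welcome to the show.")
print(request.get_method(), request.full_url)
```

Sending the request is then a matter of passing the object to `urllib.request.urlopen`.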
Honest Beta framing — what “Voice Designer” is and is not
We call this feature Voice Designer (Beta) and we want to be explicit about what that means. The current implementation does not generate a novel voice from scratch in the way that a large-scale generative audio model would. Instead, it maps your text description to a structured recipe — a combination of one of EasyVoice's 56 catalogue voices, a speed multiplier, and a pitch-shift applied in post-processing — and synthesises a preview using that recipe. The mapping logic uses Claude to interpret the description and select three diverse, non-overlapping candidates from the space of possible combinations; if the API key is unavailable, a keyword-based fallback produces reasonable candidates without any external call.
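To make the recipe idea concrete, here is a minimal sketch of how a keyword-based fallback of this kind could work. It is not EasyVoice's actual mapping logic: the catalogue entries, the keyword tables, and the offsets used to spread the three candidates are all hypothetical, chosen only to show the shape of the technique.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipe:
    base_voice: str       # name of a catalogue voice
    speed: float          # speed multiplier, 1.0 = normal pace
    pitch_semitones: int  # pitch shift applied in post-processing


# Hypothetical catalogue excerpt: voice name -> descriptive tags.
CATALOGUE = {
    "sample_female_gb": {"female", "british", "warm"},
    "sample_male_us": {"male", "american", "calm"},
    "sample_female_us": {"female", "american", "energetic"},
}
# Hypothetical keyword tables for speed and pitch hints.
SPEED_HINTS = {"slow": 0.9, "slower": 0.9, "fast": 1.15, "faster": 1.15}
PITCH_HINTS = {"deep": -2, "low": -2, "high": 2, "higher": 2}


def keyword_fallback(description: str) -> list[Recipe]:
    """Map a free-text description to three distinct recipes using keywords only."""
    tokens = re.findall(r"[a-z]+", description.lower())
    token_set = set(tokens)
    # Base voice: the catalogue entry whose tags overlap the description most.
    base = max(CATALOGUE, key=lambda name: len(CATALOGUE[name] & token_set))
    # The first matching hint wins; otherwise fall back to neutral values.
    speed = next((SPEED_HINTS[t] for t in tokens if t in SPEED_HINTS), 1.0)
    pitch = next((PITCH_HINTS[t] for t in tokens if t in PITCH_HINTS), 0)
    # Spread the candidates around the interpreted target so no two are identical.
    offsets = [(0.0, 0), (-0.05, -1), (0.05, 1)]
    return [Recipe(base, round(speed + ds, 2), pitch + dp) for ds, dp in offsets]


for recipe in keyword_fallback("deep, calm, American male, slightly slow"):
    print(recipe)
```

This sketch keeps the same base voice for all three candidates and varies only speed and pitch; a fuller version could also vary the base voice to widen the spread.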
This matters because the label “AI Voice Designer” could reasonably imply a fully generative model. We are choosing to be transparent rather than use language that overstates the capability. What you actually get is useful in practice: the description-to-preview flow saves you the time of manually trying voices and tweaking parameters, the three-candidate spread gives you a genuine choice, and the saved voice is stable and portable. But it is a guided search through a structured parameter space, not the synthesis of a unique voice that has never existed before.
The “(Beta)” qualifier signals that we plan to replace the recipe system with a genuinely generative model — one that can produce voices outside the existing catalogue — once the hardware constraints are resolved (current constraint: generative voice models do not fit in the 8 GB VRAM available on the inference server). When that upgrade ships, designed voices created under the Beta recipe system will be migrated or flagged clearly. In the meantime, the feature is useful, the copy is honest, and the pricing reflects it: Voice Designer Beta is included in the standard Pro plan at $9.99/mo, not gated behind a premium tier.
Who uses Voice Designer and for what
The most immediate use case is branded content narration. A marketing team, podcast producer, or e-learning author has a specific vocal identity in mind — a tone of voice that fits their brand — but cannot easily express it by browsing a catalogue. Voice Designer lets them describe the target and iterate quickly: try a description, listen to the candidates, refine the description, try again. Within a few iterations you typically converge on a voice that is close enough to ship; the saved voice then becomes a consistent asset used across all content for that project.
Podcast dialogue mode is a particularly natural fit. When producing a scripted multi-speaker podcast, you want the two speakers to sound clearly distinct — different gender, accent, or vocal energy — but you also want them to feel like they belong in the same episode. Voice Designer lets you describe both roles in relation to each other: “warm British female, moderate pace” for the host and “younger American male, slightly faster, higher energy” for the guest. You preview both, save both, assign them to Speaker A and Speaker B in the dialogue editor, and generate the full episode without recording a single audio sample.
API integrations that need a voice selection step in their workflow benefit from Voice Designer because the design flow is fully accessible via the REST API. If you are building a SaaS product that generates narrated content for end users and you want to give those users a voice customisation step without exposing the full voice catalogue, you can expose a simple description field in your product UI, call the EasyVoice design endpoint on their behalf, present the three candidates, and save the chosen voice to their account. The saved vd_ voice ID is then used for all subsequent synthesis calls for that user.
Iterative content testing is a lower-stakes but surprisingly common use case: which vocal character performs better in a short social video? Voice Designer lets you create two or three distinct named voices for the same script, generate audio for each, run the variants, and retire the underperformers. Because designed voices have stable IDs, you can keep the winning voice in production and delete the others without disrupting anything.
Voice Designer is on the Pro plan
EasyVoice Pro
$9.99/mo
- ✓ Voice Designer (Beta): describe & preview
- ✓ Up to 10 saved designed voices
- ✓ Designed voices in editor, podcast & API
- ✓ Unlimited TTS — 56 voices, 30+ languages
- ✓ OpenAI-compatible REST API access
- ✓ Podcast dialogue mode
Free tier
$0
- ✓ 5,000 characters/day TTS
- ✓ Access to all 56 standard voices
- ✕ Voice Designer not included
- ✕ No API access
- ✕ No podcast dialogue mode
- ✕ No voice cloning
Voice Designer is intentionally included in the standard Pro plan, not gated behind Pro+ ($19.99/mo). Pro+ is reserved for voice cloning — a separate feature that enrolls a real recorded voice sample with consent attestation and AudioSeal watermarking. If your use case is describing and saving synthetic voices rather than cloning a real voice, Pro is the right tier.
Related features
AI Podcast Generator
Turn a script into a multi-speaker audio podcast. Assign designed voices to Speaker A and Speaker B for a fully custom-voiced episode.
Voice Cloning (Pro+)
Clone a real recorded voice with consent attestation and AudioSeal watermarking. Available on Pro+ at $19.99/mo.
Pricing — Free, Pro, Pro+
Compare all three tiers: free 5K chars/day, Pro $9.99/mo with Voice Designer, Pro+ $19.99/mo with voice cloning.
API Docs
Full REST API reference including the Voice Designer endpoints: POST /v1/voices/design and POST /v1/voices for saving designed voices.
Frequently asked questions
What is the EasyVoice AI Voice Designer?
The AI Voice Designer (Beta) is a Pro feature that lets you describe a voice in plain English — for example, 'deep, calm, American male, slightly slow' — and receive three distinct candidate voice previews. Each candidate is a recipe: a combination of a base voice from EasyVoice's catalogue, a speed adjustment, and a pitch shift. You listen to the candidates on a fixed sample sentence, choose the one you like, give it a name, and save it. The saved voice then appears in your voice library and is available everywhere: the TTS text editor, podcast dialogue mode, and the REST API.
Is the Voice Designer genuinely generative AI or is it a preset recipe system?
Today it is a preset-recipe system, and we say so openly. The 'Beta' label reflects our intent to upgrade to a fully generative model once the hardware requirements are met. Voice Designer currently maps your text description to a structured recipe: a base voice from EasyVoice's catalogue (56 voices across 30+ languages), a speed multiplier (0.5x–2.0x), and a pitch shift of −4 to +4 semitones applied via audio post-processing. Claude maps the description to three diverse, non-overlapping candidates. You get real audio previews and a saved voice with a stable ID, but the underlying synthesis engine is the same one that powers standard voices. The 'Voice Designer (Beta)' product name reflects this constraint, and this FAQ entry is the disclosure.
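The parameter ranges above are easy to check in client code before a recipe is submitted. The following sketch validates a recipe against the quoted bounds and converts a semitone shift into a frequency ratio using the standard equal-temperament relation (ratio = 2^(n/12)). The function names and the dictionary layout are illustrative assumptions, not part of the EasyVoice API.

```python
SPEED_RANGE = (0.5, 2.0)  # speed multiplier bounds quoted in this FAQ
PITCH_RANGE = (-4, 4)     # pitch-shift bounds in semitones quoted in this FAQ


def validate_recipe(base_voice: str, speed: float, pitch_semitones: int) -> dict:
    """Return the recipe as a dict, or raise ValueError if a value is out of range."""
    if not SPEED_RANGE[0] <= speed <= SPEED_RANGE[1]:
        raise ValueError(f"speed {speed} is outside {SPEED_RANGE}")
    if not PITCH_RANGE[0] <= pitch_semitones <= PITCH_RANGE[1]:
        raise ValueError(f"pitch shift {pitch_semitones} is outside {PITCH_RANGE}")
    # Hypothetical field names, used only for this illustration.
    return {"base_voice": base_voice, "speed": speed, "pitch_semitones": pitch_semitones}


def pitch_ratio(semitones: int) -> float:
    """Frequency ratio for a shift of `semitones` in 12-tone equal temperament."""
    return 2.0 ** (semitones / 12)


recipe = validate_recipe("sample_male_us", 0.9, -2)
print(recipe, round(pitch_ratio(recipe["pitch_semitones"]), 4))
```

At the extremes of the range, a shift of +4 semitones raises frequencies by roughly 26% and a shift of −4 lowers them by roughly 21%.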
Which plans include the Voice Designer?
Voice Designer is a Pro feature, available on both the Pro plan ($9.99/mo) and the Pro+ plan ($19.99/mo). The free tier does not include it — free users who open the Design a Voice tab in their account will see an upsell prompt to upgrade to Pro. You do not need Pro+ for Voice Designer; Pro+ is required for voice cloning (a separate feature that enrolls a real recorded voice sample).
Can I use a designed voice in podcast dialogue mode?
Yes. Once you save a designed voice, it appears in your voice library under a 'Designed Voices' group. In the podcast dialogue editor you assign a voice to each speaker role — designed voices are available alongside standard voices in that picker. This means you can create a custom-sounding duo for a podcast episode without recording a single audio sample: describe two different voices, preview and save each, then assign them to Speaker A and Speaker B in the dialogue script.
How many designed voices can I create?
Pro and Pro+ accounts can save up to 10 designed voices at a time. You can delete a designed voice and replace it with a new one if you want to experiment with different recipes. Each saved designed voice has a stable voice ID (prefixed vd_) that works in the API, so your integrations keep working for as long as that voice exists. If you redesign a voice, the new version is saved with its own ID, so update the voice ID in your API calls after saving it.
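An integration that saves voices programmatically may prefer to track the 10-voice limit locally rather than discover it from a failed save. The class below is a hypothetical piece of client-side bookkeeping, not an EasyVoice API; how the server responds when the limit is reached is not described here.

```python
MAX_DESIGNED_VOICES = 10  # per-account cap stated for Pro and Pro+


class DesignedVoiceLibrary:
    """In-memory stand-in for an account's saved designed voices."""

    def __init__(self) -> None:
        self._voices: dict[str, str] = {}  # voice ID -> display name

    def save(self, voice_id: str, name: str) -> None:
        """Record a saved voice, refusing new entries once the cap is reached."""
        is_new = voice_id not in self._voices
        if is_new and len(self._voices) >= MAX_DESIGNED_VOICES:
            raise RuntimeError("library full: delete a designed voice first")
        self._voices[voice_id] = name

    def delete(self, voice_id: str) -> None:
        """Free a slot by removing a voice; raises KeyError if it is unknown."""
        del self._voices[voice_id]

    def slots_left(self) -> int:
        return MAX_DESIGNED_VOICES - len(self._voices)


library = DesignedVoiceLibrary()
library.save("vd_example123", "Podcast Host")
print(library.slots_left())
```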
Is Voice Designer available in the API?
Yes. The design flow is fully exposed via the REST API. POST to /v1/voices/design with a JSON body containing your description to receive three candidate recipes. Then POST to /v1/voices with the chosen recipe and a display name to save the voice. The saved voice's ID (prefixed vd_) can then be passed to the standard /v1/audio/speech TTS endpoint exactly as you would pass any other voice ID. Full API documentation is at /api-docs.
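The two-call flow described above can be sketched as follows. The endpoint paths come from this page, but the JSON field names (`description`, `candidates`, `recipe`, `name`, `id`) and the recipe layout are assumptions made for illustration; consult the API docs for the real request and response shapes. The HTTP layer is injected as a function so that the sketch runs offline against a stand-in.

```python
from typing import Callable

# A transport takes (path, json_body) and returns the decoded JSON response.
Transport = Callable[[str, dict], dict]


def design_and_save(
    post: Transport,
    description: str,
    name: str,
    choose: Callable[[list], int] = lambda candidates: 0,
) -> str:
    """Request candidates, pick one, save it, and return the saved voice ID."""
    designed = post("/v1/voices/design", {"description": description})
    candidates = designed["candidates"]  # assumed response field
    recipe = candidates[choose(candidates)]
    saved = post("/v1/voices", {"name": name, "recipe": recipe})  # assumed body fields
    return saved["id"]  # assumed response field


def fake_post(path: str, body: dict) -> dict:
    """Offline stand-in transport; a real one would send the request to the API."""
    if path == "/v1/voices/design":
        return {"candidates": [
            {"base_voice": "sample_a", "speed": 0.9, "pitch_semitones": -2},
            {"base_voice": "sample_a", "speed": 1.0, "pitch_semitones": -2},
            {"base_voice": "sample_b", "speed": 1.0, "pitch_semitones": -1},
        ]}
    if path == "/v1/voices":
        return {"id": "vd_example123", "name": body["name"]}
    raise ValueError(f"unexpected path: {path}")


voice_id = design_and_save(
    fake_post,
    "deep, calm, American male, slightly slow",
    "Podcast Host",
    choose=lambda candidates: 1,  # e.g. the candidate the user picked after listening
)
print(voice_id)
```

In a product UI, the `choose` step would be replaced by playing the three previews to the user and recording their selection.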