OpenAI-compatible Text-to-Speech API. Drop-in replacement — change your base URL and API key.
curl -X POST https://your-domain.com/api/v1/audio/speech \
-H "Authorization: Bearer ev_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"model": "kokoro-82m",
"input": "Hello, this is EasyVoice!",
"voice": "af_aoede"
}' \
--output speech.mp3
Generate speech from text. Returns audio file directly.
| Header | Value |
|---|---|
| Authorization | Bearer ev_your_api_key |
| Content-Type | application/json |
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | No | Always "kokoro-82m" |
| input | string | Yes | Text to convert (max 10,000 chars) |
| voice | string | No | Voice ID (default: af_aoede) |
| response_format | string | No | "mp3" or "wav" (default: mp3) |
| speed | number | No | 0.5 to 2.0 (default: 1.0) |
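The limits in the table above can be checked client-side before a request is sent. The following is a minimal sketch, not part of the EasyVoice API: `build_speech_request` is a hypothetical helper name, and it assumes the 10,000-character limit is counted the way Python's `len` counts characters, which the server may not do.

```python
from typing import Any, Dict

MAX_INPUT_CHARS = 10_000          # "max 10,000 chars" from the parameter table
ALLOWED_FORMATS = {"mp3", "wav"}  # allowed response_format values


def build_speech_request(
    text: str,
    voice: str = "af_aoede",
    response_format: str = "mp3",
    speed: float = 1.0,
) -> Dict[str, Any]:
    """Return a JSON-ready body for POST /v1/audio/speech, or raise ValueError.

    Hypothetical helper: it only mirrors the documented ranges locally.
    """
    if not text:
        raise ValueError("input is required")
    if len(text) > MAX_INPUT_CHARS:
        # Assumption: the server counts characters like len() does.
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError('response_format must be "mp3" or "wav"')
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    return {
        "model": "kokoro-82m",
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
```

The returned dictionary can be serialized with `json.dumps` and sent as the request body shown in the curl example.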
EasyVoice adds optional audio controls on top of the OpenAI-compatible request. On the public POST /v1/audio/speech endpoint these use an ev_ prefix. All are optional and default to a no-op, so standard OpenAI clients that omit them receive the same output as before. These are deterministic audio/voicing controls (pitch shift, output gain, and EQ tone presets), not generative effects.
| Parameter | Type | Required | Description |
|---|---|---|---|
| ev_pitch | number | No | Pitch shift in semitones. Range -4 to +4. Default: 0 (no shift). |
| ev_volume_db | number | No | Output gain in dB. Range -6 to +6. Default: 0 (no change). |
| ev_tone | string | No | EQ tone preset: "neutral" (default), "warm", "bright", or "bass". |
| <break> in input | markup | No | Insert silence: embed <break time="500ms"/> or <break time="0.5s"/> in the input text. Max 3000ms per break. |
On the web app (POST /api/tts/generate) the same controls are sent without the prefix: pitch, volume_db, and tone (same ranges and defaults). Out-of-range values are rejected with 400.
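Since out-of-range values are rejected with 400, it can help to validate the controls locally and to pick the right key names for each endpoint. The sketch below is illustrative only: `audio_controls`, `break_tag`, and the `public_api` flag are hypothetical names, and raising on a break longer than 3000 ms is a client-side choice made here (the documentation only states a 3000 ms maximum per break).

```python
from typing import Any, Dict

TONES = {"neutral", "warm", "bright", "bass"}  # documented EQ presets
MAX_BREAK_MS = 3000                            # documented per-break maximum


def audio_controls(
    pitch: float = 0,
    volume_db: float = 0,
    tone: str = "neutral",
    public_api: bool = True,
) -> Dict[str, Any]:
    """Validate the controls and return them under the right key names.

    public_api=True  -> ev_pitch / ev_volume_db / ev_tone (POST /v1/audio/speech)
    public_api=False -> pitch / volume_db / tone          (POST /api/tts/generate)
    """
    if not -4 <= pitch <= 4:
        raise ValueError("pitch must be between -4 and +4 semitones")
    if not -6 <= volume_db <= 6:
        raise ValueError("volume_db must be between -6 and +6 dB")
    if tone not in TONES:
        raise ValueError(f"tone must be one of {sorted(TONES)}")
    prefix = "ev_" if public_api else ""
    return {
        f"{prefix}pitch": pitch,
        f"{prefix}volume_db": volume_db,
        f"{prefix}tone": tone,
    }


def break_tag(ms: int) -> str:
    """Build a <break/> tag for embedding in the input text."""
    if not 0 < ms <= MAX_BREAK_MS:
        raise ValueError(f"break must be between 1 and {MAX_BREAK_MS} ms")
    return f'<break time="{ms}ms"/>'
```

For example, `{"input": f"Take a breath. {break_tag(500)} Now continue.", **audio_controls(2, -1.5, "warm")}` produces the extra fields used in the request below.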
curl -X POST https://your-domain.com/api/v1/audio/speech \
-H "Authorization: Bearer ev_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"model": "kokoro-82m",
"input": "Take a breath. <break time=\"500ms\"/> Now continue.",
"voice": "af_aoede",
"ev_pitch": 2,
"ev_volume_db": -1.5,
"ev_tone": "warm"
}' \
--output speech.mp3
from openai import OpenAI
client = OpenAI(
api_key="ev_your_api_key",
base_url="https://your-domain.com/api/v1"
)
response = client.audio.speech.create(
model="kokoro-82m",
voice="af_aoede",
input="Hello from EasyVoice!"
)
response.stream_to_file("output.mp3")
import fs from "fs";
import OpenAI from "openai";
const client = new OpenAI({
apiKey: "ev_your_api_key",
baseURL: "https://your-domain.com/api/v1"
});
const mp3 = await client.audio.speech.create({
model: "kokoro-82m",
voice: "af_aoede",
input: "Hello from EasyVoice!"
});
const buffer = Buffer.from(await mp3.arrayBuffer());
await fs.promises.writeFile("output.mp3", buffer);
| Limit | Value |
|---|---|
| Requests per minute | 60 |
| Max input length | 10,000 characters |
| Characters per month | Unlimited (Pro plan) |
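One way to stay under the 60 requests per minute limit is to space calls out on the client. The sketch below is an assumption-laden illustration rather than documented behaviour: `MinIntervalThrottle` is a hypothetical class, and how the server actually counts the window is not specified here.

```python
import time
from typing import Callable


class MinIntervalThrottle:
    """Keep successive calls at least 60 / per_minute seconds apart."""

    def __init__(
        self,
        per_minute: int = 60,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = 60.0 / per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0  # earliest time the next call may start

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Schedule the following call one interval after this one starts.
        self._next_allowed = max(now, self._next_allowed) + self.interval
        return delay
```

Calling `throttle.wait()` before each request is enough for a single-threaded script; the `clock` and `sleep` parameters exist only so the class can be exercised without real delays.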
56 voices across 9 languages. Use the voice ID in your API requests.
See the voice browser for the full list with audio previews.
Arabic: 10 voices ar_m1–ar_m5 and ar_f1–ar_f5 work directly in POST /v1/audio/speech. Arabic numerals, dates, and AED amounts are normalized to spoken form automatically. See Arabic text to speech.
Pro+ subscribers can also use cloned voice IDs (e.g. voice_abc123) returned by the GET /v1/voices endpoint.
Enroll, list, and delete custom voice clones. Cloned voice IDs can be used directly in POST /v1/audio/speech requests. Requires a Pro+ subscription and explicit consent for each enrolled voice.
Multipart form upload. Returns 202 with {voice_id, status:"enrolling"}. Enrollment typically completes in 1–2 minutes; poll GET /v1/voices for status.
curl -X POST https://your-domain.com/api/v1/voices \
-H "Authorization: Bearer ev_your_api_key" \
-F "name=My Voice" \
-F "consent=true" \
-F "audio=@sample.wav"
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Display name for the cloned voice |
| consent | string | Yes | Must be "true" — attests you own/have consent to clone this voice |
| audio | file | Yes | WAV, MP3, MP4, or M4A. 15–60 seconds of clear speech recommended. |
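Because enrollment returns 202 and completes later, a client typically polls the voice list until the new voice is ready. The sketch below shows one possible polling loop. `wait_until_ready` and `list_voices` are hypothetical names: `list_voices` stands for any callable that returns the `voices` array from GET /v1/voices. The timeout and polling interval are arbitrary choices, not documented values.

```python
import time
from typing import Any, Callable, Dict, List


def wait_until_ready(
    voice_id: str,
    list_voices: Callable[[], List[Dict[str, Any]]],
    timeout_s: float = 300.0,
    poll_every_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the voice with this ID reports status "ready".

    Note: the enrollment response names the field "voice_id", while the
    list response in the example uses "id"; this matches on "id".
    """
    waited = 0.0
    while True:
        for voice in list_voices():
            if voice.get("id") == voice_id and voice.get("status") == "ready":
                return voice
        if waited >= timeout_s:
            raise TimeoutError(f"{voice_id} not ready after {timeout_s} seconds")
        sleep(poll_every_s)
        waited += poll_every_s
```

With the documented 1–2 minute enrollment time, a five-second interval keeps the polling well below the request rate limit.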
Returns all cloned voices for the authenticated user. Only voices whose status is "ready" can be used in speech requests.
curl https://your-domain.com/api/v1/voices \
-H "Authorization: Bearer ev_your_api_key"
# Response:
# {
# "voices": [
# { "id": "voice_abc123", "name": "My Voice", "status": "ready", "createdAt": 1234567890 }
# ]
# }
For long inputs, submit an asynchronous job instead of waiting on the synchronous endpoint. Jobs are queued, processed, and the result is fetched by polling. Works with any voice ID, including Arabic and cloned voices.
curl -X POST https://your-domain.com/api/v1/jobs \
-H "Authorization: Bearer ev_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"input": "A long article text...",
"voice": "af_aoede",
"format": "mp3"
}'
# 202 Accepted
# { "id": "1f6f7c2e-...", "status": "queued" }
curl https://your-domain.com/api/v1/jobs/1f6f7c2e-... \
-H "Authorization: Bearer ev_your_api_key"
# { "id": "1f6f7c2e-...", "status": "completed",
# "audio_url": "/audio/....mp3", "created_at": ..., "completed_at": ... }
Statuses: queued → active → completed / failed. Jobs are visible only to the account that created them.
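The status sequence above suggests a simple polling loop that stops at either terminal state. This is a minimal sketch under stated assumptions: `wait_for_job`, `audio_url_or_raise`, and the `get_job` callable (anything that returns the parsed JSON of GET /v1/jobs/{id}) are hypothetical names, and the poll count and interval are arbitrary.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATUSES = {"completed", "failed"}  # end of queued -> active -> ...


def wait_for_job(
    job_id: str,
    get_job: Callable[[str], Dict[str, Any]],
    max_polls: int = 120,
    poll_every_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a job until it is completed or failed, then return its JSON."""
    for _ in range(max_polls):
        job = get_job(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        sleep(poll_every_s)
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")


def audio_url_or_raise(job: Dict[str, Any]) -> str:
    """Return the audio URL of a finished job, or raise if it failed."""
    if job["status"] == "failed":
        # Assumption: a failed job may carry an "error" field.
        raise RuntimeError(job.get("error") or "job failed")
    return job["audio_url"]
```

Injecting `get_job` keeps the loop independent of the HTTP client in use.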
Design a custom voice by describing it in plain text. The API maps your description to 3 diverse preset-recipe candidates (baseVoice + speed + pitch). Pick one and save it as a vd_ voice you can use in any synthesis or podcast request. Pro or Pro+ required; max 10 designed voices per account.
Note: Voice design maps your description to existing preset voices with speed and pitch adjustments — it is not generative synthesis. Results are preset-recipe based. When design assist is temporarily unavailable a heuristic fallback is used automatically (the response includes "fallback":true).
Also accepted as POST /v1/voices with a JSON body containing {"description":"..."}. Returns 3 candidate recipes to choose from.
curl -X POST https://your-domain.com/api/v1/voices/design \
-H "Authorization: Bearer ev_your_api_key" \
-H "Content-Type: application/json" \
-d '{"description": "warm, calm British male narrator"}'
# 200 OK
# {
# "candidates": [
# {
# "baseVoice": "bm_george",
# "speed": 0.9,
# "pitchShift": -1,
# "label": "Deep & Calm",
# "rationale": "British male with slightly lowered pitch and relaxed pace."
# },
# { "baseVoice": "bm_lewis", "speed": 0.85, "pitchShift": 0, ... },
# { "baseVoice": "am_echo", "speed": 1.0, "pitchShift": -1, ... }
# ]
# }
Pick a candidate recipe from Step 1 and save it with a name. Returns a vd_ voice ID you can use immediately in synthesis.
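Turning a chosen candidate into the save request is mostly a matter of copying the recipe fields. The sketch below assumes the save body needs only `baseVoice`, `speed`, and `pitchShift`, as in the request example in this section. Whether extra fields such as `label` or `rationale` would be accepted is not stated, so they are dropped. `designed_voice_payload` is a hypothetical helper name.

```python
from typing import Any, Dict, List

RECIPE_KEYS = ("baseVoice", "speed", "pitchShift")


def designed_voice_payload(
    candidates: List[Dict[str, Any]], index: int, name: str
) -> Dict[str, Any]:
    """Build the JSON body for saving candidate `index` as a designed voice."""
    if not name.strip():
        raise ValueError("name is required")
    candidate = candidates[index]
    # Copy only the recipe fields; label and rationale are display-only here.
    recipe = {key: candidate[key] for key in RECIPE_KEYS}
    return {"type": "designed", "name": name, "recipe": recipe}
```

The result can be sent as the body of POST /v1/voices.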
curl -X POST https://your-domain.com/api/v1/voices \
-H "Authorization: Bearer ev_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"type": "designed",
"name": "My Narrator",
"recipe": {
"baseVoice": "bm_george",
"speed": 0.9,
"pitchShift": -1
}
}'
# 200 OK
# { "voice_id": "vd_...", "status": "ready" }
Generate two-host dialogue podcasts from a list of script segments. Submit a job via POST /api/v1/jobs with type:"podcast", then poll for the stitched result. Per-segment audio URLs are returned alongside the final episode URL.
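A podcast job body can be assembled from a plain list of (speaker, text) pairs. The sketch below also checks the per-plan character limits listed in the podcast plan table (Free 2,000, Pro 10,000, Pro+ 30,000). It assumes the limit applies to the summed length of all segment texts, which is a guess about how the server counts. `build_podcast_job` is a hypothetical helper name.

```python
from typing import Any, Dict, List, Tuple

# Max characters per episode, from the podcast plan table.
PLAN_MAX_CHARS = {"free": 2_000, "pro": 10_000, "pro+": 30_000}


def build_podcast_job(
    lines: List[Tuple[str, str]],
    voices: Dict[str, str],
    plan: str = "free",
    audio_format: str = "mp3",
) -> Dict[str, Any]:
    """Return a JSON-ready body for POST /v1/jobs with type "podcast"."""
    if not lines:
        raise ValueError("at least one segment is required")
    limit = PLAN_MAX_CHARS[plan]
    segments = []
    total_chars = 0
    for speaker, text in lines:
        if speaker not in voices:
            raise ValueError(f"no voice assigned to speaker {speaker!r}")
        total_chars += len(text)
        segments.append({"speaker": speaker, "text": text})
    # Assumption: the episode limit is the sum of all segment lengths.
    if total_chars > limit:
        raise ValueError(f"{total_chars} chars exceeds the {plan} limit of {limit}")
    # Cloned hosts (voice_* IDs) are listed as Pro+ only.
    if plan != "pro+" and any(v.startswith("voice_") for v in voices.values()):
        raise ValueError("cloned voice_* hosts require Pro+")
    return {
        "type": "podcast",
        "voices": voices,
        "format": audio_format,
        "segments": segments,
    }
```

Checking the limit locally avoids submitting a job that the plan would not allow.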
curl -X POST https://your-domain.com/api/v1/jobs \
-H "Authorization: Bearer ev_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"type": "podcast",
"voices": { "A": "af_aoede", "B": "am_echo" },
"format": "mp3",
"segments": [
{ "speaker": "A", "text": "Welcome to the show. Today we explore the future of AI." },
{ "speaker": "B", "text": "Thanks for having me. AI is evolving at a remarkable pace." },
{ "speaker": "A", "text": "Absolutely. What do you think the next five years will look like?" },
{ "speaker": "B", "text": "I expect multimodal models to become the default in most apps." }
]
}'
# 202 Accepted
# { "id": "3a9f1b7c-...", "status": "queued" }
curl https://your-domain.com/api/v1/jobs/3a9f1b7c-... \
-H "Authorization: Bearer ev_your_api_key"
# {
# "id": "3a9f1b7c-...",
# "status": "completed",
# "audio_url": "/audio/episode-3a9f1b7c.mp3",
# "segments": [
# { "speaker": "A", "audioUrl": "/audio/seg-0.mp3" },
# { "speaker": "B", "audioUrl": "/audio/seg-1.mp3" },
# { "speaker": "A", "audioUrl": "/audio/seg-2.mp3" },
# { "speaker": "B", "audioUrl": "/audio/seg-3.mp3" }
# ],
# "error": null,
# "created_at": 1749500000000,
# "completed_at": 1749500045000
# }
| Plan | Max chars / episode | Cloned hosts |
|---|---|---|
| Free | 2,000 | No |
| Pro | 10,000 | No |
| Pro+ | 30,000 | Yes (use voice_* IDs) |
Use standard voice IDs (e.g. af_aoede, am_echo) for hosts A and B. Pro+ subscribers can assign cloned voice_* IDs returned by GET /v1/voices.
Embed <break time="500ms"/> (or <break time="0.5s"/>) anywhere in the input text to insert a silent pause. Accepts milliseconds (ms) or seconds (s), capped at 3000ms per break.
Embed [word](/phonemes/) inline in your input text to control pronunciation using misaki IPA notation. Example: The [Louisville](/lˈuːɪvɪl/) skyline.
Permanently deletes the voice clone and its embedding. Returns 200 on success.
curl -X DELETE https://your-domain.com/api/v1/voices/voice_abc123 \
-H "Authorization: Bearer ev_your_api_key"
Pass the voice_* ID as the voice field in any speech request. Pro+ required.
curl -X POST https://your-domain.com/api/v1/audio/speech \
-H "Authorization: Bearer ev_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"model": "kokoro-82m",
"input": "Hello, this is my cloned voice!",
"voice": "voice_abc123"
}' \
--output speech.mp3
Pass the vd_ voice ID as the voice field in any speech or podcast request.
curl -X POST https://your-domain.com/api/v1/audio/speech \
-H "Authorization: Bearer ev_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"model": "kokoro-82m",
"input": "Welcome to the show.",
"voice": "vd_..."
}' \
--output speech.mp3
| Plan | Voice design | Max designed voices |
|---|---|---|
| Free | No (403) | n/a |
| Pro | Yes | 10 |
| Pro+ | Yes | 10 |
Returns 503 when design assist is unavailable (API key not configured server-side); the response body includes a heuristic fallback automatically in most cases. Returns 409 when the 10-voice limit is reached.