EasyVoice

Free text-to-speech powered by open source AI.


© 2026 EasyVoice. Powered by Kokoro-82M (Apache 2.0).


Built by InfoDriven

Dubai, United Arab Emirates · Support@infodriven.ae · infodriven.ae


Ready to use the API?

Free account — no credit card required. Get an API key in 30 seconds.

Pro Feature

API Documentation

OpenAI-compatible text-to-speech API. It works as a drop-in replacement: change only your base URL and API key.

Quick Start

curl -X POST https://your-domain.com/api/v1/audio/speech \
  -H "Authorization: Bearer ev_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro-82m",
    "input": "Hello, this is EasyVoice!",
    "voice": "af_aoede"
  }' \
  --output speech.mp3
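If you are not using the OpenAI SDK, you can make the same request with Python's standard library. The following is a minimal sketch. It assumes the placeholder base URL `https://your-domain.com/api/v1` from the example above. The function names `build_speech_request` and `synthesize_to_file` are illustrative and are not part of EasyVoice.

```python
import json
import urllib.request

# Placeholder base URL from the curl example; replace with your deployment.
BASE_URL = "https://your-domain.com/api/v1"


def build_speech_request(api_key: str, text: str, voice: str = "af_aoede",
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /audio/speech request."""
    body = {"model": "kokoro-82m", "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url}/audio/speech",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(api_key: str, text: str, path: str,
                       voice: str = "af_aoede") -> None:
    """Send the request and write the returned audio bytes to `path`."""
    req = build_speech_request(api_key, text, voice)
    with urllib.request.urlopen(req) as resp, open(path, "wb") as out:
        out.write(resp.read())
```

For example, `synthesize_to_file("ev_your_api_key", "Hello, this is EasyVoice!", "speech.mp3")` should produce the same file as the curl command. Non-2xx responses raise `urllib.error.HTTPError`.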

POST /api/v1/audio/speech

Generate speech from text. The response body is the audio file itself.

Headers

Header          Value
Authorization   Bearer ev_your_api_key
Content-Type    application/json

Body Parameters

Parameter        Type    Required  Description
model            string  No        Always "kokoro-82m"
input            string  Yes       Text to convert (max 10,000 chars)
voice            string  No        Voice ID (default: af_aoede)
response_format  string  No        "mp3" or "wav" (default: mp3)
speed            number  No        0.5 to 2.0 (default: 1.0)
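You can check these constraints on the client before sending a request, which avoids spending a rate-limited call on an invalid body. This is a sketch that mirrors the table above. It counts length in Python characters (code points), and the server may count differently. The function name `validate_speech_body` is hypothetical.

```python
VALID_FORMATS = {"mp3", "wav"}
MAX_INPUT_CHARS = 10_000


def validate_speech_body(body: dict) -> list[str]:
    """Return a list of problems; an empty list means the body looks valid."""
    errors = []
    text = body.get("input")
    if not isinstance(text, str) or not text:
        errors.append("input is required and must be a non-empty string")
    elif len(text) > MAX_INPUT_CHARS:
        errors.append(f"input exceeds {MAX_INPUT_CHARS} characters")
    fmt = body.get("response_format", "mp3")  # documented default
    if fmt not in VALID_FORMATS:
        errors.append(f"response_format must be one of {sorted(VALID_FORMATS)}")
    speed = body.get("speed", 1.0)  # documented default
    if not isinstance(speed, (int, float)) or not 0.5 <= speed <= 2.0:
        errors.append("speed must be a number from 0.5 to 2.0")
    return errors
```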

EasyVoice Extensions

EasyVoice adds optional audio controls on top of the OpenAI-compatible request. On the public POST /v1/audio/speech endpoint, these controls use an ev_ prefix. All of them are optional and default to a no-op, so standard OpenAI clients that omit them receive exactly the same output as they would without the extensions. These are deterministic audio and voicing controls (pitch shift, output gain, and EQ tone presets), not generative effects.

Audio parameters

Parameter         Type    Required  Description
ev_pitch          number  No        Pitch shift in semitones. Range -4 to +4. Default: 0 (no shift).
ev_volume_db      number  No        Output gain in dB. Range -6 to +6. Default: 0 (no change).
ev_tone           string  No        EQ tone preset: "neutral" (default), "warm", "bright", or "bass".
<break> in input  markup  No        Insert silence: embed <break time="500ms"/> or <break time="0.5s"/> in the input text. Max 3000ms per break.

On the web app (POST /api/tts/generate) the same controls are sent without the prefix: pitch, volume_db, and tone (same ranges and defaults). Out-of-range values are rejected with 400.
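The same three controls carry different key names on the two endpoints. A small helper can validate the documented ranges once and emit the right keys for either endpoint. This is a sketch, and the name `extension_params` is hypothetical. Because omitted fields are a no-op, the helper drops values that equal their defaults.

```python
TONES = {"neutral", "warm", "bright", "bass"}


def extension_params(pitch: float = 0, volume_db: float = 0, tone: str = "neutral",
                     web_app: bool = False) -> dict:
    """Validate the audio controls and return them keyed for the target endpoint."""
    if not -4 <= pitch <= 4:
        raise ValueError("pitch must be between -4 and +4 semitones")
    if not -6 <= volume_db <= 6:
        raise ValueError("volume_db must be between -6 and +6 dB")
    if tone not in TONES:
        raise ValueError(f"tone must be one of {sorted(TONES)}")
    params = {}
    # Defaults are a no-op server-side, so only send non-default values.
    if pitch != 0:
        params["pitch"] = pitch
    if volume_db != 0:
        params["volume_db"] = volume_db
    if tone != "neutral":
        params["tone"] = tone
    # Public API: ev_pitch, ev_volume_db, ev_tone. Web app: bare names.
    prefix = "" if web_app else "ev_"
    return {prefix + key: value for key, value in params.items()}
```

You can merge the result into a request body, for example `{"model": "kokoro-82m", "input": text, **extension_params(pitch=2, tone="warm")}`. With the OpenAI Python SDK, non-standard fields like these can be passed through the `extra_body` argument.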

Example request

curl -X POST https://your-domain.com/api/v1/audio/speech \
  -H "Authorization: Bearer ev_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro-82m",
    "input": "Take a breath. <break time=\"500ms\"/> Now continue.",
    "voice": "af_aoede",
    "ev_pitch": 2,
    "ev_volume_db": -1.5,
    "ev_tone": "warm"
  }' \
  --output speech.mp3

Python Example

from openai import OpenAI

client = OpenAI(
    api_key="ev_your_api_key",
    base_url="https://your-domain.com/api/v1"
)

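# Stream the audio straight to a file instead of buffering it in memory
with client.audio.speech.with_streaming_response.create(
    model="kokoro-82m",
    voice="af_aoede",
    input="Hello from EasyVoice!"
) as response:
    response.stream_to_file("output.mp3")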

Node.js Example

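import fs from "fs";
import OpenAI from "openai";
// Top-level await below requires running this file as an ES module (e.g. example.mjs).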

const client = new OpenAI({
  apiKey: "ev_your_api_key",
  baseURL: "https://your-domain.com/api/v1"
});

const mp3 = await client.audio.speech.create({
  model: "kokoro-82m",
  voice: "af_aoede",
  input: "Hello from EasyVoice!"
});

const buffer = Buffer.from(await mp3.arrayBuffer());
await fs.promises.writeFile("output.mp3", buffer);

Rate Limits

Limit                 Value
Requests per minute   60
Max input length      10,000 characters
Characters per month  Unlimited (Pro plan)
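When you generate many clips in a batch, pacing requests on the client helps you stay under the 60 requests-per-minute limit. The following is a sketch of a sliding-window limiter. It assumes the server counts requests over a rolling 60-second window, which this page does not specify. The class name `SlidingWindowLimiter` is hypothetical, and the injectable clock and sleep exist only to make it testable.

```python
import collections
import time


class SlidingWindowLimiter:
    """Client-side pacing for a requests-per-window limit (default 60/min)."""

    def __init__(self, max_requests: int = 60, window_s: float = 60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window_s = window_s
        self.clock = clock
        self.sleep = sleep
        self.sent = collections.deque()  # timestamps of recent requests

    def wait_time(self) -> float:
        """Seconds to wait before another request fits in the window."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.sent and now - self.sent[0] >= self.window_s:
            self.sent.popleft()
        if len(self.sent) < self.max_requests:
            return 0.0
        return self.window_s - (now - self.sent[0])

    def acquire(self) -> None:
        """Block until a request slot is free, then record the request."""
        while True:
            delay = self.wait_time()
            if delay <= 0:
                break
            self.sleep(delay)
        self.sent.append(self.clock())
```

Call `limiter.acquire()` immediately before each API request.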

Available Voices

56 voices across 9 languages. Use the voice ID in your API requests.

See the voice browser for the full list with audio previews.

Arabic: 10 voices (ar_m1–ar_m5 and ar_f1–ar_f5) work directly in POST /v1/audio/speech. Arabic numerals, dates, and AED amounts are normalized to spoken form automatically. See Arabic text to speech.

Pro+ subscribers can also use cloned voice IDs (e.g. voice_abc123) returned by the GET /v1/voices endpoint.
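Voice IDs on this page follow prefix conventions: voice_ for cloned voices, vd_ for voices saved through the Voice Design API, and ar_ for the Arabic presets. The following sketch classifies an ID by its prefix. It is useful, for instance, to flag cloned IDs before calling the API from a non-Pro+ account. The function and constant names are hypothetical, and the classification relies only on the prefixes documented here.

```python
# The ten Arabic preset IDs: ar_m1..ar_m5 and ar_f1..ar_f5.
ARABIC_VOICES = [f"ar_{gender}{i}" for gender in "mf" for i in range(1, 6)]


def voice_kind(voice_id: str) -> str:
    """Classify a voice ID by its documented prefix."""
    if voice_id.startswith("voice_"):
        return "cloned"    # Pro+ only; listed by GET /v1/voices
    if voice_id.startswith("vd_"):
        return "designed"  # saved through the Voice Design API
    if voice_id.startswith("ar_"):
        return "arabic"    # one of ARABIC_VOICES
    return "preset"        # e.g. af_aoede, bm_george
```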

Voice Cloning API

Pro+ Required

Enroll, list, and delete custom voice clones. Cloned voice IDs can be used directly in POST /v1/audio/speech requests. Requires a Pro+ subscription and explicit consent for each enrolled voice.

POST /api/v1/voices — Enroll a voice clone

Multipart form upload. Returns 202 with {voice_id, status:"enrolling"}. Enrollment typically completes in 1–2 minutes; poll GET /v1/voices for status.

curl -X POST https://your-domain.com/api/v1/voices \
  -H "Authorization: Bearer ev_your_api_key" \
  -F "name=My Voice" \
  -F "consent=true" \
  -F "audio=@sample.wav"

Field    Type    Required  Description
name     string  Yes       Display name for the cloned voice
consent  string  Yes       Must be "true"; attests that you own, or have consent to clone, this voice
audio    file    Yes       WAV, MP3, MP4, or M4A. 15–60 seconds of clear speech recommended.
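Without curl or a third-party HTTP library, you need to encode the enrollment upload as multipart/form-data yourself. The following is a sketch using only the standard library. The name `build_enroll_form` is hypothetical. The per-file Content-Type is guessed from the file extension, which is an assumption because this page does not say what the server expects.

```python
import mimetypes
import uuid


def build_enroll_form(name: str, audio: bytes, filename: str,
                      consent: bool) -> tuple[bytes, str]:
    """Encode the enrollment fields; returns (body, content_type header value)."""
    if not consent:
        raise ValueError("consent must be given before enrolling a voice")
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text fields: name and the literal string "true" for consent.
    for field, value in (("name", name), ("consent", "true")):
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{field}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    # The audio file part.
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="audio"; '
        f'filename="{filename}"\r\nContent-Type: {mime}\r\n\r\n'.encode("utf-8")
        + audio + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

Send the body with `urllib.request.Request(url, data=body, headers={"Authorization": "Bearer ev_your_api_key", "Content-Type": content_type}, method="POST")`.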

GET /api/v1/voices — List voice clones

Returns all cloned voices for the authenticated user. Only voices whose status is "ready" can be used in speech requests.

curl https://your-domain.com/api/v1/voices \
  -H "Authorization: Bearer ev_your_api_key"

# Response:
# {
#   "voices": [
#     { "id": "voice_abc123", "name": "My Voice", "status": "ready", "createdAt": 1234567890 }
#   ]
# }
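Enrollment runs asynchronously, so a client typically polls this listing until the new clone reports "ready". The following sketch takes the HTTP call as a caller-supplied `fetch_voices` function that returns the parsed JSON. Both function names are hypothetical. This page documents no failure status for enrollment, so the loop stops only on "ready" or on its own timeout.

```python
import time


def voice_status(voices_response: dict, voice_id: str):
    """Return the status of `voice_id` in a GET /v1/voices response, or None."""
    for voice in voices_response.get("voices", []):
        if voice.get("id") == voice_id:
            return voice.get("status")
    return None


def wait_for_voice(fetch_voices, voice_id: str, interval_s: float = 10.0,
                   timeout_s: float = 300.0, sleep=time.sleep) -> str:
    """Poll until the clone is "ready"; raise TimeoutError otherwise."""
    waited = 0.0
    while True:
        status = voice_status(fetch_voices(), voice_id)
        if status == "ready":
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"{voice_id} not ready after {timeout_s}s "
                               f"(last status: {status})")
        sleep(interval_s)
        waited += interval_s
```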

Async Jobs API

For long inputs, submit an asynchronous job instead of waiting on the synchronous endpoint. Jobs are queued, processed, and the result is fetched by polling. Works with any voice ID, including Arabic and cloned voices.

POST /api/v1/jobs — Submit a TTS job

curl -X POST https://your-domain.com/api/v1/jobs \
  -H "Authorization: Bearer ev_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "A long article text...",
    "voice": "af_aoede",
    "format": "mp3"
  }'

# 202 Accepted
# { "id": "1f6f7c2e-...", "status": "queued" }

GET /api/v1/jobs/{id} — Poll job status

curl https://your-domain.com/api/v1/jobs/1f6f7c2e-... \
  -H "Authorization: Bearer ev_your_api_key"

# { "id": "1f6f7c2e-...", "status": "completed",
#   "audio_url": "/audio/....mp3", "created_at": ..., "completed_at": ... }

Statuses: queued → active → completed / failed. Jobs are visible only to the account that created them.
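The following is a polling loop for this status sequence with exponential backoff. It is a sketch: `fetch_job(job_id)` is a caller-supplied function that performs GET /api/v1/jobs/{id} and returns the parsed JSON, and `poll_job` is a hypothetical name. The example responses show audio_url as a path, so this sketch assumes the file is served from the same host and resolves the path against that host.

```python
import time
from urllib.parse import urljoin


def poll_job(fetch_job, job_id: str, base_url: str = "https://your-domain.com",
             first_delay_s: float = 1.0, max_delay_s: float = 15.0,
             max_attempts: int = 60, sleep=time.sleep) -> str:
    """Poll a job until it completes and return an absolute audio URL."""
    delay = first_delay_s
    status = None
    for _ in range(max_attempts):
        job = fetch_job(job_id)
        status = job["status"]
        if status == "completed":
            # audio_url is a path such as /audio/...; resolve it against the host.
            return urljoin(base_url, job["audio_url"])
        if status == "failed":
            raise RuntimeError(f"job {job_id} failed: {job.get('error')}")
        sleep(delay)  # still queued or active
        delay = min(delay * 2, max_delay_s)
    raise TimeoutError(f"job {job_id} still {status} after {max_attempts} polls")
```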

Voice Design API

Pro Required

Design a custom voice by describing it in plain text. The API maps your description to 3 diverse preset-recipe candidates (baseVoice + speed + pitch). Pick one and save it as a vd_ voice you can use in any synthesis or podcast request. Pro or Pro+ required; max 10 designed voices per account.

Note: Voice design maps your description to existing preset voices with speed and pitch adjustments; it is not generative synthesis. When design assist is temporarily unavailable, a heuristic fallback is used automatically (the response includes "fallback":true).

Step 1 — POST /api/v1/voices/design — Get candidates

Also accepted as POST /v1/voices with a JSON body containing {"description":"..."}. Returns 3 candidate recipes to choose from.

curl -X POST https://your-domain.com/api/v1/voices/design \
  -H "Authorization: Bearer ev_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"description": "warm, calm British male narrator"}'

# 200 OK
# {
#   "candidates": [
#     {
#       "baseVoice": "bm_george",
#       "speed": 0.9,
#       "pitchShift": -1,
#       "label": "Deep & Calm",
#       "rationale": "British male with slightly lowered pitch and relaxed pace."
#     },
#     { "baseVoice": "bm_lewis", "speed": 0.85, "pitchShift": 0, ... },
#     { "baseVoice": "am_echo",  "speed": 1.0,  "pitchShift": -1, ... }
#   ]
# }

Step 2 — POST /api/v1/voices — Save a designed voice

Pick a candidate recipe from Step 1 and save it with a name. The response contains a vd_ voice ID that you can use immediately in synthesis.

curl -X POST https://your-domain.com/api/v1/voices \
  -H "Authorization: Bearer ev_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "designed",
    "name": "My Narrator",
    "recipe": {
      "baseVoice": "bm_george",
      "speed": 0.9,
      "pitchShift": -1
    }
  }'

# 200 OK
# { "voice_id": "vd_...", "status": "ready" }
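Steps 1 and 2 can be chained by copying the chosen candidate's recipe fields into the save request. The following is a sketch. It copies only baseVoice, speed, and pitchShift because those are the fields the Step 2 example sends. Both function names are hypothetical.

```python
def candidate_index(design_response: dict, label: str) -> int:
    """Return the position of the first Step 1 candidate with this label."""
    for i, candidate in enumerate(design_response["candidates"]):
        if candidate.get("label") == label:
            return i
    raise ValueError(f"no candidate labelled {label!r}")


def design_save_payload(design_response: dict, index: int, name: str) -> dict:
    """Turn candidate `index` from Step 1 into the JSON body for Step 2."""
    candidate = design_response["candidates"][index]
    return {
        "type": "designed",
        "name": name,
        # Copy only the recipe fields that the Step 2 example sends.
        "recipe": {key: candidate[key] for key in ("baseVoice", "speed", "pitchShift")},
    }
```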

Podcast API

Pro Required

Generate two-host dialogue podcasts from a list of script segments. Submit a job via POST /api/v1/jobs with type:"podcast", then poll for the stitched result. Per-segment audio URLs are returned alongside the final episode URL.

POST /api/v1/jobs — Submit a podcast job

curl -X POST https://your-domain.com/api/v1/jobs \
  -H "Authorization: Bearer ev_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "podcast",
    "voices": { "A": "af_aoede", "B": "am_echo" },
    "format": "mp3",
    "segments": [
      { "speaker": "A", "text": "Welcome to the show. Today we explore the future of AI." },
      { "speaker": "B", "text": "Thanks for having me. AI is evolving at a remarkable pace." },
      { "speaker": "A", "text": "Absolutely. What do you think the next five years will look like?" },
      { "speaker": "B", "text": "I expect multimodal models to become the default in most apps." }
    ]
  }'

# 202 Accepted
# { "id": "3a9f1b7c-...", "status": "queued" }

GET /api/v1/jobs/{id} — Poll podcast job

curl https://your-domain.com/api/v1/jobs/3a9f1b7c-... \
  -H "Authorization: Bearer ev_your_api_key"

# {
#   "id": "3a9f1b7c-...",
#   "status": "completed",
#   "audio_url": "/audio/episode-3a9f1b7c.mp3",
#   "segments": [
#     { "speaker": "A", "audioUrl": "/audio/seg-0.mp3" },
#     { "speaker": "B", "audioUrl": "/audio/seg-1.mp3" },
#     { "speaker": "A", "audioUrl": "/audio/seg-2.mp3" },
#     { "speaker": "B", "audioUrl": "/audio/seg-3.mp3" }
#   ],
#   "error": null,
#   "created_at": 1749500000000,
#   "completed_at": 1749500045000
# }
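Podcast scripts are often written as plain "A: ..." / "B: ..." lines. The following sketch converts such a script into a type:"podcast" job body and checks it against the podcast tier table in this section (Free 2,000, Pro 10,000, Pro+ 30,000 characters per episode; cloned hosts on Pro+ only). It assumes that the per-episode limit counts the characters of all segment texts combined. The function name `podcast_job` is hypothetical.

```python
import re

# Per-episode character limits from the podcast tier table.
EPISODE_CHAR_LIMITS = {"free": 2_000, "pro": 10_000, "pro+": 30_000}
LINE_RE = re.compile(r"^\s*([AB])\s*:\s*(.+?)\s*$")


def podcast_job(script: str, voices: dict, plan: str = "pro", fmt: str = "mp3") -> dict:
    """Parse "A: ..." / "B: ..." lines into a type:"podcast" job body."""
    segments = []
    for line in script.splitlines():
        if not line.strip():
            continue  # skip blank lines between turns
        match = LINE_RE.match(line)
        if not match:
            raise ValueError(f"expected 'A: text' or 'B: text', got {line!r}")
        if match.group(1) not in voices:
            raise ValueError(f"no voice assigned to speaker {match.group(1)}")
        segments.append({"speaker": match.group(1), "text": match.group(2)})
    # Assumption: the limit applies to the total text across all segments.
    total = sum(len(s["text"]) for s in segments)
    limit = EPISODE_CHAR_LIMITS[plan.lower()]
    if total > limit:
        raise ValueError(f"{total} characters exceeds the {plan} limit of {limit}")
    if plan.lower() != "pro+" and any(v.startswith("voice_") for v in voices.values()):
        raise ValueError("cloned voice_* hosts require Pro+")
    return {"type": "podcast", "voices": voices, "format": fmt, "segments": segments}
```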

Tier limits & voice options

Plan   Max chars / episode   Cloned hosts
Free   2,000                 No
Pro    10,000                No
Pro+   30,000                Yes (use voice_* IDs)

Use standard voice IDs (e.g. af_aoede, am_echo) for hosts A and B. Pro+ subscribers can assign cloned voice_* IDs returned by GET /v1/voices.

DELETE /api/v1/voices/{id} — Delete a voice clone

Permanently deletes the voice clone and its embedding. Returns 200 on success.

curl -X DELETE https://your-domain.com/api/v1/voices/voice_abc123 \
  -H "Authorization: Bearer ev_your_api_key"

Using a cloned voice in speech synthesis

Pass the voice_* ID as the voice field in any speech request. Pro+ required.

curl -X POST https://your-domain.com/api/v1/audio/speech \
  -H "Authorization: Bearer ev_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro-82m",
    "input": "Hello, this is my cloned voice!",
    "voice": "voice_abc123"
  }' \
  --output speech.mp3

Voice Design Step 3 — Use the vd_ voice in synthesis

Pass the vd_ voice ID as the voice field in any speech or podcast request.

curl -X POST https://your-domain.com/api/v1/audio/speech \
  -H "Authorization: Bearer ev_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro-82m",
    "input": "Welcome to the show.",
    "voice": "vd_..."
  }' \
  --output speech.mp3

Voice design tier limits

Plan   Voice design   Max designed voices
Free   No (403)       n/a
Pro    Yes            10
Pro+   Yes            10

Returns 503 when design assist is unavailable (API key not configured server-side); in most cases the response body includes a heuristic fallback automatically. Returns 409 when the 10-voice limit is reached.

Pauses — <break>

Embed <break time="500ms"/> (or <break time="0.5s"/>) anywhere in the input text to insert a silent pause. Accepts milliseconds (ms) or seconds (s), capped at 3000ms per break.

Pronunciation overrides — [word](/phonemes/)

Embed [word](/phonemes/) inline in your input text to control pronunciation using misaki IPA notation. Example: The [Louisville](/lˈuːɪvɪl/) skyline.
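When you assemble input text programmatically, small helpers can generate the pause and pronunciation markup. They can also estimate how much silence a text requests before you send it. The following is a sketch. It assumes the 3000 ms cap applies to each break tag independently, as described for <break>. The names `brk`, `say_as`, and `total_pause_ms` are hypothetical.

```python
import re

MAX_BREAK_MS = 3000
BREAK_RE = re.compile(r'<break time="(\d+(?:\.\d+)?)(ms|s)"/>')


def brk(ms: int) -> str:
    """Return a <break> tag, clamped to the 3000 ms per-break maximum."""
    ms = max(0, min(int(ms), MAX_BREAK_MS))
    return f'<break time="{ms}ms"/>'


def say_as(word: str, phonemes: str) -> str:
    """Return a pronunciation override in the [word](/phonemes/) form."""
    return f"[{word}](/{phonemes}/)"


def total_pause_ms(text: str) -> float:
    """Sum the silence requested by <break> tags, capping each tag at 3000 ms."""
    total = 0.0
    for value, unit in BREAK_RE.findall(text):
        ms = float(value) * (1000 if unit == "s" else 1)
        total += min(ms, MAX_BREAK_MS)
    return total
```

For example, `"Take a breath. " + brk(500) + " The " + say_as("Louisville", "lˈuːɪvɪl") + " skyline."` produces input that uses both forms of markup.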