API Documentation

Name: EasyVoice
Author: EasyVoice

OpenAI-compatible Text-to-Speech API. It works as a drop-in replacement: change only your base URL and API key.

Quick Start

curl -X POST https://your-domain.com/api/v1/audio/speech \
  -H "Authorization: Bearer ev_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro-82m",
    "input": "Hello, this is EasyVoice!",
    "voice": "af_aoede"
  }' \
  --output speech.mp3

POST /api/v1/audio/speech

Generates speech from text and returns the audio file directly.

Headers

Header	Value
Authorization	Bearer ev_your_api_key
Content-Type	application/json

Body Parameters

Parameter	Type	Required	Description
model	string	No	Always "kokoro-82m"
input	string	Yes	Text to convert (max 10,000 chars)
voice	string	No	Voice ID (default: af_aoede)
response_format	string	No	"mp3" or "wav" (default: mp3)
speed	number	No	0.5 to 2.0 (default: 1.0)
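The limits in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any official EasyVoice SDK: `build_speech_body` is a hypothetical helper name, and it assumes the server enforces exactly the documented ranges and defaults.

```python
import json

MAX_INPUT_CHARS = 10_000      # documented maximum input length
FORMATS = {"mp3", "wav"}      # documented response_format values


def build_speech_body(text, voice="af_aoede", response_format="mp3", speed=1.0):
    """Return a JSON-ready dict for POST /api/v1/audio/speech (hypothetical helper)."""
    if not text:
        raise ValueError("input is required")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    if response_format not in FORMATS:
        raise ValueError('response_format must be "mp3" or "wav"')
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    return {
        "model": "kokoro-82m",
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }


body = build_speech_body("Hello, this is EasyVoice!", speed=1.25)
print(json.dumps(body, indent=2))
```

The resulting dict can be passed as the JSON body of the request; failing early on the client avoids spending a rate-limited request on input the server would likely reject.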

EasyVoice Extensions

EasyVoice adds optional audio controls on top of the OpenAI-compatible request. On the public POST /api/v1/audio/speech endpoint these use an ev_ prefix. All are optional and default to a no-op, so standard OpenAI clients that omit them receive the same output as they would without the extensions. These are deterministic audio/voicing controls (pitch shift, output gain, and EQ tone presets), not generative effects.

Audio parameters

Parameter	Type	Required	Description
ev_pitch	number	No	Pitch shift in semitones. Range -4 to +4. Default: 0 (no shift).
ev_volume_db	number	No	Output gain in dB. Range -6 to +6. Default: 0 (no change).
ev_tone	string	No	EQ tone preset: "neutral" (default), "warm", "bright", or "bass".
<break> in input	markup	No	Insert silence: embed `<break time="500ms"/>` or `<break time="0.5s"/>` in the input text. Max 3000ms per break.

On the web app (POST /api/tts/generate) the same controls are sent without the prefix: pitch, volume_db, and tone (same ranges and defaults). Out-of-range values are rejected with 400.
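The naming difference between the two endpoints can be handled by one helper. This is a sketch under stated assumptions: `audio_controls` and its `public_api` flag are hypothetical names, and the range checks simply mirror the documented limits so that a 400 response can be avoided.

```python
RANGES = {"pitch": (-4, 4), "volume_db": (-6, 6)}   # documented ranges
TONES = {"neutral", "warm", "bright", "bass"}       # documented presets


def audio_controls(pitch=0, volume_db=0, tone="neutral", public_api=True):
    """Validate the controls and name them for the chosen endpoint.

    public_api=True  -> ev_pitch / ev_volume_db / ev_tone (public speech endpoint)
    public_api=False -> pitch / volume_db / tone (web app endpoint)
    """
    values = {"pitch": pitch, "volume_db": volume_db}
    for name, value in values.items():
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    if tone not in TONES:
        raise ValueError(f"tone must be one of {sorted(TONES)}")
    values["tone"] = tone
    prefix = "ev_" if public_api else ""
    return {prefix + name: value for name, value in values.items()}


print(audio_controls(pitch=2, volume_db=-1.5, tone="warm"))
print(audio_controls(pitch=2, volume_db=-1.5, tone="warm", public_api=False))
```

The returned dict can be merged into the request body alongside `input` and `voice`.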

Example request

curl -X POST https://your-domain.com/api/v1/audio/speech \
  -H "Authorization: Bearer ev_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro-82m",
    "input": "Take a breath. <break time=\"500ms\"/> Now continue.",
    "voice": "af_aoede",
    "ev_pitch": 2,
    "ev_volume_db": -1.5,
    "ev_tone": "warm"
  }' \
  --output speech.mp3
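The `<break>` markup used in the request above can also be generated programmatically. The helper names below (`break_tag`, `join_with_pauses`) are hypothetical; the sketch only assumes the documented millisecond form and the 3000 ms per-break cap.

```python
MAX_BREAK_MS = 3000  # documented per-break maximum


def break_tag(ms):
    """Return a <break> element for a pause of `ms` whole milliseconds."""
    if not isinstance(ms, int) or not 0 < ms <= MAX_BREAK_MS:
        raise ValueError(f"break must be an integer from 1 to {MAX_BREAK_MS} ms")
    return f'<break time="{ms}ms"/>'


def join_with_pauses(sentences, pause_ms=500):
    """Join sentences, inserting the same pause between each pair."""
    return f" {break_tag(pause_ms)} ".join(sentences)


text = join_with_pauses(["Take a breath.", "Now continue."])
print(text)  # Take a breath. <break time="500ms"/> Now continue.
```

When the text is serialized with a JSON library, the double quotes inside the tag are escaped automatically, so the manual `\"` escaping seen in the curl command is not needed.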

Python Example

from openai import OpenAI

client = OpenAI(
    api_key="ev_your_api_key",
    base_url="https://your-domain.com/api/v1"
)

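# Stream the audio to disk as it arrives instead of buffering it in memory.
with client.audio.speech.with_streaming_response.create(
    model="kokoro-82m",
    voice="af_aoede",
    input="Hello from EasyVoice!"
) as response:
    response.stream_to_file("output.mp3")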

Node.js Example

import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "ev_your_api_key",
  baseURL: "https://your-domain.com/api/v1"
});

const mp3 = await client.audio.speech.create({
  model: "kokoro-82m",
  voice: "af_aoede",
  input: "Hello from EasyVoice!"
});

const buffer = Buffer.from(await mp3.arrayBuffer());
await fs.promises.writeFile("output.mp3", buffer);

Rate Limits

Limit	Value
Requests per minute	60
Max input length	10,000 characters
Characters per month	Unlimited (Pro plan)
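Because a single request accepts at most 10,000 characters, longer texts need to be split before they are sent to the synchronous endpoint. The following is one possible approach, not something the API prescribes: `chunk_text` is a hypothetical helper that prefers sentence boundaries and falls back to a hard cut only when a single sentence exceeds the limit.

```python
import re

MAX_INPUT_CHARS = 10_000  # documented maximum input length


def chunk_text(text, limit=MAX_INPUT_CHARS):
    """Split text into chunks of at most `limit` characters."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(chunk_text("One. Two. Three.", limit=10))  # ['One. Two.', 'Three.']
```

Each chunk then counts as one request against the 60 requests-per-minute limit, so a caller processing many chunks may want to pace its requests accordingly.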

Available Voices

56 voices across 9 languages. Use the voice ID in your API requests.

See the voice browser for the full list with audio previews.

Arabic: 10 voices (ar_m1–ar_m5 and ar_f1–ar_f5) work directly in POST /api/v1/audio/speech. Arabic numerals, dates, and AED amounts are normalized to spoken form automatically. See Arabic text to speech.

Pro+ subscribers can also use cloned voice IDs (e.g. voice_abc123) returned by the GET /api/v1/voices endpoint.

Voice Cloning API

Pro+ Required

Enroll, list, and delete custom voice clones. Cloned voice IDs can be used directly in POST /api/v1/audio/speech requests. Requires a Pro+ subscription and explicit consent for each enrolled voice.

POST /api/v1/voices — Enroll a voice clone

Multipart form upload. Returns 202 with {voice_id, status:"enrolling"}. Enrollment typically completes in 1–2 minutes; poll GET /api/v1/voices for status.

curl -X POST https://your-domain.com/api/v1/voices \
  -H "Authorization: Bearer ev_your_api_key" \
  -F "name=My Voice" \
  -F "consent=true" \
  -F "audio=@sample.wav"

Field	Type	Required	Description
name	string	Yes	Display name for the cloned voice
consent	string	Yes	Must be `"true"`; attests that you own the voice or have consent to clone it
audio	file	Yes	WAV, MP3, MP4, or M4A. 15–60 seconds of clear speech recommended.

GET /api/v1/voices — List voice clones

Returns all cloned voices for the authenticated user. Only voices whose status is "ready" can be used in speech requests.

curl https://your-domain.com/api/v1/voices \
  -H "Authorization: Bearer ev_your_api_key"

# Response:
# {
#   "voices": [
#     { "id": "voice_abc123", "name": "My Voice", "status": "ready", "createdAt": 1234567890 }
#   ]
# }
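Since enrollment is asynchronous, a client typically polls the list endpoint until the new voice reports "ready". The sketch below shows one way to do that. `wait_until_ready` is a hypothetical helper; the HTTP call is injected as a plain callable so the example runs without a network, and the interval and timeout defaults are assumptions rather than documented values. Only the "enrolling" and "ready" statuses appear in this documentation, so any other status is simply treated as not ready yet.

```python
import time


def wait_until_ready(list_voices, voice_id, interval=10.0, timeout=300.0,
                     sleep=time.sleep):
    """Poll until the voice with `voice_id` reports status "ready".

    `list_voices` is any callable returning the parsed JSON body of
    GET /api/v1/voices, e.g. a thin wrapper around your HTTP client.
    """
    waited = 0.0
    while True:
        voices = list_voices().get("voices", [])
        match = next((v for v in voices if v["id"] == voice_id), None)
        if match is not None and match["status"] == "ready":
            return match
        if waited >= timeout:
            raise TimeoutError(f"{voice_id} not ready after {timeout:.0f}s")
        sleep(interval)
        waited += interval


# Demo with canned responses standing in for two real GET calls.
_responses = iter([
    {"voices": [{"id": "voice_abc123", "name": "My Voice", "status": "enrolling"}]},
    {"voices": [{"id": "voice_abc123", "name": "My Voice", "status": "ready"}]},
])
voice = wait_until_ready(lambda: next(_responses), "voice_abc123",
                         sleep=lambda seconds: None)
print(voice["status"])  # ready
```

In real use, `list_voices` would issue the authenticated GET request shown in the curl example and return its decoded JSON.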

Async Jobs API

For long inputs, submit an asynchronous job instead of waiting on the synchronous endpoint. Jobs are queued, processed, and the result is fetched by polling. Works with any voice ID, including Arabic and cloned voices.

POST /api/v1/jobs — Submit a TTS job

curl -X POST https://your-domain.com/api/v1/jobs \
  -H "Authorization: Bearer ev_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "A long article text...",
    "voice": "af_aoede",
    "format": "mp3"
  }'

# 202 Accepted
# { "id": "1f6f7c2e-...", "status": "queued" }

GET /api/v1/jobs/{id} — Poll job status

curl https://your-domain.com/api/v1/jobs/1f6f7c2e-... \
  -H "Authorization: Bearer ev_your_api_key"

# { "id": "1f6f7c2e-...", "status": "completed",
#   "audio_url": "/audio/....mp3", "created_at": ..., "completed_at": ... }

Statuses: queued → active → completed / failed. Jobs are visible only to the account that created them.
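The status flow above maps naturally onto a small polling loop. The following is a hedged sketch: `poll_job` and `audio_link` are hypothetical helpers, the job fetcher is injected so the example needs no network, and resolving `audio_url` against the site root is an assumption based on the leading slash in the sample response. The file name in the demo is invented for illustration.

```python
import time
from urllib.parse import urljoin

TERMINAL = {"completed", "failed"}  # documented end states


def poll_job(get_job, job_id, interval=2.0, max_polls=150, sleep=time.sleep):
    """Call get_job(job_id) until the job reaches a terminal status."""
    for _ in range(max_polls):
        job = get_job(job_id)
        if job["status"] in TERMINAL:
            return job
        sleep(interval)
    raise TimeoutError(f"job {job_id} still pending after {max_polls} polls")


def audio_link(base_url, job):
    """Resolve the job's audio_url against the API host (assumed behaviour)."""
    if job["status"] != "completed":
        raise RuntimeError(f"job {job['id']} ended with status {job['status']}")
    return urljoin(base_url, job["audio_url"])


# Demo: a fake fetcher that walks through queued -> active -> completed.
_statuses = iter(["queued", "active", "completed"])


def fake_get_job(job_id):
    status = next(_statuses)
    job = {"id": job_id, "status": status}
    if status == "completed":
        job["audio_url"] = "/audio/example.mp3"  # hypothetical file name
    return job


job = poll_job(fake_get_job, "job-123", sleep=lambda seconds: None)
print(audio_link("https://your-domain.com/api/v1/", job))
# https://your-domain.com/audio/example.mp3
```

Treating "failed" as a terminal state keeps the loop from spinning on a job that will never complete; the caller decides how to report or retry it.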

Voice Design API

Pro Required

Design a custom voice by describing it in plain text. The API maps your description to 3 diverse preset-recipe candidates (baseVoice + speed + pitch). Pick one and save it as a vd_ voice you can use in any synthesis or podcast request. Pro or Pro+ required; max 10 designed voices per account.

Note: Voice design maps your description to existing preset voices with speed and pitch adjustments; it is not generative synthesis. When design assist is temporarily unavailable, a heuristic fallback is used automatically (the response includes "fallback":true).

Step 1 — POST /api/v1/voices/design — Get candidates

Also accepted as POST /api/v1/voices with a JSON body containing {"description":"..."}. Returns 3 candidate recipes to choose from.

curl -X POST https://your-domain.com/api/v1/voices/design \
  -H "Authorization: Bearer ev_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"description": "warm, calm British male narrator"}'

# 200 OK
# {
#   "candidates": [
#     {
#       "baseVoice": "bm_george",
#       "speed": 0.9,
#       "pitchShift": -1,
#       "label": "Deep & Calm",
#       "rationale": "British male with slightly lowered pitch and relaxed pace."
#     },
#     { "baseVoice": "bm_lewis", "speed": 0.85, "pitchShift": 0, ... },
#     { "baseVoice": "am_echo",  "speed": 1.0,  "pitchShift": -1, ... }
#   ]
# }

Step 2 — POST /api/v1/voices — Save a designed voice

Pick a candidate recipe from Step 1 and save it with a name. Returns a vd_ voice ID that you can use immediately in synthesis.

curl -X POST https://your-domain.com/api/v1/voices \
  -H "Authorization: Bearer ev_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "designed",
    "name": "My Narrator",
    "recipe": {
      "baseVoice": "bm_george",
      "speed": 0.9,
      "pitchShift": -1
    }
  }'

# 200 OK
# { "voice_id": "vd_...", "status": "ready" }
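The two steps can be connected in code by turning a Step 1 candidate into the Step 2 request body. This is a small sketch with a hypothetical helper name (`designed_voice_payload`). It assumes that only the three recipe fields shown in the save request are needed and that display-only fields such as `label` and `rationale` can be dropped.

```python
import json

RECIPE_KEYS = ("baseVoice", "speed", "pitchShift")  # fields used in Step 2


def designed_voice_payload(candidate, name):
    """Build the Step 2 JSON body from one Step 1 candidate."""
    missing = [key for key in RECIPE_KEYS if key not in candidate]
    if missing:
        raise KeyError(f"candidate is missing {missing}")
    recipe = {key: candidate[key] for key in RECIPE_KEYS}
    return {"type": "designed", "name": name, "recipe": recipe}


# First candidate from the sample Step 1 response.
candidate = {
    "baseVoice": "bm_george",
    "speed": 0.9,
    "pitchShift": -1,
    "label": "Deep & Calm",
    "rationale": "British male with slightly lowered pitch and relaxed pace.",
}
payload = designed_voice_payload(candidate, "My Narrator")
print(json.dumps(payload, indent=2))
```

The printed JSON matches the body of the curl request in Step 2.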

Podcast API