OpenAI TTS API Alternative — Drop-in Migration to Flat-Rate
OpenAI's audio.speech.create endpoint is the obvious default for any developer already authenticated to the OpenAI platform — but at $15 per 1M characters on tts-1 and $30 per 1M on tts-1-hd with no free tier, the cost model punishes scale and surprises finance every quarter. EasyVoice's TTS API mirrors OpenAI's request shape closely enough that swapping is usually a 5-line code change, replaces per-character pricing with flat $9.99/mo unlimited Pro (5K chars/day free, no card), and exposes 46 voices across 8 languages on the open-weight Kokoro-82M model. This guide covers the voice mapping table, the literal code diff, response-format compatibility, and what is and isn't drop-in.
5,000 characters per day on the free tier, no credit card. Pro $9.99/mo unlimited. 46 voices, 8 languages.
Why migrate from OpenAI TTS
OpenAI's TTS endpoint shipped in November 2023 and has been the developer-default ever since — partly because it's good, partly because if you're already paying for GPT-4 and Whisper the marginal friction of adding TTS is zero. But the pricing model has gotten more painful as voice features have become standard product surface area. At $15 per 1M characters on tts-1, a midsize content app generating 5M chars/month pays $75/mo just for the TTS call — and tts-1-hd doubles that. EasyVoice's flat $9.99/mo Pro tier is the same money at 666K chars/mo and pure savings on anything larger. At 10M chars/mo, OpenAI bills $150 vs our $9.99 — a 15× cost difference for what is, in blind A/B tests, indistinguishable narration on most prose.
The second migration driver is the free tier. OpenAI charges from request one — there is no free tier, no monthly allowance, no on-ramp for indie developers or hobby projects. EasyVoice offers 5,000 characters per day on the free API tier with a daily reset, no credit card. That's enough for most side projects to never need to pay, and the same daily-reset model means evaluators don't hit a 'trial expired' wall in week two. The third driver is open-weight auditability — regulated industries, government accessibility contracts, and EU AI-Act-compliant pipelines often need to point at the model weights. Kokoro-82M is Apache-2.0 on Hugging Face. OpenAI's TTS models are closed.
What's drop-in (and what isn't)
Drop-in: the request body shape. EasyVoice accepts {voice, input, response_format} in the same JSON layout OpenAI's SDK sends. Drop-in: the Bearer-token auth header — set Authorization: Bearer YOUR_EASYVOICE_KEY exactly like Authorization: Bearer YOUR_OPENAI_KEY. Drop-in: response formats mp3 (default), wav, and opus all work. Drop-in: streaming via chunked transfer-encoding — the response is byte-streamable the same way OpenAI's stream=True flag delivers chunks.
Not drop-in: the URL. OpenAI's endpoint is https://api.openai.com/v1/audio/speech; EasyVoice's is https://easyvoice.ae/api/tts/generate. Not drop-in: voice IDs — OpenAI's six voices (alloy, echo, fable, onyx, nova, shimmer) need to map to Kokoro voice IDs (af_alloy, am_echo, bm_fable, am_onyx, af_nova, af_jessica respectively). The mapping table below covers it. Not drop-in: the model parameter. OpenAI accepts model: 'tts-1' or 'tts-1-hd'. EasyVoice doesn't gate fidelity by model name — there's a single Kokoro-82M model serving all voices, so the model parameter is optional and ignored if sent.
Voice mapping — OpenAI → EasyVoice (Kokoro)
The closest sonic match for each OpenAI voice, based on register, pitch range, and listener-perception A/B tests: alloy → af_alloy (warm female mid-range, the closest direct match). echo → am_echo (neutral male, slightly above the OpenAI baseline pitch). fable → bm_fable (British male storytelling voice, the natural fable-narrator fit). onyx → am_onyx (deep American male baritone, the heaviest voice in either catalog). nova → af_nova (bright female with energy, matching the OpenAI nova brand). shimmer → af_jessica (soft female with shimmer-like air in the upper register).
These mappings are approximate — voice perception is subjective and pitched against your particular content. For literary fiction or audiobook work, we recommend generating 30-second samples of the same passage in both vendor voices and listening blind before committing to a mapping. The EasyVoice catalog has 40 additional voices beyond the OpenAI-equivalent six — once you've migrated, browse /voices to find regional, age-banded, and character voices the OpenAI catalog doesn't offer.
Response formats and streaming
Both APIs return raw audio bytes (not JSON-wrapped base64). response_format mp3 is the default and the right choice for most app integrations — it's the smallest payload and decodes natively on every browser and audio library. wav is the right choice for downstream audio processing pipelines (mixing into a podcast track, applying filters in Audacity, importing into a DAW) where lossless quality matters more than file size. opus is the right choice for real-time streaming over WebRTC or other latency-sensitive transport — opus's frame structure handles packet loss more gracefully than mp3.
Streaming works identically: hit the endpoint with the same body and read the response as chunks. The HTTP response opens with Content-Type: audio/mpeg (or audio/wav, audio/opus) and Transfer-Encoding: chunked, so any standard HTTP library — fetch in browser/Node, requests with stream=True in Python, http.Client in Go — delivers byte chunks as they're synthesized. First-byte latency on EasyVoice is typically 300-600ms warm, vs OpenAI tts-1 at 800ms-1.5s. The streaming advantage matters most for chatbot and IVR use cases where time-to-first-audio is the perceived latency, not total generation time.
What about ChatGPT plugin / OpenAI Realtime API users
OpenAI's Realtime API ships speech-in/speech-out in a single bidirectional WebSocket — it's a different API than audio.speech.create and not directly comparable. EasyVoice's TTS API is text-in/audio-out only; if you need full voice-to-voice agent behavior, the Realtime API or a stack like LiveKit + EV's TTS is the right call. For the vast majority of TTS use cases (read-aloud, accessibility, notifications, chatbot response synthesis with text already produced by your LLM of choice), the simple POST-and-receive-audio model EasyVoice ships is the right shape and the cheapest option.
If you're using ChatGPT plugins or the OpenAI Assistants API and want to swap just the TTS leg out for cost reasons, the integration is straightforward: your existing OpenAI auth handles GPT-4o, your new EasyVoice auth handles TTS, and you call them as two separate HTTP requests in your backend. Auth keys are isolated by vendor — there's no shared rate-limit pool or shared billing surface. We've intentionally not built an 'OpenAI compatibility mode' that pretends to be the OpenAI endpoint at api.openai.com; honest separation of vendor surface keeps debugging tractable when one of the APIs misbehaves.
Pricing breakeven math
OpenAI tts-1 charges $15 per 1M characters. EasyVoice Pro is $9.99/mo flat unlimited. The breakeven point is therefore 666,000 characters per month — about 100,000 words. If your app generates more than 100K words of synthesized audio per month, EasyVoice is cheaper. If your app generates more than 1M words per month, EasyVoice is 15× cheaper. For tts-1-hd at $30/1M, breakeven drops to 333K chars/mo (50K words).
Most apps that ship voice features cross those thresholds quickly. A read-aloud feature on a content app with 1,000 daily active users averaging 5 minutes of audio per session generates roughly 4.5M chars/day — under EasyVoice that's $9.99/mo total, under OpenAI tts-1 that's ~$2,000/mo. Even if your usage is well under 666K chars/mo today, building on a flat-rate API means feature growth isn't a budget event — adding 'audio version of every article' to a content product costs zero incremental cents. For the full comparison page with calculator, see /compare/openai-tts.
Code samples
Drop-in examples for the EasyVoice TTS API. Every request below assumes you've set EASYVOICE_API_KEY as an environment variable.
Before — OpenAI tts-1
Python with the official openai SDKfrom openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
response = client.audio.speech.create(
model="tts-1",
voice="alloy",
input="Hello, this is OpenAI's TTS API.",
response_format="mp3",
)
with open("out.mp3", "wb") as f:
f.write(response.content)After — EasyVoice
Python with stdlib requests, 5 lines changedimport os, requests
res = requests.post(
"https://easyvoice.ae/api/tts/generate",
headers={"Authorization": f"Bearer {os.environ['EASYVOICE_API_KEY']}"},
json={
"voice": "af_alloy", # was "alloy"
"input": "Hello, this is EasyVoice TTS API.",
"response_format": "mp3",
},
)
with open("out.mp3", "wb") as f:
f.write(res.content)JavaScript / TypeScript
fetch in Node 18+ or modern browsersconst res = await fetch("https://easyvoice.ae/api/tts/generate", {
method: "POST",
headers: {
"Authorization": `Bearer ${process.env.EASYVOICE_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
voice: "am_onyx", // was "onyx"
input: "Hello from EasyVoice.",
response_format: "mp3",
}),
});
const buf = Buffer.from(await res.arrayBuffer());
require("fs").writeFileSync("out.mp3", buf);curl
Quick smoke-test from the terminalcurl -X POST https://easyvoice.ae/api/tts/generate \
-H "Authorization: Bearer $EASYVOICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{"voice":"af_heart","input":"Hello from curl.","response_format":"mp3"}' \
--output out.mp3Voices to try with the API
Every voice below is callable via the same voice parameter — preview samples and read the full character profile.
Frequently asked questions
Is EasyVoice fully OpenAI-API-compatible?▾
Not 100% drop-in — the URL changes from api.openai.com/v1/audio/speech to easyvoice.ae/api/tts/generate, and voice IDs map (alloy → af_alloy, onyx → am_onyx, etc.). But the request body shape, auth header, and response handling are identical, so the swap is typically a 5-line code change. We've intentionally not built an 'OpenAI compatibility shim' at api.openai.com — honest vendor separation makes debugging tractable when one of the APIs misbehaves.
What's the closest EasyVoice voice to OpenAI's alloy / onyx / nova?▾
Use the mapping: alloy → af_alloy, echo → am_echo, fable → bm_fable, onyx → am_onyx, nova → af_nova, shimmer → af_jessica. These are approximate sonic matches — pitch, register, and brightness aligned. For literary fiction or audiobook work, we recommend generating 30-second samples of the same passage in both vendor voices and listening blind before committing to a mapping. Once you've migrated, the EasyVoice catalog has 40 additional voices beyond the OpenAI six.
Does it support tts-1-hd-quality output?▾
EasyVoice doesn't gate fidelity by model name — Kokoro-82M serves a single high-quality model for all voices. In blind A/B tests against OpenAI tts-1, listeners can't reliably tell the two apart on prose-length narration. Against tts-1-hd, OpenAI's higher-fidelity model occasionally wins on close-mic'd character voice work; for general narration, accessibility readouts, chatbot responses, and content audio versions, the difference is rarely perceptible. The model parameter in your request body is optional and ignored if sent.
How much do I save migrating from OpenAI tts-1?▾
Breakeven at 666,000 characters per month (~100,000 words). Below that, OpenAI tts-1 is marginally cheaper. Above that, EasyVoice Pro at $9.99/mo flat is cheaper. At 1M chars/mo, OpenAI bills $15 vs EV's $9.99. At 10M chars/mo, OpenAI bills $150 vs EV's $9.99 — a 15× cost difference. tts-1-hd doubles OpenAI's price, dropping the breakeven to 333K chars/mo. For most apps that ship a read-aloud feature, monthly TTS usage crosses 1M chars within weeks of launch.
Can I use EasyVoice with the OpenAI SDK?▾
The official openai Python and Node SDKs are hardcoded to api.openai.com — they don't support custom base URLs for the audio.speech endpoint. The simpler integration is to use stdlib HTTP (requests in Python, fetch in JS) for the EasyVoice TTS call, and keep the openai SDK for GPT-4o, Whisper, and other OpenAI endpoints. Your existing OpenAI auth stays untouched; the new EasyVoice auth is a separate Bearer token.
Is there a free tier I can test on before paying?▾
Yes — 5,000 characters per day on the free API tier with a daily reset, no credit card required. That's enough to prototype, build the full integration, run end-to-end tests in CI, and ship a free-tier of your own product on top. Sign up at /signup to get an API key. Pro at $9.99/mo unlimited unlocks the cap when you scale past 5K chars/day.
Related TTS API guides
Free TTS API — 5,000 Characters Per Day, No Credit Card
Free TTS API with 5,000 characters per day, daily reset, no credit card. 46 voices, 8 languages, OpenAI-compatible. Sign up, get a key, ship in 60 seconds.
TTS API for Developers — Bearer Auth, OpenAI Shape, Flat Pricing
TTS API for developers — Bearer auth, OpenAI-compatible request shape, curl/JS/Python/Go samples. 5K chars/day free. $9.99/mo unlimited Pro. 46 voices.
Comparing vendors? See EasyVoice vs elevenlabs →
Start building with the EasyVoice TTS API
5,000 characters per day free, no credit card. Pro $9.99/mo unlimited. OpenAI-compatible request shape.