20 American English AI voices. Male and female options. Generate natural-sounding speech for free.
American English is the most-requested language for text-to-speech globally. With roughly 240 million native speakers in the United States (US Census Bureau ACS 2023) and several hundred million more who use it as a second language for business, software, and media consumption, it powers the bulk of the world's voiceover, e-learning, and AI assistant content. EasyVoice ships 20 American English voices on the Kokoro engine, the broadest catalog we offer for any single locale, covering 11 female and 9 male timbres ranging from warm and conversational (af_heart, af_bella) to authoritative and broadcast-style (am_michael, am_onyx). All 20 voices are tuned for the General American accent baseline (the rhotic, regionally neutral pronunciation used by US national news anchors, YouTubers, and most LA-based voice talent), so the default output reads as 'standard American' rather than tied to a specific city. We see American English used most heavily for YouTube narration, online course modules, podcast intros, IVR phone trees, app onboarding flows, and accessibility read-aloud features. The combination of 20 voices, a 5,000-character/day free tier (no credit card), and the $9.99/mo unlimited Pro plan makes EasyVoice the cheapest production-grade option for creators who publish American English content daily, a workload where ElevenLabs' per-character billing penalizes high-volume scripts. According to Grand View Research's 2024 Text-to-Speech Market Report, the global TTS market is projected to reach roughly $7.6 billion by 2030, with North America accounting for the largest revenue share, driven primarily by demand for American English voiceover in the creator economy, accessibility compliance under the ADA, and enterprise contact-center automation.
EasyVoice's American English voices target General American (GenAm) by default: the accent associated with national broadcast media, a broadly Midwestern baseline, and most professionally trained US voice actors. GenAm is rhotic (the 'r' in 'car' is pronounced), uses the cot-caught merger that's now common across most of the western US, and avoids the strongest regional markers of New York (non-rhotic, raised /ɔ/), Boston (dropped 'r', broad 'a'), Southern American (monophthongal /aɪ/, drawled vowels), or African American Vernacular English. Within the GenAm bucket, our voices vary by warmth and pitch rather than by region: af_aoede and af_bella sit in a brighter, more expressive register suited to lifestyle and creator content; am_michael and am_eric land lower and more measured for documentary, training, and corporate narration; am_santa and am_puck lean character-forward for game and animation work. Kokoro's American voices handle AAVE pronunciation patterns reasonably well when input text uses standard English spelling. Distinct Southern, New York, Boston, or character-cloned regional accents are not currently available as dedicated voice models; they are tracked on the roadmap.
The neutral broadcast baseline — rhotic, with the cot-caught merger common across the western US, and no strong regional vowel markers. This is the default register used by national news anchors, mainstream YouTube creators, and the bulk of LA-based commercial voice talent. All 20 EasyVoice American English voices land here by default, with af_heart, af_bella, am_adam, am_michael, and am_eric being the strongest natural fits. If your audience is geographically distributed across the US or your script needs to feel 'standard American' without signalling a specific region, GenAm is the right pick — and every EasyVoice American English voice covers it natively.
The Southern register stretches across Texas, the Deep South, Appalachia, and parts of the Ozarks, and is characterized by the Southern vowel shift (monophthongal /aɪ/ so 'time' sounds closer to 'tahm'), drawled diphthongs, and a slower default cadence. EasyVoice does not currently ship a dedicated Southern-accented voice model — our American English voices target GenAm by default and will not reproduce Southern vowel shifts even with phonetic spelling. Buyers producing content aimed specifically at Southern audiences (Country music marketing, regional Texas advertising, Southern political ads) typically pair EasyVoice with a regional voice specialist for the Southern cuts. Dedicated Southern variants are on the longer-term roadmap.
Midwestern English — including the Inland North accent of Chicago, Milwaukee, Detroit, and Cleveland and the distinct Upper Midwest 'Minnesota nice' variety made famous by Fargo — is often confused with GenAm but actually has its own markers: the Northern Cities Vowel Shift (where 'block' can sound closer to 'black'), a flatter intonation, and the famous Minnesota/Wisconsin elongated 'o'. Because GenAm itself is loosely Midwest-derived, EasyVoice's am_adam, am_michael, and am_eric voices read as comfortably neutral to Midwestern audiences without needing a dedicated regional model. For deliberate caricature work (an SNL-style Minnesota character, for example), a regional specialist would be the better fit.
The classic Eastern New England accent, non-rhotic ('pahk the cah') and marked by the broad 'a' in words like 'bath' and 'class', is one of the most caricatured American accents in media; neighboring New York City English adds a distinctive raised /ɔ/ in 'thought'. EasyVoice does not ship a dedicated Boston/Northeastern voice model. Our American English voices are rhotic by training, so phonetic respelling will not produce a convincing Boston accent: rhoticity is a structural feature of the underlying speech model, not a prosody knob. Content explicitly set in Boston, New England, or wider New York metro requires a regional specialist; for content with characters from those regions delivered in standard narration, GenAm reads as the natural choice.
AAVE is a fully developed linguistic system spoken by tens of millions of Americans, with distinct grammar (habitual 'be', double negation, copula deletion), phonology (consonant cluster reduction, final-stop deletion, distinct vowel realizations in words like 'pin'/'pen'), and prosody. EasyVoice does not currently ship a voice model specifically trained on AAVE speech — Kokoro's am_michael and af_jessica handle AAVE pronunciation patterns reasonably well for natural-sounding speech when input is written in standard American English spelling, but the output will not reproduce AAVE-specific grammar features (because those are textual choices made before TTS), and will not authentically render AAVE phonology in the way a dedicated model would. For projects centred on AAVE-speaking characters or aimed at audiences for whom AAVE-authentic delivery matters, a specialist provider is the better choice. A dedicated AAVE-trained voice is on the longer-term roadmap, with the difficult ethical question of who consents to be cloned for such a model being part of why we have not shipped it yet.
EasyVoice is the practical answer to 'American accent generator', 'American accent reader online', and 'make text sound American' search queries. Paste any English text into the app, choose any of the 20 American English voices, and EasyVoice generates broadcast-quality American-accented audio in seconds, with no recording, no microphone, and no human voice talent. The free tier covers 5,000 characters per day (roughly 3 to 5 minutes of finished audio), enough for a creator to produce a daily short-form video or a developer to run small-batch read-aloud tests indefinitely without payment. For volume work (full YouTube videos, podcast episodes, course modules, audiobook drafts), the $9.99/mo Pro plan removes the daily cap entirely and includes API access for programmatic generation. Unlike browser-based 'accent generators' that produce robotic Festival/eSpeak output, EasyVoice runs on Kokoro-82M, an open-source neural TTS model trained on natural American English speech, so the output sounds like a person rather than a 1990s text-to-speech engine. Output is delivered as MP3, WAV, or OPUS for direct use in video editors, podcast software, and game engines.
African American English (AAE) — also called AAVE or Black English — is one of the most-spoken native varieties of American English, used by tens of millions of speakers, yet remains underserved across the major TTS providers. EasyVoice does not currently ship a voice model specifically trained on AAVE speech, and we want to be honest about that rather than overclaim. Two of our voices, am_michael (a deep American male) and af_jessica (a warm American female), handle AAVE pronunciation patterns reasonably well for natural-sounding speech in standard American English contexts — they will read AAVE-grammar input (habitual 'be', copula deletion) as written without flagging it as ungrammatical, and their default prosody is closer to AAVE rhythms than the brighter GenAm voices like af_aoede or af_bella. They are not, however, AAVE-trained models, and a listener familiar with AAVE phonology will hear the difference. A dedicated AAVE-trained voice is on our longer-term roadmap; it is not currently available because the responsible path to building it — securing informed consent from AAVE-speaking voice donors, ensuring AAVE speakers themselves are credited and compensated, and avoiding cultural appropriation — takes time, and we would rather ship it right than rush it.
Three popular American English voices — listen to samples and explore details.
Warm, conversational American female — the Kokoro flagship for long-form narration, audiobooks, and explainer voiceover.
Deep, calm American male — the audiobook narrator voice in the free tier, handles AAVE pronunciation patterns reasonably well.
Energetic, expressive American female — fast cadence and smile-in-the-voice ideal for ads, hooks, and short-form social.
Polished, neutral American female — versatile mid-range narration voice suited to corporate explainers and product walkthroughs.
Bright, slightly higher-pitched American female — the modern creator-economy voice for YouTube, TikTok, and Instagram Reels.
Energetic, expressive American female — fast cadence and smile-in-the-voice ideal for ads, hooks, and short-form social.
Warm, conversational American female — the Kokoro flagship for long-form narration, audiobooks, and explainer voiceover.
Confident, mid-pitched American female — natural fit for podcast hosting and narrative-style content, handles AAVE pronunciation patterns well.
Crisp, focused American female — broadcast-news clarity, strong for tech tutorials and developer content.
Soft, intimate American female — close-mic-style delivery for meditation, ASMR-adjacent audio, and gentle narration.
Modern, energetic American female — slightly more polished than Bella, popular for SaaS marketing and product launches.
Mellow, mid-pitched American female — warm but understated, good for documentary narration and reflective content.
Confident, professional American female — corporate-training voice with measured authority.
Light, airy American female — bright timbre for kids' content, educational shorts, and upbeat explainer videos.
Confident American male baritone — the default narrator voice for explainers, ads, and corporate VO.
Mid-baritone American male with a slightly resonant quality — works well for podcast intros and brand voice work.
Steady, measured American male — neutral broadcast register for training, documentation, and tech reviews.
Deep, slightly gravelly American male — character-forward voice for game narration and dramatic reads.
Younger-skewing American male — mid-pitched and casual, fits creator content and conversational narration.
Deep, calm American male — the audiobook narrator voice in the free tier, handles AAVE pronunciation patterns reasonably well.
Low-bass American male — the deepest voice in the catalog, ideal for dramatic ad reads and trailer-style narration.
Playful, character-forward American male — mid-pitched with personality, suited to animation, games, and YouTube character work.
Warm, jolly American male — character voice with a noticeable smile, great for holiday content and storybook reads.
What teams typically build with American English voices on EasyVoice.
American English carries the bulk of the global eLearning market — Coursera, Udemy, MasterClass, LinkedIn Learning, and Khan Academy all default to GenAm narration. EasyVoice's flat $9.99/mo Pro plan makes it the practical choice for indie course creators producing weekly modules where ElevenLabs' per-character billing becomes punitive. af_heart and am_michael are the most popular picks for course narration; af_kore and am_adam suit more technical, tutorial-style modules.
Podcast production teams use EasyVoice for show intros, sponsor reads, mid-roll ad inserts, and trailer voiceover, work that historically required hiring a voice actor for each new sponsor. Using a consistent voice such as am_adam or af_nova as your show's branded intro voice, with unlimited generation at $9.99/mo, replaces a recurring $50-100/show voice talent fee. The API access on Pro lets podcast hosting platforms (Buzzsprout, Transistor, Captivate) integrate dynamic ad-read generation directly.
Business IVR ('press 1 for billing, press 2 for support') has historically used flat, robotic Polly-style voices that immediately signal 'low-budget company'. EasyVoice's natural-sounding American voices — am_michael for authoritative IVR, af_sarah for friendly support flows — produce phone-tree audio that listeners actually tolerate. The Pro API integrates directly with Twilio, Vonage, and contact-center platforms, with audio generated as 8 kHz μ-law WAV when needed for telephony format compliance.
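For readers unfamiliar with the telephony format mentioned above, here is a minimal Python sketch of what "8 kHz μ-law WAV" means in practice. It is not EasyVoice's implementation: the function names `linear_to_ulaw` and `ulaw_wav_bytes` are hypothetical, and the sketch assumes you already have 16-bit mono PCM samples at 8 kHz (resampling is out of scope here). It applies the standard G.711 μ-law companding and wraps the result in a WAV container with format tag 7.

```python
import struct

BIAS = 0x84   # bias added before finding the segment, per G.711 mu-law
CLIP = 32635  # largest magnitude that can be encoded after adding the bias


def linear_to_ulaw(sample: int) -> int:
    """Compress one signed 16-bit PCM sample into one mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the segment (exponent): position of the highest set bit, 7 down to 0.
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law bytes are stored bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_wav_bytes(pcm_samples, sample_rate: int = 8000) -> bytes:
    """Build a mono mu-law WAV file (format tag 7) from 16-bit PCM samples."""
    data = bytes(linear_to_ulaw(s) for s in pcm_samples)
    # fmt chunk: tag=7 (mu-law), 1 channel, rate, byte rate, block align=1,
    # 8 bits per sample, cbSize=0 (18 bytes total for non-PCM formats).
    fmt = struct.pack("<HHIIHHH", 7, 1, sample_rate, sample_rate, 1, 8, 0)
    fact = struct.pack("<I", len(data))  # sample count, expected for non-PCM
    pad = b"\x00" if len(data) % 2 else b""  # RIFF chunks are word-aligned
    body = (
        b"WAVE"
        + b"fmt " + struct.pack("<I", len(fmt)) + fmt
        + b"fact" + struct.pack("<I", len(fact)) + fact
        + b"data" + struct.pack("<I", len(data)) + data + pad
    )
    return b"RIFF" + struct.pack("<I", len(body)) + body


if __name__ == "__main__":
    wav = ulaw_wav_bytes([0, 1000, -1000, 0])
    with open("prompt_ulaw.wav", "wb") as fh:  # hypothetical output file name
        fh.write(wav)
```

If the API already returns telephony-ready audio, none of this is needed on the client; the sketch is mainly useful for checking a file's header or converting audio from another source.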
YouTube is the largest single market for American English TTS — creators producing daily Shorts, weekly long-form videos, and faceless channels in finance, history, science, and explainer niches are EasyVoice's largest user segment. The 5K/day free tier covers a 3-4 minute Short daily; Pro unlimited covers long-form channels. Most creators settle on one voice as their channel identity (af_heart and am_adam are the two most-picked) and use the API for batch generation as part of their video pipeline (CapCut, Descript, Premiere Pro automation).
AI-narrated audiobooks have become a viable category on Audible (via ACX), Spotify Audiobooks, Findaway Voices, and Google Play Books, particularly for indie nonfiction and self-published fiction where traditional voice talent ($200-400/finished hour) is cost-prohibitive. am_michael is the most popular audiobook voice in the EasyVoice catalog for nonfiction; af_heart for fiction and memoir. A typical 50,000-word manuscript generates in chunks under the Pro plan with no per-character fees, making the marginal cost of an audiobook draft effectively zero beyond the $9.99/mo subscription.
Every great story starts with a voice the listener trusts. Whether you are building an online course, scripting a podcast episode, or producing your next YouTube video, the right tone keeps your audience engaged from the first sentence to the last. EasyVoice's American English voices deliver that natural, broadcast-quality cadence: paste your script, choose your voice, and download ready-to-use audio in seconds.
Kokoro's American English voices read cardinal numbers ('1,000' as 'one thousand'), ordinal dates ('March 15' as 'March fifteenth'), and common acronyms (USA, FBI, NASA) naturally in most scripts. For short acronyms that could be read either way, such as 'AI', 'TTS', or 'SQL', writing out the intended pronunciation ('artificial intelligence', 'A.I.') gives the most consistent output. Punctuation controls pacing: a period adds a full cadence pause, while an em dash creates a natural mid-sentence breath. For longer pauses, break the sentence into two shorter sentences; the API's chunk-stitching adds a natural breath between them.
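The acronym advice above can be automated with a small pre-processing step before a script is pasted or sent for generation. The following is a minimal sketch, not an EasyVoice feature: the `ACRONYM_SPELLINGS` table and the `expand_acronyms` function are hypothetical names, and the spellings chosen are examples you would adjust to your own preferred readings.

```python
import re

# Hypothetical table: ambiguous acronym -> the spelling you want read aloud.
ACRONYM_SPELLINGS = {
    "AI": "A.I.",
    "TTS": "text to speech",
    "SQL": "S.Q.L.",
}


def expand_acronyms(text: str, spellings: dict = ACRONYM_SPELLINGS) -> str:
    """Replace whole-word, case-sensitive acronyms with their spoken spelling."""
    if not spellings:
        return text
    # Longest keys first so a longer acronym wins over a shorter prefix.
    keys = sorted(spellings, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda match: spellings[match.group(1)], text)


if __name__ == "__main__":
    print(expand_acronyms("Our AI uses TTS."))
```

Matching on word boundaries and exact case keeps ordinary words such as 'said' or 'air' untouched, at the cost of missing lowercase variants like 'ai'.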
20 American English voices — 11 female and 9 male — all tuned for the General American accent baseline. That makes American English our largest voice catalog by a wide margin, and 6 of the 10 voices on the free tier are American English.
Native-sounding General American. The Kokoro voices are trained directly on American English speech; they are not a non-American voice model putting on an American accent. Listeners typically can't distinguish the output from a US-based human voice actor in casual content.
Yes. Both the free tier and Pro plan permit commercial use under EasyVoice's terms of service, including monetized YouTube videos, paid courses, client-billed marketing assets, and SaaS products. There is no separate commercial license fee.
ElevenLabs' American English voices are slightly more emotive and support voice cloning, but billing scales with characters — heavy users pay $22–$99/mo. Google Cloud TTS WaveNet voices are robotic by comparison and require GCP setup. EasyVoice is $9.99/mo flat for unlimited characters and ships 20 ready-to-use voices with no provisioning step.
Pro accounts can generate scripts of effectively any length: long-form narration is split into chunks server-side and stitched seamlessly. The free-tier allowance resets to 5,000 characters each day, which covers roughly a 3-to-5-minute narration script.
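Free-tier users who want to spread a longer script across several days can do a similar split on their own side. The sketch below is an illustration only, not the server-side algorithm: `split_into_chunks` is a hypothetical helper that breaks a script at sentence boundaries so that each chunk stays within a character budget, here defaulting to the 5,000-character daily allowance.

```python
import re


def split_into_chunks(script: str, max_chars: int = 5000) -> list:
    """Split a script at sentence boundaries into chunks of at most max_chars."""
    # Split after ., ! or ? followed by whitespace; punctuation stays attached.
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single sentence longer than the budget is cut at the hard limit.
        while len(sentence) > max_chars:
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    parts = split_into_chunks("One. Two. Three.", max_chars=9)
    for index, part in enumerate(parts, start=1):
        print(index, len(part), part)
```

Splitting at sentence ends rather than at a fixed character offset keeps each chunk's prosody intact, which matters when the pieces are later joined in an editor.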
Not authentically with EasyVoice today. Our American English voices target General American by default and do not reproduce Southern vowel shifts (the monophthongal /aɪ/, drawled diphthongs) even with phonetic respelling — those features are baked into the underlying speech model, not adjustable via prosody controls. For content explicitly requiring a Southern accent, pair EasyVoice with a regional voice specialist. A dedicated Southern variant is on our longer-term roadmap.
Partially. We do not ship a voice model specifically trained on AAVE speech, but two voices — am_michael (deep American male) and af_jessica (warm American female) — handle AAVE pronunciation patterns reasonably well for natural-sounding speech when input uses standard English spelling. They will read AAVE-grammar input (habitual 'be', copula deletion) as written and their default prosody is closer to AAVE rhythms than the brighter GenAm voices. A dedicated AAVE-trained voice is on our roadmap; it is not currently available because we want to do it responsibly (informed consent from voice donors, AAVE speakers credited and compensated) rather than rush an inauthentic model to market.
For corporate, training, and broadcast-style work: am_michael (deep, measured baritone) and af_sarah (confident, professional female) are the top picks. For tech and developer-facing content: af_kore (crisp, focused) and am_adam (mid-baritone explainer). For brand work where you want warmth without losing credibility: af_heart and af_nova. The free tier lets you A/B all of these in minutes — most teams settle on a default after testing 3-4 voices on the same script.
Three big differences: (1) rhoticity — General American pronounces the 'r' in 'car' and 'father'; British Received Pronunciation drops it; (2) vowel placement — General American uses the flat 'a' in 'bath' and 'dance'; British RP uses the broad 'a'; (3) prosody — General American is rhythmically flatter, British RP has more pitch variation and a slower default cadence. Both are supported on EasyVoice (20 American voices, 8 British voices), and most teams pick based on audience: US-targeted content gets American voices, UK and Commonwealth audiences get British voices like bf_emma and bm_daniel.
Yes, on every plan including the free tier. EasyVoice's terms of service grant full commercial usage rights to all generated audio — paid YouTube content, paid courses, client work, monetized podcasts, commercial SaaS products, paid audiobooks, and paid advertising are all explicitly permitted. There are no royalties, no per-project licensing fees, and no attribution requirement. The Kokoro-82M model is Apache-licensed at the engine level, and EasyVoice's user-facing terms confirm full commercial rights on generated output.
Yes — these are the three largest use cases for EasyVoice American English. YouTube creators produce daily Shorts and weekly long-form videos with voices like af_heart, af_bella, and am_adam; podcasters use am_michael and af_nova for show intros, sponsor reads, and narrative segments; TikTok creators use af_bella and af_aoede for fast, energetic short-form. All commercial monetization is permitted under our terms. The flat $9.99/mo Pro plan covers unlimited generation, which materially undercuts ElevenLabs and PlayHT for high-volume creator workloads.
Free American English text to speech on EasyVoice gives you 5,000 characters per day — enough for a 3-to-4-minute narration — with no credit card and no account required. Six of EasyVoice's free-tier voices are American English (including af_heart and am_michael, the two most popular for content work), covering both female and male timbres. Paste your script, pick a voice, and download as MP3 or WAV in under a minute. For unlimited characters, all 20 American English voices, and API access for programmatic pipeline integration, the Pro plan is $9.99/mo.