AI Podcast Generator — Turn Any Article into a Two-Host Podcast
Paste any article into EasyVoice, and the AI podcast generator writes a natural two-host conversation script, assigns Host A and Host B distinct voices, synthesizes each speaker segment individually, and stitches everything into a finished MP3 episode — all within minutes. No microphone. No editing software. No studio booking. The entire pipeline from plain text to publishable audio runs automatically: the language model writes the script, the neural TTS engine voices each line, and the audio stitcher normalizes loudness across the full episode before the file downloads.
The two-host dialogue format makes dense written content genuinely listenable. A single narrator reading an article is fine for short-form content, but a conversation between two distinct voices (one asking clarifying questions, the other providing context, both referencing specific claims from your source material) sustains listener attention across longer pieces. Commuters, gym-goers, and people working through a long reading list are the primary audience: they want the ideas in your article, but in a format that works while their eyes are otherwise occupied.
How the AI podcast generator works
The pipeline has four steps. Understanding each step helps set accurate expectations about what the generator does and does not do in v1.
Step 1 — Paste your article
Copy and paste the full text of any article, blog post, research summary, or document into the podcast generator. URL fetch is not supported in v1, so paste the text directly. Articles between 300 and 3,000 words produce the most natural episode length. Longer articles are supported on Pro and Pro+, but the script compresses content rather than reading every sentence verbatim.
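If you prepare articles in bulk, the length guidance above can be checked before pasting. The following is a minimal sketch, not EasyVoice code: the function name `classify_article_length` and the whitespace-based word count are assumptions made for illustration. Only the 300 and 3,000 word thresholds come from the guidance above.

```python
def classify_article_length(text: str) -> str:
    """Classify an article against the 300 to 3,000 word guidance.

    Hypothetical helper: counting words by whitespace split is an
    assumption and may differ from how the generator counts words.
    """
    word_count = len(text.split())
    if word_count < 300:
        return "short"        # below the recommended range
    if word_count <= 3000:
        return "recommended"  # most natural episode length
    return "long"             # Pro and Pro+; the script compresses content


# Example: a 1,200-word article falls inside the recommended range.
print(classify_article_length("word " * 1200))
```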
Step 2 — AI writes a two-host script
A language model reads your article and writes a conversational script with an intro, a body section organized around the article's main claims, and an outro. Host A and Host B take turns — Host A tends to present claims and context; Host B asks clarifying questions, adds caveats, and keeps the pacing natural. Stage directions are stripped from the final output; the episode is pure dialogue.
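To make the turn-taking structure concrete, here is one way a two-host script could be represented as data. This is an illustrative sketch only: the `Host A:` / `Host B:` line prefix, the `Turn` class, and `parse_script` are hypothetical, and the actual script format used internally by EasyVoice is not documented here.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    speaker: str  # "A" or "B"
    text: str     # the spoken line, without the speaker prefix


# Assumed line format: "Host A: some dialogue"
LINE_RE = re.compile(r"^Host ([AB]):\s*(.+)$")


def parse_script(script: str) -> List[Turn]:
    """Split a plain-text script into speaker turns, skipping other lines."""
    turns = []
    for line in script.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            turns.append(Turn(speaker=match.group(1), text=match.group(2)))
    return turns


sample = (
    "Host A: Today we're covering solar panels.\n"
    "Host B: What changed this year?\n"
    "Host A: Prices fell."
)
for turn in parse_script(sample):
    print(turn.speaker, "->", turn.text)
```

Keeping each turn as a separate record is what allows every speaker segment to be voiced independently in the later steps.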
Step 3 — Pick two voices
Choose a voice for Host A and a voice for Host B from the EasyVoice catalog — 46 voices across English (American and British), Arabic, Spanish, French, Italian, Portuguese, Japanese, and Hindi. On Pro+, you can assign your own cloned voice to either host slot. The two voices are synthesized independently using the Kokoro-82M neural TTS model, which means each speaker segment is generated from scratch, not pitch-shifted from a single recording.
Step 4 — Download the finished MP3
Each speaker segment is synthesized individually and then stitched together into a single audio file. Loudness normalization runs across the full episode before the file is written — so the two voices feel like they are in the same room, not recorded on separate rigs. The output is a finished MP3 ready to publish, share, or embed. No post-processing required. No DAW needed.
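The stitch-and-normalize idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the EasyVoice implementation: it matches each segment to a common RMS level before concatenating, whereas production loudness normalization is typically measured in LUFS and, as described above, runs across the full episode. The function names and the `target_rms` value are hypothetical.

```python
import math
from typing import List


def rms(samples: List[float]) -> float:
    """Root-mean-square level of a block of samples in the range [-1, 1]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def stitch_normalized(segments: List[List[float]],
                      target_rms: float = 0.1) -> List[float]:
    """Scale each segment toward target_rms, then join them in order."""
    episode: List[float] = []
    for segment in segments:
        level = rms(segment)
        gain = target_rms / level if level > 0 else 1.0  # leave silence alone
        # Clamp to [-1, 1] so the applied gain cannot push samples out of range.
        episode.extend(max(-1.0, min(1.0, s * gain)) for s in segment)
    return episode


quiet_host = [0.01, -0.01] * 100  # stand-in for a quiet voice segment
loud_host = [0.5, -0.5] * 100     # stand-in for a loud voice segment
episode = stitch_normalized([quiet_host, loud_host])
print(len(episode))
```

After this step both stand-in segments sit at the same level, which is the effect the "same room" description refers to.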
Two-host conversational format — design decisions
The AI podcast generator is specifically built around the two-host format rather than a single narrator because the conversational structure solves several problems that single-voice narration does not. Here is the design reasoning behind each key decision in v1.
Host A and Host B, not a single reader. Single-narrator audiobook-style playback works well for fiction and short articles but loses listeners on dense or technical content. When two distinct voices have a genuine back-and-forth, the listener can follow the argument structure more naturally: the question-and-answer pattern maps to how people actually discuss ideas, and it gives the AI a structural scaffold for organizing complex material without losing the narrative thread.
Intro and outro included. The generated script opens with a brief intro that names the topic and sets listener expectations — "Today we're covering…" — and closes with an outro that recaps the main points and invites the listener to the source. This is standard podcast structure. It makes the episode feel complete rather than a raw text reading with a hard stop at the end.
No stage directions in the audio output. The script-writing step may include internal production notes (pause here, emphasize this phrase) but those are stripped before synthesis. The audio contains only the spoken dialogue. What you hear is what a listener would hear — no "(laughs)" artifacts, no "[HOST A pauses]" being read aloud.
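As a rough illustration of the stripping step, a regular expression can remove parenthesized and bracketed notes before a line is sent to synthesis. This sketch assumes production notes always appear inside `( )` or `[ ]`; the pattern and the function name `strip_stage_directions` are hypothetical, and a pattern this simple would also remove legitimate parenthetical remarks.

```python
import re

# Matches notes such as "(laughs)" or "[HOST A pauses]" plus leading spaces.
STAGE_DIRECTION_RE = re.compile(r"\s*(\([^)]*\)|\[[^\]]*\])")


def strip_stage_directions(line: str) -> str:
    """Return the line with bracketed or parenthesized notes removed."""
    cleaned = STAGE_DIRECTION_RE.sub("", line)
    # Collapse any doubled spaces left behind and trim the ends.
    return re.sub(r"\s{2,}", " ", cleaned).strip()


print(strip_stage_directions("That is surprising (laughs) but true."))
print(strip_stage_directions("[HOST A pauses] So what changed?"))
```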
No background music in v1. Background music that is legally distributable under podcast licensing adds meaningful cost and complexity. Rather than ship a low-quality or improperly licensed music layer, v1 ships voice-only. The output is still broadcast-ready — loudness normalization ensures professional audio levels — and it is easier for creators to layer their own branded music if they want it.
Use your own cloned voice as a podcast host (Pro+)
On the Pro+ plan, you can clone your own voice and assign it as Host A or Host B in any generated episode. This means the podcast sounds like you — your cadence, your register, your voice — even though you never sat in front of a microphone for that episode. Content creators, educators, and newsletter writers use this to maintain a consistent personal brand voice across more content than they could manually record.
Consent attestation is required. When you create a voice clone on EasyVoice, you complete a one-time consent attestation confirming that you either are the voice owner or hold explicit written permission from the voice owner. Cloning a voice without the owner's consent is prohibited under the EasyVoice Terms of Service. See /voice-cloning for the full consent policy and what the attestation covers.
AudioSeal watermarking on all cloned output. Every audio clip generated using a cloned voice, including podcast episodes where a cloned voice is assigned to Host A or Host B, carries an inaudible AudioSeal watermark embedded at synthesis time, before the file is written. The watermark survives re-encoding and re-compression. It is not audible to a human listener but is machine-detectable, so cloned output can be identified as synthetic and traced back to the EasyVoice platform. This protects the voice owner, the listener, and you as the content creator.
Pro+ includes up to three custom voice clones. Each clone is usable across the podcast generator, the standard TTS editor, and the API under the same subscription. Pro+ is $19.99/mo or $49.99/qtr. See /pricing for the full comparison with Pro and free tier limits.
Languages and voice catalog
The EasyVoice voice catalog includes 46 voices across eight languages: English (American and British), Arabic, Spanish, French, Italian, Portuguese, Japanese, and Hindi. All voices are neural TTS voices built on Kokoro-82M. They are not English-engine voices reading translated text, but native-speaker-trained models for each language.
For the AI podcast generator, English presets are the primary supported configuration in v1. Arabic is also fully supported: you can paste Arabic text and assign Arabic voices to both host slots, and the language model will produce an Arabic-language two-host script. For the other catalog languages, voice synthesis works, but the script-writing step currently generates the conversational script in English; multilingual script generation is on the roadmap. For articles in those languages, the current recommendation is to paste an English translation of the source material.
Pricing and episode limits
Free
One short episode per day. Approximately 3–5 minutes of audio. Standard catalog voices only. No credit card required. Daily reset — you can evaluate the generator indefinitely without paying.
Pro
$9.99/mo. Standard-length episodes up to around 15 minutes.
Pro+
$19.99/mo or $49.99/qtr. Long-form episodes up to 45 minutes. Up to three custom voice clones, assignable to either host slot.
See /pricing for the full tier comparison, including API rate limits, request size caps, and annual billing options.
Related tools and guides
Article to Podcast — content repurposing guide
The content-repurposing angle: why converting existing blog posts and articles into podcast episodes expands reach without new writing work, who the audience is, and how the paste-only workflow fits a weekly publishing cadence.
Voice cloning — consent-first, watermarked
Clone your own voice for use as a podcast host. Consent attestation required. AudioSeal watermark on every cloned output. Pro+ feature at $19.99/mo with up to 3 custom clones.
Pricing — free, Pro, Pro+
Full tier comparison: free 1 episode/day, Pro $9.99/mo standard-length, Pro+ $19.99/mo long-form plus cloned hosts. API rate limits and request size caps included.
Voice catalog — 46 voices, 8 languages
Browse all available host voices with audio previews. Assign any two voices to Host A and Host B in the podcast generator. All voices available on free tier for evaluation.
Frequently asked questions
Is it really AI voices, or are real people reading my article?
Fully AI voices — no human narrators involved. EasyVoice is built on Kokoro-82M, an open-weight neural TTS model (Apache-2.0 on Hugging Face). The two-host script is written by a language model and then synthesized speaker segment by speaker segment using Kokoro voices. Each segment is individually generated and stitched together with consistent loudness normalization into a single MP3 episode. No human recording, no studio, no actor fees.
Can I use my own voice as one of the podcast hosts?
Yes — on the Pro+ plan ($19.99/mo) you can clone your own voice and assign it as Host A or Host B. Voice cloning requires a 15–60 second clean audio sample and a one-time consent attestation confirming you own or have explicit permission to clone the voice. Every episode generated using a cloned voice carries an inaudible AudioSeal watermark embedded at synthesis time, before the file is written. The watermark cannot be removed by re-encoding. This design protects both the voice owner and the listener. See /voice-cloning for the full consent and watermark policy.
What article length works best?
Articles between 300 and 3,000 words produce the most natural episodes. Short articles under 300 words produce episodes that feel rushed — the hosts have little to discuss. Very long articles above 5,000 words are supported on Pro and Pro+, but the AI script compresses content to hit a listenable episode length rather than reading every sentence verbatim. The sweet spot is a standard 800–1,500 word blog post or news article.
How long can a podcast episode be?
Episode length depends on your plan. The free tier caps at one short episode per day (approximately 3–5 minutes of audio, roughly 500–800 words of script). The Pro plan ($9.99/mo) supports standard-length episodes up to around 15 minutes. Pro+ ($19.99/mo) unlocks long-form episodes up to 45 minutes and cloned-voice hosts. See /pricing for the full tier breakdown.
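The tier limits above can be turned into a quick estimate of which plan a given script needs. This is a back-of-the-envelope sketch: the speaking rate of 160 words per minute is an assumption derived from the "500–800 words for 3–5 minutes" figure, the caps are the approximate values quoted above, and the names `PLAN_MAX_MINUTES` and `smallest_plan_for` are hypothetical.

```python
from typing import Optional

# Approximate episode caps in minutes, taken from the tier descriptions.
PLAN_MAX_MINUTES = {"free": 5, "pro": 15, "pro_plus": 45}

# Assumed speaking rate: 800 words over roughly 5 minutes of audio.
WORDS_PER_MINUTE = 160


def estimated_minutes(script_words: int) -> float:
    """Estimate episode duration from the script word count."""
    return script_words / WORDS_PER_MINUTE


def smallest_plan_for(script_words: int) -> Optional[str]:
    """Return the first plan whose cap covers the estimate, else None."""
    minutes = estimated_minutes(script_words)
    for plan, cap in PLAN_MAX_MINUTES.items():  # ordered smallest to largest
        if minutes <= cap:
            return plan
    return None


print(smallest_plan_for(2000))  # 12.5 estimated minutes
```

Actual durations depend on the chosen voices and pacing, so treat the result as a planning aid rather than a guarantee.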
Do I need editing software after downloading the episode?
No. The output is a finished MP3 with both host voices stitched together, loudness-normalized to a consistent level across the full episode. You can publish it directly to a podcast RSS feed, embed it on a blog, or share it as a standalone audio file. No DAW, no Audacity, no post-processing required. The episode is a single file from the moment it downloads.
Is there background music or sound effects?
No — v1 of the AI podcast generator produces voice-only episodes with no background music, intro jingle, or sound effects. This is intentional: adding music that is properly licensed for podcast distribution adds cost and complexity that would flow through to the pricing tier. The voice-only output is also easier to post-process if you want to add your own branded intro. Background music support is on the roadmap for a future release.
Start generating podcast episodes from your articles
Free tier: one episode per day, no credit card. Pro at $9.99/mo for unlimited standard episodes. Pro+ at $19.99/mo for long-form and your own cloned voice.