AI Podcast Generator — Turn Your Script into a Two-Host Podcast

Name: EasyVoice
Author: EasyVoice

Write your episode as a two-host dialogue in the EasyVoice podcast studio, or paste content from an article or your notes and split it into speaking turns. Assign distinct voices to Host A and Host B, and the generator synthesizes each speaker segment individually and stitches everything into a finished MP3 episode within minutes. No microphone. No editing software. No studio booking. The entire audio pipeline from script to publishable episode runs automatically: the neural TTS engine voices each line, and the audio stitcher normalizes loudness across the full episode before the file downloads.

The two-host dialogue format makes dense written content genuinely listenable. A single narrator reading an article is fine for short-form content, but a conversation between two distinct voices (one asking clarifying questions, the other providing context, both referencing specific claims from your source material) sustains listener attention across longer pieces. Commuters, gym-goers, and people working through a long reading list are the primary audience: they want the ideas in your article, in a format that works while their eyes are otherwise occupied.

How the AI podcast generator works

The pipeline has four steps. Understanding each step helps set accurate expectations about what the generator does and does not do in v1.

Step 1 — Write your dialogue script

Write your episode directly in the script editor, or paste content from an article, blog post, research summary, or your notes. URL fetch is not supported in v1 — paste the text directly. Scripts adapted from 300–3,000 word articles produce the most natural episode length; for longer source material, condense as you adapt rather than carrying over every sentence verbatim.

Step 2 — Split it into host turns

Structure the script as alternating segments and assign each one to Host A or Host B. A pattern that works well: Host A presents claims and context; Host B asks clarifying questions, adds caveats, and keeps the pacing natural. Open with a brief intro naming the topic and close with an outro recap — standard podcast structure. The episode is pure dialogue: every word you write is exactly what gets spoken.
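One way to picture the structure described in Step 2 is as an ordered list of turns, each tagged with a host. The sketch below is illustrative only: the `A:` / `B:` line prefixes, the `Turn` type, and the `parse_script` and `alternates` helpers are assumptions made for this example, not the format the EasyVoice script editor uses.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    host: str  # "A" or "B"
    text: str  # exactly what will be spoken


def parse_script(script: str) -> list[Turn]:
    """Split a plain-text script into host turns.

    Assumes a hypothetical convention where each turn starts with
    "A:" or "B:"; unprefixed lines continue the previous turn.
    """
    turns: list[Turn] = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue
        prefix, sep, rest = line.partition(":")
        if sep and prefix.strip().upper() in ("A", "B"):
            turns.append(Turn(prefix.strip().upper(), rest.strip()))
        elif turns:
            turns[-1].text += " " + line
        else:
            raise ValueError("script must start with an 'A:' or 'B:' line")
    return turns


def alternates(turns: list[Turn]) -> bool:
    """True when no host speaks two turns in a row."""
    return all(a.host != b.host for a, b in zip(turns, turns[1:]))


script = """\
A: Today we're covering how two-host episodes are put together.
B: So where does the script come from?
A: You write it, or paste and adapt an article.
"""
turns = parse_script(script)
print([t.host for t in turns], alternates(turns))
```

Checking that turns alternate before generating is a cheap way to catch a segment assigned to the wrong host.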

Step 3 — Pick two voices

Choose a voice for Host A and a voice for Host B from the EasyVoice catalog: 56 voices across nine languages (American English, British English, Arabic, Spanish, French, Italian, Portuguese, Japanese, and Hindi). On Pro, you can assign your own cloned voice to either host slot. The two host voices are synthesized independently: each speaker segment is generated from scratch, not pitch-shifted from a single recording.

Step 4 — Download the finished MP3

Each speaker segment is synthesized individually and then stitched together into a single audio file. Loudness normalization runs across the full episode before the file is written — so the two voices feel like they are in the same room, not recorded on separate rigs. The output is a finished MP3 ready to publish, share, or embed. No post-processing required. No DAW needed.
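The synthesize-then-stitch flow in Step 4 can be sketched in a few lines. This is a simplified illustration, not EasyVoice's pipeline: `fake_synthesize` is a stand-in that emits a tone instead of speech, and plain RMS matching per segment is swapped in for whatever loudness measure the real stitcher uses (production tools typically target a perceptual measure such as LUFS). The sample rate, target level, and gap length are arbitrary assumptions.

```python
import math


def rms(samples: list[float]) -> float:
    """Root-mean-square level of a mono segment."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def normalize(samples: list[float], target_rms: float) -> list[float]:
    """Scale a segment so its RMS level matches target_rms."""
    level = rms(samples)
    if level == 0.0:
        return list(samples)  # silence stays silent
    gain = target_rms / level
    return [s * gain for s in samples]


def fake_synthesize(text: str, amplitude: float, rate: int = 8000) -> list[float]:
    """Stand-in for a TTS call: a 220 Hz tone, 0.01 s per character."""
    n = (rate // 100) * len(text)
    return [amplitude * math.sin(2 * math.pi * 220 * i / rate) for i in range(n)]


def stitch(segments: list[list[float]], target_rms: float = 0.1,
           gap_samples: int = 800) -> list[float]:
    """Normalize each segment, then join them with short silent gaps."""
    episode: list[float] = []
    for i, seg in enumerate(segments):
        if i:
            episode.extend([0.0] * gap_samples)
        episode.extend(normalize(seg, target_rms))
    return episode


# Two segments "recorded" at very different levels.
loud = fake_synthesize("Host A speaks here.", amplitude=0.8)
quiet = fake_synthesize("Host B replies.", amplitude=0.05)
episode = stitch([loud, quiet])
print(round(rms(normalize(loud, 0.1)), 3), round(rms(normalize(quiet, 0.1)), 3))
```

After normalization both segments sit at the same level, which is the effect the "same room" description refers to.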

Two-host conversational format — design decisions

The AI podcast generator is specifically built around the two-host format rather than a single narrator because the conversational structure solves several problems that single-voice narration does not. Here is the design reasoning behind each key decision in v1.

Host A and Host B, not a single reader. Single-narrator audiobook-style playback works well for fiction and short articles but loses listeners on dense or technical content. When two distinct voices have a genuine back-and-forth, the listener can follow the argument structure more naturally: the question-and-answer pattern maps to how people actually discuss ideas, and it gives you a structural scaffold for organizing complex material without losing the narrative thread.

Intro and outro recommended. Open your script with a brief intro that names the topic and sets listener expectations — "Today we're covering…" — and close with an outro that recaps the main points and invites the listener to the source. This is standard podcast structure. It makes the episode feel complete rather than a raw text reading with a hard stop at the end.

No stage directions in the audio output. The synthesizer reads exactly what you write — there is no interpretation layer between your script and the audio. Keep the segments pure spoken dialogue: do not include production notes like "(laughs)" or "[HOST A pauses]", because they would be read aloud verbatim. What you write is what a listener hears.
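Because production notes would be spoken verbatim, it can help to scan a script for them before generating. The following is a minimal sketch of such a check; the pattern and the helper names `find_stage_directions` and `strip_stage_directions` are assumptions for illustration and are not part of the EasyVoice editor.

```python
import re

# Matches text wrapped in (...) or [...], e.g. "(laughs)" or "[HOST A pauses]".
STAGE_DIRECTION = re.compile(r"\([^()]*\)|\[[^\[\]]*\]")


def find_stage_directions(turn_text: str) -> list[str]:
    """Return every parenthetical or bracketed span in a turn."""
    return STAGE_DIRECTION.findall(turn_text)


def strip_stage_directions(turn_text: str) -> str:
    """Remove flagged spans and collapse the leftover whitespace."""
    cleaned = STAGE_DIRECTION.sub(" ", turn_text)
    return re.sub(r"\s+", " ", cleaned).strip()


line = "That surprised me too (laughs) [HOST A pauses] but the data holds."
print(find_stage_directions(line))
print(strip_stage_directions(line))
```

The pattern also flags legitimate parenthetical asides, so it is safer to review the matches than to strip them automatically.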

No background music in v1. Background music that can be legally distributed under podcast licensing adds meaningful cost and complexity. Rather than ship a low-quality or improperly licensed music layer, v1 ships voice-only. The output is still broadcast-ready, because loudness normalization ensures professional audio levels, and a voice-only file makes it easier for creators to layer in their own branded music if they want it.

Use your own cloned voice as a podcast host (Pro)

On the Pro plan, you can clone your own voice and assign it as Host A or Host B in any generated episode. This means the podcast sounds like you — your cadence, your register, your voice — even though you never sat in front of a microphone for that episode. Content creators, educators, and newsletter writers use this to maintain a consistent personal brand voice across more content than they could manually record.

Consent attestation is required. When you create a voice clone on EasyVoice, you complete a one-time consent attestation confirming that you either are the voice owner or hold explicit written permission from the voice owner. Cloning a voice without the owner's consent is prohibited under the EasyVoice Terms of Service. See /voice-cloning for the full consent policy and what the attestation covers.

AudioSeal watermarking on all cloned output. Every audio clip generated using a cloned voice — including podcast episodes where a cloned voice is assigned to Host A or Host B — carries an inaudible AudioSeal watermark embedded at synthesis time, before the file is written. The watermark survives re-encoding and re-compression. It is not audible to a human listener but is machine-detectable, meaning any cloned output is provably synthetic and traceable back to the EasyVoice platform. This protects the voice owner, the listener, and you as the content creator.

Pro includes up to three custom voice clones. Each clone is usable across the podcast generator, the standard TTS editor, and the API under the same subscription. Pro is $9.99/mo or $24.99/qtr. See /pricing for the full comparison of Pro and free tier limits.

Languages and voice catalog

The EasyVoice voice catalog includes 56 voices across nine languages: American English, British English, Arabic, Spanish, French, Italian, Portuguese, Japanese, and Hindi. Of these, 46 are built on the Kokoro-82M neural TTS model; the 10 Arabic voices run on the dedicated Supertonic Arabic engine. None are English-engine voices reading translated text: each language is served by models trained on native speakers.

For the podcast generator, English presets are the primary supported configuration in v1, and Arabic voices are fully supported for both host slots. Because you write the script yourself, the language of the episode is entirely under your control: write Arabic dialogue and assign Arabic voices for an Arabic-language episode, or English dialogue with English voices. Other language combinations work at the voice synthesis level — match the script language to voices trained for that language for the most natural results.
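As a rough illustration of matching script language to voice language, the sketch below checks whether a turn is written mostly in Arabic script before pairing it with an Arabic voice. It is an assumption-laden heuristic: the `"ar"` and `"en"` codes, the 0.5 threshold, and both helper names are hypothetical, and the check only separates Arabic script from everything else (it cannot tell Spanish from French, for example).

```python
def arabic_ratio(text: str) -> float:
    """Share of alphabetic characters that fall in the Arabic Unicode block."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(1 for c in letters if "\u0600" <= c <= "\u06FF")
    return arabic / len(letters)


def script_matches_voice(text: str, voice_language: str,
                         threshold: float = 0.5) -> bool:
    """Rough check: Arabic voices get mostly-Arabic text, other voices do not."""
    mostly_arabic = arabic_ratio(text) >= threshold
    return mostly_arabic if voice_language == "ar" else not mostly_arabic


print(script_matches_voice("مرحبا بكم في الحلقة", "ar"))
print(script_matches_voice("Welcome to the episode", "ar"))
print(script_matches_voice("Welcome to the episode", "en"))
```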

Pricing and episode limits

Free

One short episode per day. Approximately 3–5 minutes of audio. Standard catalog voices only. No credit card required. Daily reset — you can evaluate the generator indefinitely without paying.

Pro — $9.99/mo

Long-form episodes up to 45 minutes. All 56 catalog voices. Cloned-voice hosts (up to 3 custom clones). Unlimited episodes per month. API access for programmatic generation. Scripts up to 30,000 characters.

Full tier comparison including API rate limits, request size caps, and annual billing options at /pricing.

Related tools and guides

Article to Podcast — content repurposing guide

The content-repurposing angle: why converting existing blog posts and articles into podcast episodes expands reach without new writing work, who the audience is, and how the paste-only workflow fits a weekly publishing cadence.

Voice cloning — consent-first, watermarked

Clone your own voice for use as a podcast host. Consent attestation required. AudioSeal watermark on every cloned output. Included with Pro at $9.99/mo with up to 3 custom clones.

Pricing — free & Pro

Full tier comparison: free 1 episode/day, Pro $9.99/mo long-form plus cloned hosts. API rate limits and request size caps included.

Voice catalog — 56 voices, 9 languages

Browse all available host voices with audio previews. Assign any two voices to Host A and Host B in the podcast generator. All voices available on free tier for evaluation.

Frequently asked questions

Is it really AI voices, or are real people reading my script?

Fully AI voices, with no human narrators involved. Most of the EasyVoice catalog is built on Kokoro-82M, an open-weight neural TTS model (Apache-2.0 on Hugging Face); the Arabic voices run on the dedicated Supertonic Arabic engine. You write the two-host script as dialogue turns, and each speaker segment is synthesized individually with the voice you assigned to that host. The segments are stitched together with consistent loudness normalization into a single MP3 episode. No human recording, no studio, no actor fees.

Can I use my own voice as one of the podcast hosts?

Yes — on the Pro plan ($9.99/mo) you can clone your own voice and assign it as Host A or Host B. Voice cloning requires a 15–60 second clean audio sample and a one-time consent attestation confirming you own or have explicit permission to clone the voice. Every episode generated using a cloned voice carries an inaudible AudioSeal watermark embedded at synthesis time, before the file is written. The watermark cannot be removed by re-encoding. This design protects both the voice owner and the listener. See /voice-cloning for the full consent and watermark policy.

What script length works best?

Scripts adapted from articles between 300 and 3,000 words produce the most natural episodes. Very short scripts feel rushed — the hosts have little to discuss. If you are adapting a long article, condense as you write rather than carrying over every sentence verbatim: keep the main claims, evidence, and conclusions as dialogue turns. Script limits scale with plan — 2,000 characters per episode on the free tier, 30,000 on Pro. The sweet spot is a script adapted from a standard 800–1,500 word blog post or news article.
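A quick pre-flight check against the character limits quoted above might look like this. The plan keys and the `check_script_length` helper are hypothetical, and counting every character (including spaces and newlines) is an assumption about how the limit is measured.

```python
# Per-episode script limits quoted in this answer, in characters.
SCRIPT_CHAR_LIMITS = {"free": 2_000, "pro": 30_000}


def check_script_length(script: str, plan: str) -> tuple[bool, int]:
    """Return (fits, characters_remaining) for the given plan."""
    limit = SCRIPT_CHAR_LIMITS[plan]
    remaining = limit - len(script)
    return remaining >= 0, remaining


# A 2,220-character draft: over the free limit, well under the Pro limit.
draft = "A: Welcome back.\nB: Glad to be here.\n" * 60
print(len(draft))
print(check_script_length(draft, "free"))
print(check_script_length(draft, "pro"))
```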

How long can a podcast episode be?

Episode length depends on your plan. The free tier caps at one short episode per day (approximately 3–5 minutes of audio, roughly 500–800 words of script). The Pro plan ($9.99/mo) unlocks long-form episodes up to 45 minutes and cloned-voice hosts. See /pricing for the full tier breakdown.
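The free-tier figures above (3 to 5 minutes for roughly 500 to 800 words) imply a speaking rate of about 160 to 167 words per minute, which gives a rough way to estimate running time from word count. The sketch below assumes a flat 160 words per minute; actual pace varies with the voice, punctuation, and language, so treat the results as ballpark numbers only. The function names are illustrative.

```python
def implied_wpm(words: int, minutes: float) -> float:
    """Speaking rate implied by a word count and a running time."""
    return words / minutes


def estimate_minutes(word_count: int, wpm: float = 160.0) -> float:
    """Approximate running time for a script at an assumed speaking rate."""
    return word_count / wpm


# Rates implied by the free-tier figures: 500 words in 3 min, 800 words in 5 min.
print(round(implied_wpm(500, 3)), round(implied_wpm(800, 5)))  # 167 160

# A 1,200-word script at the assumed 160 wpm.
print(estimate_minutes(1200))  # 7.5

# Approximate word budget for a 45-minute Pro episode at the same rate.
print(round(45 * 160))  # 7200
```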

Do I need editing software after downloading the episode?

No. The output is a finished MP3 with both host voices stitched together, loudness-normalized to a consistent level across the full episode. You can publish it directly to a podcast RSS feed, embed it on a blog, or share it as a standalone audio file. No DAW, no Audacity, no post-processing required. The episode is a single file from the moment it downloads.

Is there background music or sound effects?

No. v1 of the AI podcast generator produces voice-only episodes with no background music, intro jingle, or sound effects. This is intentional: music that is properly licensed for podcast distribution adds cost and complexity that would be passed through to plan pricing. The voice-only output is also easier to post-process if you want to add your own branded intro. Background music support is on the roadmap for a future release.

Start generating podcast episodes from your articles

Free tier: one episode per day, no credit card. Pro at $9.99/mo for unlimited long-form episodes and your own cloned voice.