2026-05-05 TTS Bake-off

Kokoro-82M

Apache-2.0 · Commercial OK · 5 clips

Role: Default production narrator / preset two-voice option

Evidence: Local render on Impact Signals sample text

am_liam — Male default candidate

same Impact Signals text / local render · 16.45s

Impact Signals is tracking how artificial intelligence is moving from demonstration projects into public health, disaster response, and humanitarian operations. The replacement voice should sound calm, credible, and clear. It should pronounce organizations, numbers, and policy terms without skipping words.

local Kokoro-82M ONNX assets

am_michael — Male alternate

same Impact Signals text / local render · 20.63s

Impact Signals is tracking how artificial intelligence is moving from demonstration projects into public health, disaster response, and humanitarian operations. The replacement voice should sound calm, credible, and clear. It should pronounce organizations, numbers, and policy terms without skipping words.

local Kokoro-82M ONNX assets

af_bella — Female cohost candidate

same Impact Signals text / local render · 20.12s

Impact Signals is tracking how artificial intelligence is moving from demonstration projects into public health, disaster response, and humanitarian operations. The replacement voice should sound calm, credible, and clear. It should pronounce organizations, numbers, and policy terms without skipping words.

local Kokoro-82M ONNX assets

af_heart — Female high-grade candidate

same Impact Signals text / local render · 19.07s

Impact Signals is tracking how artificial intelligence is moving from demonstration projects into public health, disaster response, and humanitarian operations. The replacement voice should sound calm, credible, and clear. It should pronounce organizations, numbers, and policy terms without skipping words.

local Kokoro-82M ONNX assets
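
The four single-voice renders above read the same text but differ in length, so their pacing can be compared directly. The following is a minimal sketch, assuming a plain whitespace word count and the clip durations listed on the cards; the function and variable names are illustrative and not part of the bake-off tooling. Durations include any leading or trailing silence, so the figures are only a rough guide.

```python
# Sample text and durations copied from the four single-voice cards above.
SAMPLE_TEXT = (
    "Impact Signals is tracking how artificial intelligence is moving from "
    "demonstration projects into public health, disaster response, and "
    "humanitarian operations. The replacement voice should sound calm, "
    "credible, and clear. It should pronounce organizations, numbers, and "
    "policy terms without skipping words."
)

DURATIONS_S = {
    "am_liam": 16.45,
    "am_michael": 20.63,
    "af_bella": 20.12,
    "af_heart": 19.07,
}


def words_per_minute(text: str, duration_s: float) -> float:
    """Whitespace-delimited word count divided by clip length in minutes."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return len(text.split()) / duration_s * 60.0


# Hypothetical summary table: voice name -> approximate speaking rate.
PACING = {
    voice: words_per_minute(SAMPLE_TEXT, duration)
    for voice, duration in DURATIONS_S.items()
}

if __name__ == "__main__":
    # Fastest voice first.
    for voice, wpm in sorted(PACING.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{voice}: {wpm:.1f} wpm")
```

A faster rate is not automatically better; the number is only useful next to a listening check for calm, clear delivery.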

am_liam + af_bella — two preset voice dialogue

same Impact Signals dialogue / local render · 27.63s

Impact Signals is tracking how artificial intelligence is moving from demonstration projects into public health, disaster response, and humanitarian operations. / The replacement voice should sound calm, credible, and clear. It should pronounce organizations, numbers, and policy terms without skipping words. / For production, we care about commercial licensing, transcript fidelity, natural pacing, and whether the audio can pass automated guardrails every day.

local Kokoro-82M ONNX assets
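
The sample text asks for a voice that reads "without skipping words" and audio that can pass automated guardrails. One common way to check this is word error rate (WER) between the source text and a speech-to-text transcript of the rendered clip. The sketch below assumes the transcript is already available as a string from some recognizer, which is not specified here. The 0.05 threshold and all function names are hypothetical, not values taken from the bake-off.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation, and split into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference text is empty")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: a reference word was skipped
                curr[j - 1] + 1,     # insertion: an extra word was spoken
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


WER_THRESHOLD = 0.05  # hypothetical cutoff, not taken from the bake-off


def passes_fidelity(reference: str, hypothesis: str,
                    threshold: float = WER_THRESHOLD) -> bool:
    """True when the transcript stays within the allowed error rate."""
    return word_error_rate(reference, hypothesis) <= threshold
```

For example, a transcript that drops the single word "skipping" from the last sentence of the sample text scores 1/11 on that sentence and would fail the assumed threshold.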

VibeVoice

MIT · Commercial OK · 3 clips

Role: Long-form multi-speaker podcast candidate

Evidence: Official public demo samples downloaded from Microsoft GitHub Pages

2-person podcast: See You Again

official demo sample / not same Impact Signals text · 61.33s

Official demo text from the model authors; use for voice and style only, not as evidence of transcript fidelity.

https://microsoft.github.io/VibeVoice/assets/audio/2p_see_u_again.mp3

2-person argument/dialogue

official demo sample / not same Impact Signals text · 68.53s

Official demo text from the model authors; use for voice and style only, not as evidence of transcript fidelity.

https://microsoft.github.io/VibeVoice/assets/audio/2p_argument.mp3

3-person GPT-5 discussion

official demo sample / not same Impact Signals text · 738.0s

Official demo text from the model authors; use for voice and style only, not as evidence of transcript fidelity.

https://microsoft.github.io/VibeVoice/assets/audio/3p_gpt5.mp3

VoxCPM2

Apache-2.0 · Commercial OK · 3 clips

Role: Designed voice without cloning / controllable speaker candidate

Evidence: Official public demo samples downloaded from OpenBMB demo page

English voice design sample

download failed · duration unknown

https://openbmb.github.io/voxcpm2-demopage/audio/voice_design/output.wav

English language sample

official demo sample / not same Impact Signals text · 8.52s

Official demo text from the model authors; use for voice and style only, not as evidence of transcript fidelity.

https://openbmb.github.io/voxcpm2-demopage/audio/language/en_us.mp3

Cross-lingual English reference output

official demo sample / not same Impact Signals text · 13.6s

Official demo text from the model authors; use for voice and style only, not as evidence of transcript fidelity.

https://openbmb.github.io/voxcpm2-demopage/audio/cross_lingual/en_en.wav

FireRedTTS2

Apache-2.0 · Commercial OK · 3 clips

Role: Dialogue-style / podcast generation candidate

Evidence: Official public demo samples downloaded from FireRedTTS2 demo page

Podcast generation sample 1

download failed · duration unknown

https://fireredteam.github.io/demos/firered_tts_2/audios/zero-shot-podcast-generation_samples/en_1_mooncast.wav

Podcast generation sample 2

download failed · duration unknown

https://fireredteam.github.io/demos/firered_tts_2/audios/zero-shot-podcast-generation_samples/en_2_mooncast.wav

English zero-shot output sample

download failed · duration unknown

https://fireredteam.github.io/demos/firered_tts_2/audios/ZeroShotICL_samples/mandarin_to_english/26900001%23en_024.wav
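
Several demo clips on this page are marked "download failed". A minimal sketch of how a sample fetch could retry and then record that status is shown below, assuming only the Python standard library. The function names, the retry count, the backoff values, and the shape of the result dictionary are all hypothetical; this is not the tooling that produced the page.

```python
import time
import urllib.request
from typing import Callable


def fetch_bytes(url: str, timeout_s: float = 30.0) -> bytes:
    """Fetch a URL with the standard library and return the raw body."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return resp.read()


def download_sample(
    url: str,
    fetch: Callable[[str], bytes] = fetch_bytes,
    attempts: int = 3,
    backoff_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Try a download a few times, then report 'ok' or 'download failed'."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            data = fetch(url)
            return {"url": url, "status": "ok", "bytes": len(data),
                    "attempts": attempt, "data": data}
        except OSError as exc:
            # urllib.error.URLError, HTTPError, and socket timeouts are all
            # subclasses of OSError, so this covers network-level failures.
            last_error = f"{type(exc).__name__}: {exc}"
            if attempt < attempts:
                # Exponential backoff: 1x, 2x, 4x the base delay.
                sleep(backoff_s * 2 ** (attempt - 1))
    return {"url": url, "status": "download failed", "error": last_error,
            "attempts": attempts, "data": None}
```

Passing `fetch` and `sleep` as parameters keeps the retry logic testable without network access, and the recorded error string gives a starting point for telling a missing file apart from a transient outage.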

Fun-CosyVoice3

Apache-2.0 · Commercial OK · 3 clips

Role: Possible replacement for problematic CosyVoice2 path

Evidence: Official public demo samples downloaded from FunAudioLLM CosyVoice3 page

CosyVoice3 base English zero-shot

official demo sample / not same Impact Signals text · 4.92s

Official demo text from the model authors; use for voice and style only, not as evidence of transcript fidelity.

https://funaudiollm.github.io/cosyvoice3/audio/c3_base/zero-shot/en/uttid_2.wav

CosyVoice3 large English zero-shot

official demo sample / not same Impact Signals text · 5.28s

Official demo text from the model authors; use for voice and style only, not as evidence of transcript fidelity.

https://funaudiollm.github.io/cosyvoice3/audio/c3_large/zero-shot/en/uttid_2.wav

CosyVoice2 base English comparator

official demo sample / not same Impact Signals text · 5.96s

Official demo text from the model authors; use for voice and style only, not as evidence of transcript fidelity.

https://funaudiollm.github.io/cosyvoice3/audio/c2_base/zero-shot/en/uttid_2.wav

Impact Signals TTS Bake-off Soundboard
