AudiowaveAI is a text-to-speech service described as 'Lets you convert text to high-quality audio easily and affordably. Listen to PDFs, epubs, articles, blog posts, links, emails or anything else you want on any device with natural-sounding voices'. There are more than 25 alternatives to AudiowaveAI for a variety of platforms, including web-based, Windows, Linux, Mac and Android apps. The best AudiowaveAI alternative is VoiceCraft, which is both free and open source. Other great apps like AudiowaveAI are ElevenLabs, X to Voice, RHVoice and Voice Engine.
VoiceCraft is a token-infilling neural codec language model that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech (TTS) on in-the-wild data including audiobooks, internet videos, and podcasts.




ElevenLabs uses AI to deliver natural, expressive speech for diverse applications such as podcasts and videos. It offers a user-friendly interface, customizable intonation, and seamless API integration. Privacy, scalability, and multilingual capabilities enhance its adaptability.




X to Voice is an open-source tool that analyzes your X/Twitter profile data to generate a custom voice with the ElevenLabs Voice Design API, integrating with Hedra's video API for an innovative audio-visual experience.




Voice Engine is a text-to-voice generation platform from OpenAI, which uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker.


Natural Reader is a professional text-to-speech program that converts any written text into spoken words. The paid versions of Natural Reader have many more features.



eSpeak is a compact open-source speech synthesizer for English and other languages, for Linux and Windows.


SherpaTTS is an Android Text-to-Speech engine based on Next-gen Kaldi using Piper or Coqui voices.


Transform text into natural-sounding speech with smooth, fine-tuned audio export. Create high-quality voiceovers and download the output for diverse applications. Supports various languages and operates on multiple platforms.




AIVocal is your all-in-one AI assistant for voice tasks, perfect for AI podcasting, speech generation, vocal editing, and voice control. From transcribing meetings to creating high-quality audio content, AIVocal makes voice work smarter and faster.

Audiomatic is a web app that seamlessly translates videos into other languages. Our state-of-the-art pipeline delivers contextually accurate dubbed translations that preserve the tone, style, and emotion of the original speakers.



We're excited to introduce Chatterbox, Resemble AI's first production-grade open source TTS model. Licensed under MIT, Chatterbox has been benchmarked against leading closed-source systems like ElevenLabs, and is consistently preferred in side-by-side evaluations.