Enginn Studio is a text-to-speech service, described as 'Enginn creates human-quality voices which don't belong to anyone, and provides a SaaS platform, Enginn Studio, allowing to use those voices for content production'. There are more than 10 alternatives to Enginn Studio, including websites and apps for a variety of platforms such as Self-Hosted, Python, SaaS and Google Chrome. The best Enginn Studio alternative is VoiceCraft, which is both free and open source. Other sites and apps similar to Enginn Studio include ElevenLabs, X to Voice, Voice Engine and Chatterbox TTS.
VoiceCraft is a token-infilling neural codec language model that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech (TTS) on in-the-wild data including audiobooks, internet videos, and podcasts.




ElevenLabs uses AI to deliver natural, expressive speech for applications such as podcasts and videos. It features a user-friendly interface and customizable intonation, and offers API integration. Privacy, scalability, and multilingual capabilities enhance its adaptability.




X to Voice is an open-source tool that analyzes your X/Twitter profile data to generate a custom voice with the ElevenLabs Voice Design API, and integrates with Hedra's video API to pair that voice with video.


Voice Engine is a text-to-voice generation platform from OpenAI, which uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker.


We're excited to introduce Chatterbox, Resemble AI's first production-grade open source TTS model. Licensed under MIT, Chatterbox has been benchmarked against leading closed-source systems like ElevenLabs, and is consistently preferred in side-by-side evaluations.
AIVocal is your all-in-one AI assistant for voice tasks—perfect for AI podcasting, speech generation, vocal editing, and voice control. From transcribing meetings to creating high-quality audio content, AIVocal makes voice work smarter and faster.

Dia is a 1.6B-parameter text-to-speech model created by Nari Labs. It was pushed to the Hub using the PyTorchModelHubMixin integration.

An AI-powered platform revolutionizing voice creation with cutting-edge technology. We provide advanced audio solutions for creators and businesses worldwide.



A privacy-focused online video editor that processes videos entirely in your browser using FFmpeg and WebAssembly. No uploads, no signups required.

AudiowaveAI lets you convert text to high-quality audio easily and affordably. Listen to PDFs, epubs, articles, blog posts, links, emails or anything else you want on any device with natural-sounding voices.




Convert text into natural-sounding speech using an API powered by the best of Google’s AI technologies.

Make an AI copy of your voice that keeps your tone and accent. Our voice cloning tech lets you create natural-sounding speech for videos and podcasts by reading a short text.
