ElevenLabs, an AI startup founded two years ago by former Google and Palantir employees, has made a name for itself with machine learning (ML)-powered voice cloning and speech synthesis. Now it is expanding its offering with a text-to-sound model.

Introduced just hours ago, the model allows creators to produce sound effects simply by describing them in words. It promises to enrich content creation in this age of AI-powered digital experiences.

ElevenLabs has yet to make the model publicly available, but it has demonstrated its capabilities with a minute-long teaser that pairs videos generated by OpenAI’s Sora with its own AI-generated sound effects. The company has also set up a signup page inviting prospective users to join an early-access waitlist for the model.

Since its founding in 2022, ElevenLabs has been researching artificial intelligence (AI) to make audio and video content, such as movies and podcasts, accessible across languages and geographies. To further this goal, it has introduced several offerings, including text-to-speech models that produce AI speech from any piece of text in 29 different languages, as well as speech-to-speech models that convert audio or video while replicating the original speaker’s voice.

While both tools continue to gain adoption among enterprises and individual content creators, the industry has also seen an upsurge in AI-generated video from tools like Runway, Pika and OpenAI’s Sora. These products generate realistic AI videos from simple text prompts, yet their output lacks audio. This is where ElevenLabs’ new model comes in: users can create sound effects for their content by providing descriptions of the sounds they want.

Used properly, this offering allows AI creators to seamlessly enhance their work with the background noise that should naturally accompany it, from birds chirping or vehicles honking as they move along roads, to people talking, eating or walking on bustling streets.

ElevenLabs had previously shown only its text-to-speech models publicly. “But we have so much more in development! So when OpenAI released their Sora model — which can generate incredible videos but without sound — we decided to give a sneak peek of our new product line,” wrote Luke Harries, head of growth at ElevenLabs, in an X post featuring several Sora videos enhanced with AI sound effects from ElevenLabs’ model.

AI-generated content isn’t the only application for the new model; its output could also accompany plain speech produced from text, or any video clip, be it an Instagram post, a commercial or a video game trailer, that requires background audio. How its quality holds up across these applications, and how the market responds, remains to be seen.

ElevenLabs has not shared when, or whether, the model will launch publicly, but signups for early access are open now. Interested users can visit the signup page and register their name and email, along with their intended use case for sound effects. ElevenLabs also asks early volunteers to write sample prompts to help it optimize the model’s responses.

Once registration is complete, users will be added to a waitlist and will gain access as the model becomes available; for now, the timeline remains unknown.

ElevenLabs may enjoy an early advantage with its text-to-sound technology, but other companies active in AI speech could enter the space, including known players like MURF.AI, Play.ht and WellSaid Labs.

According to Market US, the global market for such tools stood at $1.2 billion in 2022 and is projected to surpass $5 billion by 2032, at an estimated compound annual growth rate of 15.4%.

venturebeat.org
