ChatGPT-maker OpenAI is pushing further into digital media by moving into audio, and more specifically into voice cloning.
Today, OpenAI unveiled its "Voice Engine," an AI model that has been in development since 2022. It already powers OpenAI's text-to-speech API, as well as the ChatGPT Voice and Read Aloud features unveiled earlier this month.
As it turns out, Voice Engine can also clone voices. Here's how it works: from a single 15-second clip recorded with a phone or computer microphone, Voice Engine generates "natural-sounding speech that closely resembles the original speaker," which can then read aloud any text a user types in.
Technology this sophisticated could have far-reaching ramifications for anyone involved in spoken audio production, from podcasters, voiceover artists, spoken word performers, and audiobook and advertisement narrators to gamers, streamers, customer service agents, salespeople, and anyone else who regularly records themselves speaking.
The announcement also puts pressure on other companies developing this kind of technology, including the well-funded AI startup ElevenLabs, Captions, Meta, WellSaid Labs, and MyShell.
OpenAI also highlighted Voice Engine's potential to support non-verbal people by giving them individually customized, non-robotic voices, and to aid therapeutic and educational programs for people with speech impairments or learning needs.
OpenAI noted in today's announcement that, so far, only "a small group of trusted partners" has access to Voice Engine. The company highlighted several of these partners and how they are using the technology.
Age of Learning, an education technology provider, uses Voice Engine and GPT-4 to create personalized voice content, both pre-scripted and in real time, expanding its reading assistance and making it more interactive for students of different ages.
HeyGen, an AI visual storytelling platform, uses Voice Engine for video translation. Creators and businesses can translate their content into multiple languages and give custom, human-like avatars multilingual voices, reaching a global audience while keeping the original speaker's accent intact.
Dimagi, a software company that builds tools for community health workers, uses Voice Engine and GPT-4 to give its workers interactive feedback in several languages, improving service delivery in remote settings.
Livox, an AI app for Augmentative and Alternative Communication (AAC) devices used by people with speech and hearing difficulties, integrates Voice Engine to give non-verbal individuals non-robotic voices in multiple languages.
The Norman Prince Neurosciences Institute at Lifespan, a nonprofit medical and teaching organization affiliated with Brown University, is committed to aiding people with neurological diseases and disorders. It is using Voice Engine to help patients with speech impairments communicate with an AI version of their own voice. Two doctors there, Rohaid Ali and pediatric neurosurgeon Konstantina Svokos, restored the voice of a brain tumor patient using an audio sample taken from one of her school project videos.