OpenAI's Voice Engine can replicate human voices based on just a 15-second sample.

OpenAI already offers state-of-the-art models for text and image generation, and it recently unveiled Sora, a text-to-video model. Now the company has introduced Voice Engine, a model that can generate speech from a mere 15-second audio sample. Essentially, it is a text-to-speech model: provide a 15-second recording of the speaker as a reference, enter your text, and get natural-sounding speech in that voice.

OpenAI says that despite its compact size, Voice Engine can produce lifelike, expressive voices that closely resemble the original speaker. Developed in late 2022, the model has been powering the ChatGPT Voice Chat feature.

Acknowledging the “serious risks” tied to the technology and the potential for misuse of synthetic voices, OpenAI is not releasing the model publicly for now. Instead, it is offering a preview to start a conversation about voice synthesis and how society can adapt to these emerging capabilities.

As for the model’s capabilities, it can reproduce a speaker’s voice in other languages while keeping the speech lifelike and the accent nuanced. HeyGen, a prominent AI video and audio generation platform, uses OpenAI’s Voice Engine to create custom voices. Meanwhile, ElevenLabs has developed its own speech synthesis model that can replicate voices and generate speech in multiple languages.

Despite its impressive power, the technology can be deceptive and can put users at risk in various scenarios. OpenAI notes that voice-based authentication is still used to protect sensitive information such as bank accounts, and it hopes such authentication systems will be phased out. In addition, people on social media platforms are already cloning popular voices to promote their products.

Another ad using @MKBHD's voice pic.twitter.com/9z2c0ifYxg
— Max Weinbach (@MaxWinebach) March 29, 2024

In India, there is a worrying rise in AI voice cloning scams. Cybercriminals use the technology to clone children’s voices, intimidate parents, and demand ransom. Given this landscape, OpenAI finds itself in a delicate position regarding a wide release of the model. As we move further into the AI era, society will need to become more cautious and more resilient.

What are your thoughts on OpenAI’s voice cloning engine? Do you believe the company should make the model available to the public? Share your opinions in the comments below.
