ChatGPT’s Advanced Voice feature was first showcased during the GPT-4o launch, but its release was delayed over safety concerns, a situation compounded by the controversy around the ‘Sky’ voice, which resembled Scarlett Johansson’s. After a five-month delay, OpenAI is finally launching Advanced Voice for all ChatGPT Plus and Team users, with the rollout expected to wrap up this week.
For those unfamiliar, Advanced Voice is a significant upgrade from the standard voice chat available to free ChatGPT users. It leverages GPT-4o’s multimodal capabilities to enable more natural, free-flowing conversations, including support for interruptions.
While ChatGPT’s Advanced Voice might seem similar to Google’s Gemini Live, there’s a key difference. Gemini Live relies on STT/TTS (speech-to-text and text-to-speech) engines: it transcribes your spoken input into text, gets a text response from an LLM, and then synthesizes that response back into speech. In contrast, ChatGPT’s Advanced Voice handles audio input and output natively, with no intermediate text conversion. Both platforms support interruptions, but Gemini Live lacks the truly multimodal experience that ChatGPT offers, as the sketch below illustrates.
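To make that contrast concrete, here’s a minimal Python sketch of the two data flows. Every function in it (transcribe, generate_text, synthesize, speech_model) is a hypothetical stub invented purely for illustration; neither Google nor OpenAI exposes an API like this.

```python
# A minimal sketch of the architectural difference, not either product's real
# API: every function below is a stand-in stub, and only the data flow matters.

def transcribe(audio: bytes) -> str:
    """Stub STT step: real engines discard tone, pauses, and background sound here."""
    return "words only, stripped of how they were said"

def generate_text(prompt: str) -> str:
    """Stub text-only LLM call."""
    return f"reply to: {prompt}"

def synthesize(text: str) -> bytes:
    """Stub TTS step: prosody has to be re-invented from plain text."""
    return text.encode()

def speech_model(audio: bytes) -> bytes:
    """Stub end-to-end speech model (the GPT-4o-style component)."""
    return b"spoken reply shaped by the input audio itself"

def pipeline_style_turn(audio_in: bytes) -> bytes:
    # Gemini Live-style pipeline: audio -> text -> LLM -> text -> audio.
    # Emotion, accents, and non-speech sounds are lost at the very first hop.
    return synthesize(generate_text(transcribe(audio_in)))

def native_style_turn(audio_in: bytes) -> bytes:
    # Advanced Voice-style approach: one multimodal model consumes and emits
    # audio directly, so speech nuances can, in principle, shape the response.
    return speech_model(audio_in)
```

The point is that first hop: once speech is flattened into text, everything about how it was said is gone, which is exactly the loss an end-to-end audio model avoids.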
However, in my brief testing, ChatGPT’s Advanced Voice appears to have lost many of the multimodal features highlighted during the demo. OpenAI had demonstrated capabilities like singing, detecting mood or emotion from speech, recognizing different sounds, and even doing accents. Currently, though, Advanced Voice can’t pick up on those speech nuances, and camera input is not yet supported.
It seems OpenAI may have scaled back some features to head off potentially awkward interactions with ChatGPT. Either way, are you excited to try out ChatGPT’s Advanced Voice? Share your thoughts in the comments below!