OpenAI’s ChatGPT, which launched as a text-only chatbot, is gaining voice and image capabilities.
With the upgrade, users will be able to talk to the assistant and show it pictures instead of only typing prompts.
ChatGPT, introduced about nine months ago, has proved widely popular for generating essays, poems, and summaries from text prompts. Now the AI assistant is set to become more interactive by holding voice conversations with users.
The announcement coincides with Amazon’s commitment to invest up to $4 billion in OpenAI rival Anthropic, intensifying the competition among tech giants in generative AI. Google is pushing its Bard chatbot, Meta is favoring an open-source approach, and Microsoft is closely aligned with OpenAI.
OpenAI’s latest move combines the voice-assistant format with its large language models (LLMs). Users will be able to hold spoken conversations with ChatGPT. For example, they can ask it to make up a bedtime story on the fly and steer the narrative with spoken prompts, or ask a question aloud and receive a spoken answer.
ChatGPT’s new features also extend to images. Users can upload an image and ask ChatGPT to explain what it shows or to give instructions based on its content, which makes the assistant more useful for practical problem-solving.
The voice feature is underpinned by a text-to-speech model capable of generating human-like voices from text inputs and a few seconds of sampled speech. OpenAI collaborated with professional voice actors to create five distinct voices. In the other direction, Whisper, OpenAI’s open-source speech recognition system, transcribes the user’s spoken words into text.
Spotify is a launch partner and is using the technology in a feature for podcasters. The feature lets podcasters translate their shows from English into Spanish, French, or German while retaining their original voice. However, OpenAI is exercising caution, limiting access to this technology and working closely with select podcasters to ensure responsible use.
OpenAI acknowledges the immense potential of its new voice technology but also recognizes the associated risks, including the possibility of malicious actors impersonating public figures or committing fraud.
The new features are set to roll out to paying Plus and Enterprise subscribers within the next two weeks. To activate voice capabilities, users can navigate to the app’s “settings” menu, select “new features,” and opt in to voice conversations. Voice will initially be available on the ChatGPT Android and iOS apps on an opt-in beta basis, while image input will be available on all platforms.