ChatGPT finds its voice (plus eyes and ears) with major upgrade


OpenAI is rolling out major updates to ChatGPT, its flagship AI chatbot, including the ability to “see, hear, and speak” with its users.

The Microsoft-backed AI technology company announced the new updates in a blog post on Monday, touting a more intuitive ChatGPT interface.

“You can now use voice to engage in a back-and-forth conversation with your assistant,” OpenAI said.

The voice feature is powered by a new text-to-speech model capable of generating human-like audio from just text and a few seconds of sample speech, the company said.

Whisper, OpenAI’s open-source speech recognition system, transcribes users’ spoken words into text, so they can do things like “request a bedtime story for your family or settle a dinner table debate.”

Users will also be able to show ChatGPT images and ask questions about them, and can focus on specific parts of an image using a new drawing tool.

ChatGPT will use its reasoning skills to answer questions about, analyze, or provide insight on “a wide range of images, such as photographs, screenshots, and documents containing both text and images,” OpenAI said.

Tasks like troubleshooting why a grill won’t start, exploring the contents of a fridge to plan a meal, or analyzing a complex graph for work-related data are just a few examples of how the bot will interact with users who share a picture with it.

Users have a choice of five different voices to interact with, all created by the company in collaboration with professional voice actors.

The upgrade will put ChatGPT in line with other AI assistants like Apple’s Siri, Google Assistant, and Amazon’s Alexa, which can already verbally interact with users.

The image features will also put ChatGPT in direct competition with Google Lens.

All three tech giants have introduced major AI upgrades to their products in recent weeks, such as a more human-like, AI-powered Alexa from Amazon and new AI features for Google’s Bard.

In August, OpenAI released ChatGPT Enterprise, which has already been adopted by many major companies for a wide range of business tasks, from summarizing documents to writing computer code.

Spotify is also using the new voice technology in a pilot program that lets podcasters translate their podcasts into other languages in their own voices, OpenAI said.

Responding to the AI and privacy concerns raised about ChatGPT since its launch in November 2022, OpenAI says it has taken technical measures to significantly limit ChatGPT’s ability to analyze and make direct statements about specific people shown in images.

“Vision-based models also present new challenges, ranging from hallucinations about people to relying on the model’s interpretation of images in high-stakes domains,” the blog post said.

The limitations follow tests OpenAI conducted on the chatbot to assess risks in areas such as extremism and scientific proficiency.

“ChatGPT is not always accurate and these systems should respect individuals’ privacy,” OpenAI said.

OpenAI will roll out the new features to subscribers of its monthly ChatGPT Plus plan and to Enterprise users over the next two weeks.

Other users and developers are expected to have access to the new features at a later date.