OpenAI introduces new GPT-4o, a more intuitive AI model


Sam Altman’s OpenAI introduced a new conversational GPT model and several other fresh upgrades for ChatGPT and GPT-4 during a live stream demo event from its headquarters in San Francisco, California on Monday.

The biggest announcement at the ‘Spring Update’ community event was the launch of a new flagship model, GPT-4o, which brings GPT-4-level intelligence to everyone, including free users.

The fresh "GPT-4o" model (the letter O for ‘omni’) can reason in real time across voice, text, and vision, said OpenAI's chief technology officer Mira Murati, who kicked off the 10:00 a.m. PT live stream event.

The AI start-up also introduced a new ChatGPT desktop app for macOS and a refreshed UI that it claims will make the popular chatbot simpler to use and much more natural.

“We know the models get more complex, but we want people to focus on the collaboration not the UI process,” said Murati.

The new GPT-4o boasts improved quality and speed in 50 different languages, giving OpenAI the ability to bring the ChatGPT experience to as many people as possible, she said.

Aiming for much more natural human-computer interaction, the model “accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs,” the company states.

First AI chatbot to mimic human conversation

Because all inputs and outputs are processed by the same neural network, the intuitive GPT-4o is said to mimic a human conversational response time in a way never quite seen before from any AI chatbot.

As the demonstrations played out on the live stream, Altman simply posted the word “her” on his X account, seemingly a nod to the 2013 sci-fi film “Her,” in which Joaquin Phoenix plays a man who falls in love with his AI assistant, voiced by Scarlett Johansson.

The ‘4o’ model matches GPT-4 Turbo performance on text in English and code, OpenAI states.

Specifically, GPT-4o will respond to audio input in as little as 232 milliseconds, with an average of 320 milliseconds.

In addition to being able to speak to ChatGPT and obtain real-time responses with no delay, users can also interrupt the chatbot while it is speaking.

"It feels like AI from the movies ... Talking to a computer has never felt really natural for me; now it does," Altman wrote in a blog post describing the never-before-seen AI-to-human interaction.

The numerous demonstrations included introducing the AI to a dog, asking it to help a user prep for a job interview, translating languages in real time, and talking a researcher through solving a math equation on a sheet of paper.

In one demo clip, two users ask GPT-4o to interpret its surroundings through the lens of a phone, including a piece of cake with a candle on a desk. In an earlier clip, the AI is asked to create and sing a lullaby.

Murati explained that before 4o, the three components GPT-4 used to deliver its voice mode – transcription, intelligence, and text-to-speech – brought a lot of latency that broke the immersion of the experience.

“With GPT-4o this will happen natively, allowing us to bring efficiency to everyone,” she said.
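To see why chaining three components adds delay, consider a rough sketch in Python. It assumes each stage must finish before the next begins, so the delays simply add up. The per-stage figures are made-up placeholders, not numbers published by OpenAI; only the 320-millisecond average is the figure cited for GPT-4o.

```python
# Illustrative only: the per-stage latencies below are hypothetical
# placeholders, not measurements published by OpenAI.

def cascaded_latency_ms(stage_latencies_ms):
    """Total delay when each stage must finish before the next one starts."""
    return sum(stage_latencies_ms)

# Hypothetical three-stage voice pipeline:
# transcription -> intelligence -> text-to-speech
pipeline = {"transcription": 300, "intelligence": 1500, "text_to_speech": 400}
cascaded = cascaded_latency_ms(pipeline.values())

# A single end-to-end model has only one stage; 320 ms is the average
# audio response time cited for GPT-4o.
native = cascaded_latency_ms([320])

print(f"cascaded: {cascaded} ms, native: {native} ms")
```

Whatever the real per-stage numbers are, the point holds: a pipeline's delay is the sum of its parts, while a single model pays that cost only once.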

Free to all ChatGPT users

The new intuitive model, starting Monday, will be available to a larger audience, including developers, providing them with several more intuitive processes.

With the addition of ‘vision’ abilities, users can now upload screenshots, photos, and documents containing both text and images to start conversations with ChatGPT.

The flagship’s ability to integrate ‘memory’ will give the user a sense of continuity across all the conversations, Murati pointed out, while the ability to ‘browse’ allows the user to search for real-time information within a conversation.

Additionally, ‘data analysis’ will allow users to upload documents, such as charts and graphs, and ask the chatbot to analyze the information for them.

Murati said this will expand access for users and for the developers who create custom, purpose-built GPTs and make them available in the GPT Store.

Finally, GPT-4o is being brought to the API – 50% cheaper, 2X faster and with 5X higher rate limits – so engineers can start developing with 4o.

Murati said those who pay for a premium subscription will also have up to 5 times the capacity limit of free users on the more efficient GPT-4o. Plus users will also get early access to features “like our new macOS desktop app and next-generation voice and video capabilities,” the company posted on X.

OpenAI has been under pressure to expand its ChatGPT user base, which, according to the latest stats tallied by SEO.ai, has more than “180 million users” or about “1.6 billion visits per month.”

The move comes just one day before Alphabet’s Google holds its much-anticipated I/O developers conference, where it is expected to introduce its own AI software integrations, such as with Google’s search and Android OS 15, codenamed Vanilla Ice Cream.