The biggest AI announcements from Google I/O 2024


Google parent Alphabet on Tuesday announced details of how it is building on artificial intelligence across its businesses, incorporating AI features into almost every aspect of its platform, including improvements to Search, Workspace, and the entire multimodal Gemini family.

At its annual I/O developer event in Mountain View, California, CEO Sundar Pichai said the company is rolling out AI Overviews to all users in the US this week, following public testing that began last year.

“We’ve been investing in AI for more than a decade – and innovating at every layer of the stack: research, product, infrastructure… Still, we are in the early days of the AI platform shift. We see so much opportunity ahead, for creators, for developers, for startups, for everyone. Helping to drive those opportunities is what our Gemini era is all about,” Pichai said in the introduction to his keynote speech.

AI features dominate

The most impactful introductions appear to be new AI upgrades to Google Search, Google Workspace, and the Gemini 1.5 Pro model, which gives Google’s flagship chatbot the ability to make sense of a massive amount of data.

New features that will roll out in the coming months include Live, which lets the user have an in-depth two-way conversation with Gemini using their voice, and an expanded version of the SynthID watermarking tool, which has been used for AI-generated images and audio and now extends to text and video outputs.

“Today all of our 2 billion user products use Gemini. But we’re still in the beginning of our Gemini era,” the company posted on X right after the live event kicked off at 10:00 a.m. PT.

The gamut of Google AI elements will also be built directly into Android OS, a boon for users who will soon be able to interact with their phones in ways never imagined, according to the tech giant.

Apps like Gmail, Docs, Sheets, and Calendar all get an AI boost, including the Google Workspace side panel, which is now integrated with the Gemini 1.5 Pro model.

The Pro model, starting with prompt sizes of up to 1 million tokens, or pieces of data, will be available in 35 languages to users who pay for the ‘Gemini Advanced’ subscription.

The improved Gemini 1.5 Pro – considered a breakthrough in long-context understanding – will also be offered with 1 million tokens to all developers globally in private preview.

“It’s the next step towards the ultimate goal of infinite context,” Google said.

As part of the upgrade, Google said it was doubling that amount, to 2 million tokens, meaning the AI potentially could answer questions when given thousands of pages of text or more than an hour of video to ingest in a single prompt.

The next new AI feature is Google’s ‘AI Overviews,’ which uses generative AI to synthesize information and answer more complex queries for which there is no simple answer on the Web.

The new AI-powered Search feature has a multi-step reasoning capability that “breaks your bigger question down into parts and figures out which problems to solve and in what order,” according to Google. Research that would normally have taken a user hours can be completed within seconds, the company said.

“Multimodality radically expands the questions we can ask, and the answers we’ll get back. It understands each type of input – and finds connections between them,” Pichai said.

The powerful feature will also give the user the ability to ask questions with video, directly in Search. The new AI Overviews is being rolled out to all US users starting Tuesday, with other countries to follow.

Tuesday’s announcements also introduced some new members to the Gemini multimodal family.

The new LearnLM and the Learning coach Gem are part of a new family of fine-tuned learning tools designed to help the user “build understanding,” not just receive an answer from Gemini without context.

LearnLM applies educational research to make products – like Search, Gemini and YouTube – “more personal, active and engaging for learners,” Google posted on X.

Google DeepMind’s prototype Project Astra was also on display Tuesday. The company posted a video on X showing how the next universal AI agent could be truly helpful in everyday life.

Another AI-powered approach shown by Google DeepMind draws on its AlphaGo work to help with red teaming, a practice in which penetration testers are hired to infiltrate a system, simulating a real cyberattack, in order to uncover a company’s weaknesses.

The new technique, dubbed “AI-Assisted Red Teaming,” trains AI agents to compete against each other, improving their red-teaming capabilities and adversarial prompting while limiting problematic outputs, the company said.

Google competes with OpenAI

All the new AI features unveiled on Tuesday will help investors evaluate Alphabet's progress as it races against Microsoft, OpenAI, and other competitors to dominate the emerging technology.

Microsoft-backed OpenAI on Monday showcased a new AI model called GPT-4o, which enables ChatGPT to respond via voice in real time and be interrupted – both hallmarks of realistic voice conversations that AI voice assistants like Google Assistant have found challenging.

In another sign of fierce competition between OpenAI and Google, the online search leader teased Veo, an AI model that it claims is its most powerful yet for creating videos from a simple text command.

Google had released an earlier video-generation technology in January, only to be upstaged weeks later by OpenAI’s Sora. The ChatGPT maker has promoted its film-conjuring software among Hollywood executives, enthralling and worrying the creative industry.

Google said that filmmaker Donald Glover has experimented with its AI. The company also previewed a new text-to-image model, Imagen 3, and it touted other artist collaborations.

Finally, Google announced a scaled-down version of Gemini called 1.5 Flash, which aims to lower the cost of deploying AI and speed up responses.

Like the more capable version, Flash can take in large amounts of data while being optimized for chat applications, video and image captioning.