Anthropic’s updated mid-size AI model claims the crown

AI startup Anthropic has launched a new large language model (LLM), Claude 3.5 Sonnet, which replaces Claude 3 Sonnet as the model offered on the free tier. The company claims that the new model is currently the most capable on the market, outperforming even the top-tier models from competitors.

Anthropic’s largest model, Claude 3 Opus, which is available only to subscribers, has not been updated yet. Its 3.5 version is coming later this year.

The new Claude 3.5 Sonnet model is also free to use, while subscribers get higher rate limits. The context window is unchanged at 200,000 tokens.

“Claude 3.5 Sonnet raises the industry bar for intelligence, outperforming competitor models and Claude 3 Opus on a wide range of evaluations, with the speed and cost of our mid-tier model, Claude 3 Sonnet,” Anthropic said.

According to measurements conducted by the company, the new Sonnet outperforms OpenAI’s GPT-4o and Google’s Gemini 1.5 Pro across most industry benchmarks.

If the company’s figures hold up, Claude 3.5 Sonnet sets new records for graduate-level reasoning (GPQA), undergraduate-level knowledge (MMLU), and coding proficiency. It is also a capable vision model that can interpret pictures, charts, and graphs and accurately transcribe text from imperfect images.

[Image: Sonnet results]

It does so while operating at twice the speed of Claude 3 Opus, the company’s previous best and largest model.

“It shows marked improvement in grasping nuance, humor, and complex instructions, and is exceptional at writing high-quality content with a natural, relatable tone,” the company said.

In an internal coding evaluation run by Anthropic, Claude 3.5 Sonnet solved 64% of problems, compared with 38% for Claude 3 Opus. Anthropic said that the LLM can “independently write, edit, and execute code with sophisticated reasoning and troubleshooting capabilities.”

Anthropic also updated the user interface with a new feature, called Artifacts, for interacting with Claude. LLM-generated code appears in a dedicated window alongside the conversation, which allows editing and building in real time, “seamlessly integrating AI-generated content” into projects and workflows.

The startup is working on new modalities and features to support more use cases and integrations.

“To complete the Claude 3.5 model family, we’ll be releasing Claude 3.5 Haiku and Claude 3.5 Opus later this year,” the company said.