New OpenAI model can reason and excels at coding, but is slower

OpenAI has released a preview of its latest model, OpenAI o1, code-named Strawberry.

Starting Thursday, users on the ChatGPT Plus and Team plans can try out a preview of o1, OpenAI’s most powerful model. It can reason through complex tasks and solve harder problems than previous models in science, coding, and math.

OpenAI says it designed o1 to spend more time thinking through problems before it responds, much like a person would. As a result, the model takes longer to reply to a prompt.

“In our tests, the next model update performs similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology. We also found that it excels in math and coding. In a qualifying exam for the International Mathematics Olympiad (IMO), GPT-4o correctly solved only 13% of problems, while the reasoning model scored 83%,” the company says in its blog post.

However, the company stresses that o1 is still an early model and cannot yet browse the web or accept uploaded files and images. For such cases, the company’s GPT-4o model will be used.

here is o1, a series of our most capable and aligned models yet:https://t.co/yzZGNN8HvD

o1 is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it. pic.twitter.com/Qs1HoSDOz1
Sam Altman (@sama), September 12, 2024

OpenAI is also releasing a model for developers, called o1-mini, which is “faster, cheaper and particularly effective at coding.”

Both o1-preview and o1-mini can be selected manually in the model picker. At launch, the weekly rate limits will be 30 messages for o1-preview and 50 for o1-mini.

According to OpenAI, o1 can be particularly useful for healthcare researchers annotating cell sequencing data, for physicists generating the complicated mathematical formulas needed for quantum optics, and for developers in all fields building and executing multi-step workflows.

The o1 models also use a new safety training approach that “harnesses their reasoning capabilities to make them adhere to safety and alignment guidelines.”

OpenAI says it evaluated this approach in its jailbreaking tests, which measure how well a model adheres to safety rules while users try to bypass them. In one of those tests, GPT-4o scored 22 (on a scale of 0-100), while the o1-preview model scored 84.