Two new artificial intelligence (AI) systems from Google work in tandem to achieve silver-medal-level mathematical problem-solving, successfully tackling four of the six problems from this year’s International Mathematical Olympiad (IMO).
The IMO, held annually since 1959, is the largest and most prestigious competition for young mathematicians. Competitors are given six hard problems, each worth seven points, for a maximum total of 42. This year the gold-medal threshold was 29 points, a score reached by 58 of the 609 contestants at the official competition.
“We earned 28 out of 42 total points, achieving the same level as a silver medalist in the competition,” Google DeepMind said.
To put this achievement in context, elite pre-college mathematicians sometimes train for thousands of hours to tackle six exceptionally difficult problems in algebra, combinatorics, geometry, and number theory. At the competition, students submit their answers in two sessions of 4.5 hours each.
AI is not faster than humans, at least not yet.
“Our systems solved one problem within minutes and took up to three days to solve the others,” Google said.
The two AIs working in tandem were AlphaProof, a new reinforcement-learning-based system for formal mathematical reasoning, and AlphaGeometry 2, an improved version of Google’s geometry-solving system.
AlphaProof solved two algebra problems and one number theory problem by determining the answer and proving it was correct.
According to Google, this included the hardest problem in the competition, which only five human contestants solved.
Meanwhile, AlphaGeometry 2 proved the geometry problem. The two combinatorics problems remained unsolved.
“The fact that the program can come up with a non-obvious construction like this is very impressive, and well beyond what I thought was state of the art,” said Prof Sir Timothy Gowers, an IMO gold medalist and Fields Medal winner.
The AlphaProof system trains itself to prove mathematical statements. It combines a large language model, Gemini, which can hallucinate plausible but incorrect reasoning, with AlphaZero, the AI that previously mastered Go, chess, and shogi.
“When presented with a problem, AlphaProof generates solution candidates and then proves or disproves them by searching over possible proof steps in Lean.
Each proof that was found and verified is used to reinforce AlphaProof’s language model, enhancing its ability to solve subsequent, more challenging problems,” Google DeepMind explained.
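The key point is that every candidate is checked by the Lean proof assistant, so only machine-verified proofs feed back into training. As a rough illustration (toy statements for this article, not DeepMind’s code and not the actual IMO problems), a formalized statement and proof in Lean 4 look like this:

-- A formal statement whose proof the Lean kernel can check.
-- AlphaProof searches for proof steps (terms or tactics) that close goals like these.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A goal closed by a built-in decision procedure rather than a library lemma.
example : 2 + 2 = 4 := by decide

Because the kernel accepts only complete, correct proofs, each verified candidate provides a guaranteed-reliable training signal, which is what allows the loop Google describes to keep improving without human-graded answers.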
AlphaGeometry 2, “a neuro-symbolic hybrid system,” improves significantly on the first version and is two orders of magnitude faster.
Before this year’s competition, AlphaGeometry 2 could solve 83% of all historical IMO geometry problems from the past 25 years, compared to the 53% rate achieved by its predecessor. For IMO 2024, this system cracked one problem within 19 seconds.
Google plans to release more technical details on AlphaProof and aims to develop tools that let mathematicians explore hypotheses and quickly complete time-consuming elements of proofs.