Google DeepMind announces new breakthrough in AI model that can solve difficult math problems
Google DeepMind has announced a breakthrough in building AI systems that can handle complex mathematical problems.
The research division of Alphabet Inc.'s Google said Thursday it has developed a software system that combines multiple AI models and performs as well as high school students who score in the top quarter of competitors at the International Mathematical Olympiad (IMO), a global contest that measures math talent. That is good enough to win a silver medal at the competition.
While the feat marks a proud milestone in the machine vs. mathematician race, it also opens up new possibilities for combining different approaches to AI to create more capable hybrid AI systems that Google says could eventually be adopted into commercial products such as its Gemini lineup of AI tools.
The news follows advancements to a system the AI lab announced in January called AlphaGeometry, which can solve IMO geometry problems at roughly the same level as top high school students. Combining a new model called AlphaSolver with an updated and improved AlphaGeometry 2, the new system can tackle any kind of math problem and produce elegant solutions.
The new system was able to answer the most difficult IMO problem, which only five of the 609 participants in last week's competition were able to solve. That said, the new system isn't perfect: For two of the six IMO problems, it couldn't find the answer, and for one problem, it took three days to arrive at the correct answer. Human participants get four and a half hours for each set of three problems, so on average they can spend no more than 90 minutes per problem.
Google DeepMind researchers said the new system is a first step toward more powerful AI models that can plan and reason about complex tasks, but cautioned that the approach works best in situations where it's possible to clearly determine whether the output is valid. This is the case, for example, with software coding, where the code is compiled and executed only if it's valid. David Silver, one of the Google DeepMind researchers who worked on the new system, said it could also work in areas where humans can provide clear feedback on whether an AI-generated solution is appropriate.
Google DeepMind said it will incorporate insights from the new system into future versions of its Gemini AI model, but did not specify how it will do this or how quickly Gemini will realize these improvements in mathematical capabilities.
Silver acknowledged that in many real-world situations, the validity of an answer is highly subjective, or can only be determined over a long period of time, which he said would make it difficult to apply the techniques Google DeepMind used for the IMO problems to such real-world problems.
AlphaZero to the rescue
Unlike other well-known AI models that consist of a single large neural network (a type of AI software loosely based on the human brain), the AlphaSolver system contains multiple neural networks, each performing a different function.
A large language model (LLM), in this case Google's Gemini model, is used as part of the process. However, the LLM itself does not perform the mathematical reasoning. LLMs, which underpin popular AI chatbots like Gemini, OpenAI's ChatGPT, Anthropic's Claude, and Meta's AI chatbot, have struggled to solve math problems unless they have access to external tools like a calculator or specialized math software.
Instead, the LLM is fine-tuned to translate text-based math problems into formal math language. It then passes the problem to another AI model, Google DeepMind's AlphaZero. AlphaZero was developed in 2017 and was originally used to learn to play the strategy board games chess, Go, and shogi at superhuman levels. But it turns out AlphaZero can be used to solve any kind of problem with clear rules and a system that makes it easy to keep track of scores.
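To give a sense of what "formal math language" means, here is a toy statement written in Lean 4, a proof language. It is an illustrative example only, not one of the IMO problems and not output from Google DeepMind's system; the theorem name is made up for this sketch.

```lean
-- Informal statement: "Adding two natural numbers in either order gives the same result."
-- Formal statement and proof, using the core library lemma Nat.add_comm.
theorem add_order_does_not_matter (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

The informal sentence in the comment is the kind of text an LLM would be asked to translate; the `theorem` line is the kind of precise statement a proof checker can then accept or reject.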
In this case, the AlphaZero component is trained to suggest proof steps for the problem in the mathematical programming language Lean. If the proof step is valid, Lean compiles it correctly; if it's not valid, it doesn't compile. This provides AlphaZero with a reward signal, like points in a video game. In this way, the AlphaZero component of AlphaSolver learns, by trial and error, which steps are more likely to lead to a valid solution. According to Google DeepMind, AlphaSolver was trained on about one million example IMO problems in the weeks leading up to the competition, and continued to be improved as it worked on problems in the IMO competition.
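The trial-and-error idea described above can be sketched in a few lines of Python. This is a deliberately simplified toy under stated assumptions, not Google DeepMind's method: the "checker" is a plain function standing in for Lean, the task (reach a target number from a start number) is invented, and the learner is a simple weight update rather than AlphaZero's search and neural networks. All names here are hypothetical.

```python
import random

# Hypothetical toy task: turn START into TARGET using at most MAX_LEN steps.
STEPS = {
    "add3": lambda x: x + 3,
    "double": lambda x: x * 2,
    "sub1": lambda x: x - 1,
}
START, TARGET, MAX_LEN = 1, 11, 4


def verify(candidate):
    """Stand-in for the proof checker: accept only sequences that reach TARGET."""
    value = START
    for name in candidate:
        value = STEPS[name](value)
    return value == TARGET


def train(episodes=2000, seed=0):
    """Sample step sequences and reinforce the steps used in accepted ones."""
    rng = random.Random(seed)
    weights = {name: 1.0 for name in STEPS}  # preference for each step
    solutions = set()
    for _ in range(episodes):
        names = list(weights)
        length = rng.randint(1, MAX_LEN)
        candidate = tuple(
            rng.choices(names, weights=[weights[n] for n in names], k=length)
        )
        # Binary reward: 1 if the checker accepts the sequence, otherwise 0.
        reward = 1.0 if verify(candidate) else 0.0
        for name in candidate:
            weights[name] += 0.1 * reward
        if reward:
            solutions.add(candidate)
    return weights, solutions


if __name__ == "__main__":
    weights, solutions = train()
    print("learned preferences:", weights)
    print("accepted sequences:", sorted(solutions))
```

In this toy, steps that appear in accepted sequences gain weight and are sampled more often, while a step that never appears in any accepted sequence keeps its initial weight. The key property it shares with the setup in the article is that the reward comes from an automatic check of validity rather than from human judgment.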
If the problem involved geometry, AlphaGeometry 2 was given the problem instead. AlphaGeometry 2 is also a hybrid system, combining an LLM component with a component that uses symbolic reasoning. The new AlphaGeometry was able to solve 83% of IMO geometry problems, compared to only 53% for its predecessor. In one case, AlphaGeometry was able to solve a highly complex geometry problem in just 19 seconds, a feat that was more like inspiration than a brute force approach based on endless trial and error. In another case, AlphaGeometry presented a proof that initially baffled some of the mathematicians who looked at it, though they later determined that it was in fact an elegant and highly unusual way to solve the problem.
Influence on human mathematicians
Pushmeet Kohli, head of AI science at Google DeepMind, said he sees AlphaSolver and AlphaGeometry 2 primarily as tools to help mathematicians with their research. Silver said he doesn't think these new math AIs call into question the importance of academic mathematicians.
But Timothy Gowers, director of mathematical research at the University of Cambridge and a recipient of the Fields Medal, which is awarded every four years to two to four mathematicians under 40 who have contributed most significantly to the field, said he had looked at the proofs produced by AlphaSolver and AlphaGeometry 2 and was impressed. “I recognized some familiar arguments coming out of the system,” he said.
He also said that some problems required him, a human mathematician, to dig quite deep to find the so-called magic key that suddenly turned a seemingly unsolvable problem into a quickly solvable one. Gowers said he was surprised that the system found some of these magic keys because he intuited that they would be hard to stumble upon by naive trial and error without any understanding of the mathematical principles involved. But he said he would reserve judgment on whether this meant that AlphaSolver had actually developed something like mathematical intuition. He said more research is needed to understand exactly how the system worked out the answers to IMO problems.
Gowers pointed out that IMO problems are much simpler than the problems research mathematicians study. However, compared to Kohli and Silver, Gowers was much less optimistic about the future if AI models continue to advance at their current pace. “I think that when computers become very good at finding extremely hard proofs, that's pretty much the end of mathematics research,” he said. He added that he was not saying that point is close, only looking a long way down the road, and that how far off it actually is remains very difficult to say.
