    What’s next for AI and math

    By TechAiVerse | June 5, 2025 | 14 min read

    MIT Technology Review’s What’s Next series looks across industries, trends, and technologies to give you a first look at the future. You can read the rest of them here.

    The way DARPA tells it, math is stuck in the past. In April, the US Defense Advanced Research Projects Agency kicked off a new initiative called expMath—short for Exponentiating Mathematics—that it hopes will speed up the rate of progress in a field of research that underpins a wide range of crucial real-world applications, from computer science to medicine to national security.

    “Math is the source of huge impact, but it’s done more or less as it’s been done for centuries—by people standing at chalkboards,” DARPA program manager Patrick Shafto said in a video introducing the initiative. 

    The modern world is built on mathematics. Math lets us model complex systems such as the way air flows around an aircraft, the way financial markets fluctuate, and the way blood flows through the heart. And breakthroughs in advanced mathematics can unlock new technologies such as cryptography, which is essential for private messaging and online banking, and data compression, which lets us send images and video across the internet.

    But advances in math can be years in the making. DARPA wants to speed things up. The goal for expMath is to encourage mathematicians and artificial-intelligence researchers to develop what DARPA calls an AI coauthor, a tool that might break large, complex math problems into smaller, simpler ones that are easier to grasp and—so the thinking goes—quicker to solve.

    Mathematicians have used computers for decades, to speed up calculations or check whether certain mathematical statements are true. The new vision is that AI might help them crack problems that were previously uncrackable.  

    But there’s a huge difference between AI that can solve the kinds of problems set in high school—math that the latest generation of models has already mastered—and AI that could (in theory) solve the kinds of problems that professional mathematicians spend careers chipping away at.

    On one side are tools that might be able to automate certain tasks that math grads are employed to do; on the other are tools that might be able to push human knowledge beyond its existing limits.

    Here are three ways to think about that gulf.

    1/ AI needs more than just clever tricks

    Large language models are not known to be good at math. They make things up and can be persuaded that 2 + 2 = 5. But newer versions of this tech, especially so-called large reasoning models (LRMs) like OpenAI’s o3 and Anthropic’s Claude 4 Thinking, are far more capable—and that’s got mathematicians excited.

    This year, a number of LRMs, which try to solve a problem step by step rather than spit out the first result that comes to them, have achieved high scores on the American Invitational Mathematics Examination (AIME), a test given to the top 5% of US high school math students.

    At the same time, a handful of new hybrid models that combine LLMs with some kind of fact-checking system have also made breakthroughs. Emily de Oliveira Santos, a mathematician at the University of São Paulo, Brazil, points to Google DeepMind’s AlphaProof, a system that combines an LLM with DeepMind’s game-playing model AlphaZero, as one key milestone. Last year AlphaProof became the first computer program to match the performance of a silver medallist at the International Math Olympiad, one of the most prestigious mathematics competitions in the world.

    And in May, a Google DeepMind model called AlphaEvolve discovered better results than anything humans had yet come up with for more than 50 unsolved mathematics puzzles and several real-world computer science problems.

    The uptick in progress is clear. “GPT-4 couldn’t do math much beyond undergraduate level,” says de Oliveira Santos. “I remember testing it at the time of its release with a problem in topology, and it just couldn’t write more than a few lines without getting completely lost.” But when she gave the same problem to OpenAI’s o1, an LRM released in late 2024, it nailed it.

    Does this mean such models are all set to become the kind of coauthor DARPA hopes for? Not necessarily, she says: “Math Olympiad problems often involve being able to carry out clever tricks, whereas research problems are much more explorative and often have many, many more moving pieces.” Success at one type of problem-solving may not carry over to another.

    Others agree. Martin Bridson, a mathematician at the University of Oxford, thinks the Math Olympiad result is a great achievement. “On the other hand, I don’t find it mind-blowing,” he says. “It’s not a change of paradigm in the sense that ‘Wow, I thought machines would never be able to do that.’ I expected machines to be able to do that.”

    That’s because even though the problems in the Math Olympiad—and similar high school or undergraduate tests like AIME—are hard, there’s a pattern to a lot of them. “We have training camps to train high school kids to do them,” says Bridson. “And if you can train a large number of people to do those problems, why shouldn’t you be able to train a machine to do them?”

    Sergei Gukov, a mathematician at the California Institute of Technology who coaches Math Olympiad teams, points out that the style of question does not change too much between competitions. New problems are set each year, but they can be solved with the same old tricks.

    “Sure, the specific problems didn’t appear before,” says Gukov. “But they’re very close—just a step away from zillions of things you have already seen. You immediately realize, ‘Oh my gosh, there are so many similarities—I’m going to apply the same tactic.’” As hard as competition-level math is, kids and machines alike can be taught how to beat it.

    That’s not true for most unsolved math problems. Bridson is president of the Clay Mathematics Institute, a nonprofit US-based research organization best known for setting up the Millennium Prize Problems in 2000—seven of the most important unsolved problems in mathematics, with a $1 million prize to be awarded to the first person to solve each of them. (One problem, the Poincaré conjecture, was solved by Grigori Perelman in the early 2000s, with the prize awarded in 2010; the others, which include P versus NP and the Riemann hypothesis, remain open.) “We’re very far away from AI being able to say anything serious about any of those problems,” says Bridson.

    And yet it’s hard to know exactly how far away, because many of the existing benchmarks used to evaluate progress are maxed out. The best new models already outperform most humans on tests like AIME.

    To get a better idea of what existing systems can and cannot do, a startup called Epoch AI has created a new test called FrontierMath, released in December. Instead of co-opting math tests developed for humans, Epoch AI worked with more than 60 mathematicians around the world to come up with a set of math problems from scratch.

    FrontierMath is designed to probe the limits of what today’s AI can do. None of the problems have been seen before and the majority are being kept secret to avoid contaminating training data. Each problem demands hours of work from expert mathematicians to solve—if they can solve it at all: some of the problems require specialist knowledge to tackle.

    FrontierMath is set to become an industry standard. It’s not yet as popular as AIME, says de Oliveira Santos, who helped develop some of the problems: “But I expect this to not hold for much longer, since existing benchmarks are very close to being saturated.”

    On AIME, the best large language models (Anthropic’s Claude 4, OpenAI’s o3 and o4-mini, Google DeepMind’s Gemini 2.5 Pro, xAI’s Grok 3) now score around 90%. On FrontierMath, o4-mini scores 19% and Gemini 2.5 Pro scores 13%. That’s still remarkable, but there’s clear room for improvement.

    FrontierMath should give the best sense yet of just how fast AI is progressing at math. But there are some problems that are still too hard for computers to take on.

    2/ AI needs to manage really vast sequences of steps

    Squint hard enough and in some ways math problems start to look the same: to solve them you need to take a sequence of steps from start to finish. The problem is finding those steps. 

    “Pretty much every math problem can be formulated as path-finding,” says Gukov. What makes some problems far harder than others is the number of steps on that path. “The difference between the Riemann hypothesis and high school math is that with high school math the paths that we’re looking for are short—10 steps, 20 steps, maybe 40 in the longest case.” The steps are also repeated between problems.

    “But to solve the Riemann hypothesis, we don’t have the steps, and what we’re looking for is a path that is extremely long”—maybe a million lines of computer proof, says Gukov.

    Finding very long sequences of steps can be thought of as a kind of complex game. It’s what DeepMind’s AlphaZero learned to do when it mastered Go and chess. A game of Go might only involve a few hundred moves. But to win, an AI must find a winning sequence of moves among a vast number of possible sequences. Imagine a number with 100 zeros at the end, says Gukov.

    But that’s still tiny compared with the number of possible sequences that could be involved in proving or disproving a very hard math problem: “A proof path with a thousand or a million moves involves a number with a thousand or a million zeros,” says Gukov. 

    No AI system can sift through that many possibilities. To address this, Gukov and his colleagues developed a system that shortens the length of a path by combining multiple moves into single supermoves. It’s like having boots that let you take giant strides: instead of taking 2,000 steps to walk a mile, you can now walk it in 20.
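
    To make the scale concrete, here is a rough back-of-the-envelope calculation. (The branching factor of 10 is an illustrative assumption, not a figure from Gukov.) With branching factor $b$ and a proof of $n$ moves, there are on the order of $b^n$ candidate paths:

    $$b = 10,\; n = 100 \;\Rightarrow\; 10^{100} \text{ paths (a 1 followed by 100 zeros)}, \qquad n = 1{,}000 \;\Rightarrow\; 10^{1000}.$$

    Bundling $k$ elementary moves into one supermove cuts the path to $n/k$ steps, so even if each supermove has a larger branching factor $B$, the search space shrinks from $b^n$ to roughly $B^{n/k}$. With $b = 10$, $B = 100$, and $k = 100$, a 2,000-move proof drops from about $10^{2000}$ candidates to about $100^{20} = 10^{40}$: still enormous, but far more amenable to guided search.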

    The challenge was figuring out which moves to replace with supermoves. In a series of experiments, the researchers came up with a system in which one reinforcement-learning model suggests new moves and a second model checks to see if those moves help.
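
    The pattern is easiest to see in code. The sketch below is a toy illustration of that propose-and-check loop, not Gukov's actual system: the integer state, the random proposer, and the distance heuristic standing in for the checking model are all assumptions made for the example.

        import random

        # Toy propose-and-check search. The "state" is a list of integers and the
        # goal is to drive it to all zeros; a supermove bundles k elementary edits.
        # In the real system both roles are learned models; here the proposer is
        # random and the checker is a simple distance heuristic (assumptions).

        def distance(state):
            return sum(abs(x) for x in state)  # checker's score: lower is better

        def propose_supermove(length, k=3):
            # bundle k elementary edits (+1 or -1 at a random position) into one move
            return [(random.randrange(length), random.choice((-1, 1))) for _ in range(k)]

        def apply_supermove(state, move):
            new = list(state)
            for pos, delta in move:
                new[pos] += delta
            return new

        def search(state, budget=10_000):
            path = []
            for _ in range(budget):
                move = propose_supermove(len(state))       # proposer suggests a supermove
                candidate = apply_supermove(state, move)
                if distance(candidate) < distance(state):  # checker keeps only helpful moves
                    state = candidate
                    path.append(move)
                if distance(state) == 0:
                    break
            return state, path

        final_state, path = search([5, -3, 2, 7])
        print(final_state, "reached in", len(path), "supermoves")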

    They used this approach to make a breakthrough in a math problem called the Andrews-Curtis conjecture, a puzzle that has been unsolved for 60 years. It’s a problem that every professional mathematician will know, says Gukov.

    (An aside for math stans only: The AC conjecture states that a particular way of describing a type of set called a trivial group can be translated into a different but equivalent description with a certain sequence of steps. Most mathematicians think the AC conjecture is false, but nobody knows how to prove that. Gukov admits himself that it is an intellectual curiosity rather than a practical problem, but an important problem for mathematicians nonetheless.)
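
    (In standard terms: a balanced presentation of the trivial group is a presentation $\langle x_1, \dots, x_n \mid r_1, \dots, r_n \rangle$ with as many relators as generators that presents the one-element group. The Andrews-Curtis conjecture asserts that the relators of any such presentation can be carried to the generators themselves by a finite sequence of moves of three kinds:

    $$r_i \mapsto r_i^{-1}, \qquad r_i \mapsto r_i r_j \;(j \neq i), \qquad r_i \mapsto w\, r_i\, w^{-1} \text{ for any word } w.$$

    This is the standard formulation rather than wording from the article; a "stable" variant also allows adding or deleting a generator together with a matching relator.)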

    Gukov and his colleagues didn’t solve the AC conjecture, but they found that a potential counterexample (which, if valid, would have shown the conjecture to be false) proposed 40 years ago was not a counterexample after all. “It’s been a major direction of attack for 40 years,” says Gukov. With the help of AI, they showed that this direction was in fact a dead end.

    “Ruling out possible counterexamples is a worthwhile thing,” says Bridson. “It can close off blind alleys, something you might spend a year of your life exploring.” 

    True, Gukov checked off just one piece of one esoteric puzzle. But he thinks the approach will work in any scenario where you need to find a long sequence of unknown moves, and he now plans to try it out on other problems.

    “Maybe it will lead to something that will help AI in general,” he says. “Because it’s teaching reinforcement learning models to go beyond their training. To me it’s basically about thinking outside of the box—miles away, megaparsecs away.”  

    3/ Can AI ever provide real insight?

    Thinking outside the box is exactly what mathematicians need to solve hard problems. Math is often thought to involve robotic, step-by-step procedures. But advanced math is an experimental pursuit, involving trial and error and flashes of insight.

    That’s where tools like AlphaEvolve come in. Google DeepMind’s latest model asks an LLM to generate code to solve a particular math problem. A second model then evaluates the proposed solutions, picks the best, and sends them back to the LLM to be improved. After hundreds of rounds of trial and error, AlphaEvolve was able to come up with solutions to a wide range of math problems that were better than anything people had yet come up with. But it can also work as a collaborative tool: at any step, humans can share their own insight with the LLM, prompting it with specific instructions.
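
    In outline, that generate-evaluate-refine loop looks roughly like the sketch below. This is schematic only: the evolve function, its ask_llm_for_code stand-in, and the toy scorer are placeholders invented for illustration, not DeepMind's AlphaEvolve API.

        import random

        # Schematic evolutionary loop in the spirit of the description above.
        # ask_llm_for_code is a stand-in for a call to a code-generating model;
        # evaluate is a problem-specific scorer supplied by the user.

        def evolve(problem, evaluate, ask_llm_for_code, rounds=100, pool_size=8):
            population = [ask_llm_for_code(problem, parents=[]) for _ in range(pool_size)]
            for _ in range(rounds):
                ranked = sorted(population, key=evaluate, reverse=True)
                parents = ranked[:2]                      # keep the best candidates
                children = [ask_llm_for_code(problem, parents=parents)
                            for _ in range(pool_size - len(parents))]
                population = parents + children           # refine the rest from the best
            return max(population, key=evaluate)

        # Toy usage: candidate "programs" are just numbers and the "LLM" is a
        # random mutator, yet the loop still converges toward the best-scoring value.
        def toy_llm(problem, parents):
            base = parents[0] if parents else 0.0         # mutate the best parent so far
            return base + random.uniform(-5, 5)

        best = evolve(problem=None, evaluate=lambda x: -abs(x - 42),
                      ask_llm_for_code=toy_llm, rounds=200)
        print(round(best, 2))

    A human collaborator slots into the same loop by editing or constraining the candidates between rounds, which is the "prompting it with specific instructions" step described above.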

    This kind of exploration is key to advanced mathematics. “I’m often looking for interesting phenomena and pushing myself in a certain direction,” says Geordie Williamson, a mathematician at the University of Sydney in Australia. “Like: ‘Let me look down this little alley. Oh, I found something!’”

    Williamson worked with Meta on an AI tool called PatternBoost, designed to support this kind of exploration. PatternBoost can take a mathematical idea or statement and generate similar ones. “It’s like: ‘Here’s a bunch of interesting things. I don’t know what’s going on, but can you produce more interesting things like that?’” he says.

    Such brainstorming is essential work in math. It’s how new ideas get conjured. Take the icosahedron, says Williamson: “It’s a beautiful example of this, which I kind of keep coming back to in my own work.” The icosahedron is a 20-faced 3D object whose faces are all triangles (think of a 20-sided die). It is the largest of the family of exactly five regular solids, the Platonic solids: the others are the tetrahedron (four faces), the cube (six), the octahedron (eight), and the dodecahedron (12).

    Remarkably, the fact that there are exactly five of these objects was proved by mathematicians in ancient Greece. “At the time that this theorem was proved, the icosahedron didn’t exist,” says Williamson. “You can’t go to a quarry and find it—someone found it in their mind. And the icosahedron goes on to have a profound effect on mathematics. It’s still influencing us today in very, very profound ways.”
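
    The ancient result Williamson refers to has a short modern justification via Euler's formula; the argument below is the standard one, included here for context rather than drawn from the article. For a convex polyhedron with $V$ vertices, $E$ edges, and $F$ faces, Euler's formula gives $V - E + F = 2$. If every face is a regular $p$-gon and $q$ faces meet at every vertex, counting edge incidences two ways gives $pF = 2E$ and $qV = 2E$. Substituting,

    $$\frac{2E}{q} - E + \frac{2E}{p} = 2 \quad\Longrightarrow\quad \frac{1}{p} + \frac{1}{q} = \frac{1}{2} + \frac{1}{E} > \frac{1}{2}.$$

    With $p, q \ge 3$, the only integer solutions are $(p, q) = (3,3), (4,3), (3,4), (5,3), (3,5)$: the tetrahedron, the cube, the octahedron, the dodecahedron, and the icosahedron. That is why the family stops at exactly five.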

    For Williamson, the exciting potential of tools like PatternBoost is that they might help people discover future mathematical objects like the icosahedron that go on to shape the way math is done. But we’re not there yet. “AI can contribute in a meaningful way to research-level problems,” he says. “But we’re certainly not getting inundated with new theorems at this stage.”

    Ultimately, it comes down to the fact that machines still lack what you might call intuition or creative thinking. Williamson sums it up like this: We now have AI that can beat humans when it knows the rules of the game. “But it’s one thing for a computer to play Go at a superhuman level and another thing for the computer to invent the game of Go.”

    “I think that applies to advanced mathematics,” he says. “Breakthroughs come from a new way of thinking about something, which is akin to finding completely new moves in a game. And I don’t really think we understand where those really brilliant moves in deep mathematics come from.”

    Perhaps AI tools like AlphaEvolve and PatternBoost are best thought of as advance scouts for human intuition. They can discover new directions and point out dead ends, saving mathematicians months or years of work. But the true breakthroughs will still come from the minds of people, as has been the case for thousands of years.

    For now, at least. “There’s plenty of tech companies that tell us that won’t last long,” says Williamson. “But you know—we’ll see.” 
