    OpenAI Model Earns Gold-Medal Score at International Math Olympiad and Advances Path to Artificial General Intelligence

August 22, 2025 · 8 min read

    peshkov/Getty Images


A few months before the 2025 International Mathematical Olympiad (IMO) in July, a three-person team at OpenAI made a long-shot bet: that they could use the competition’s brutally tough problems to train an artificial intelligence model to think on its own for hours, long enough to write full math proofs. Their goal wasn’t simply to create an AI that could do complex math but one that could evaluate ambiguity and nuance—skills AIs will need if they are to someday take on many challenging real-world tasks. In fact, these are precisely the skills required to create artificial general intelligence, or AGI: human-level understanding and reasoning.

    The IMO, held this year on Australia’s Sunshine Coast, is the world’s premier math competition for high schoolers, bringing together top contenders from more than 100 countries. All are given the same six problems—three per day, each worth seven points—to solve over two days. But these problems are nothing like what you probably remember from high school. Rather than a brief numeric answer, each demands sustained reasoning and creativity in the form of a pages-long written proof. These logical, step-by-step arguments have to span many fields of mathematics—exactly the sort of problems that, until just this year, AI systems failed at spectacularly.

The OpenAI team of researchers and engineers—Alex Wei, Sheryl Hsu and Noam Brown—used a general-purpose reasoning model: an AI designed to “think” through challenging problems by breaking them into steps, checking its own work and adapting its approach as it goes. Though AI systems couldn’t officially compete as participants, the notoriously tough test served as a demonstration of what they can do: the AIs tackled this year’s questions in the same format and under the same constraints as the human contestants. Upon receiving the questions, the team’s experimental system worked for two 4.5-hour sessions (just as the students did), with no external assistance from tools such as search engines, math software or the Internet. The proofs it produced were graded by three former IMO medalists and posted online. The AI completed five of the six problems correctly, receiving 35 out of 42 points—the minimum required for an IMO gold medal. (Google DeepMind’s AI system also achieved that score this year.) Out of 630 competitors, only 26 students, or about 4 percent, outperformed the AI; five achieved perfect 42s. Given that a year ago language-based AI systems like OpenAI’s struggled with elementary math, the results mark a dramatic leap in performance.


    In the following conversation, Scientific American spoke with two members of the OpenAI team, Alex Wei and Sheryl Hsu, to discuss how they conducted their work, why the model’s lack of response to the sixth question was actually a major step toward addressing AI’s “hallucination” problem and how developing a system capable of writing complex proofs could help lead to artificial general intelligence.

    [An edited transcript of the interview follows.]

    What led you to suddenly begin preparing an AI model for the IMO just a few months before the competition? What was the spark?

    WEI: I had been thinking about math proofs for quite a while. I’m on a team at OpenAI called MathGen. We had just seen the results progress a lot. We felt like we had a shot to get a model that could do really well at the IMO, and we wanted to make a mad dash to get there.

    HSU: I used to do math competitions. [Wei] used to do math competitions—he was a lot better than me. The IMO is definitely well known within the [AI research] community, including among researchers at OpenAI. So it was really inspiring to push specifically for that.

    Can you talk about your decision to work with a general‑purpose AI system rather than a system that was specifically designed to answer math problems?

    WEI: The philosophy is that we want to build general‑purpose AI and develop methods that don’t just work for math. Math is a very good proving ground for AI because it’s fairly objective: if you have a proof, it’s easier to get consensus on whether it’s correct. That’s harder for, say, poetry—you’ll have more disagreement among readers. And IMO problems are very hard, so we wanted to tackle hard problems with general‑purpose methods in the hope that they’ll also apply to domains beyond math.

    HSU: I’d also say the goal at OpenAI is to build AGI—it’s not necessarily to write papers or win competitions. It was important that everything we did for this project also be useful for the bigger goal of building AGI and better models that users can actually use.

    In what ways could a reasoning model winning a gold in the IMO help lead to AGI?

    WEI: One perspective is to think in terms of how long tasks take. A year ago, ChatGPT could only do very basic math problems. Two years ago—and even a year and a half ago—we were often thinking about grade‑school math problems you’d find on fifth‑grade homework. For someone really good at math, those take a second or two to read and solve. Then we started evaluating using AIME [the American Invitational Mathematics Examination, a 15-question high school math contest]. That takes around 10 minutes per problem, with about three hours for 15 problems. The IMO is four and a half hours for just three problems—that’s 90 minutes per problem. ChatGPT started off being good for quick questions. Now it’s better at longer‑running tasks, such as “Can you edit this paragraph for me?” As AI improves, you can expand the time horizon of tasks, and you can see that progression clearly in math.
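Wei’s time-horizon framing is easy to tally. A quick sketch of the arithmetic (contest durations are as stated in the interview; note the AIME figure works out closer to 12 minutes than the quoted “around 10”):

```python
def minutes_per_problem(total_hours: float, num_problems: int) -> float:
    """Convert a contest's total time budget into minutes per problem."""
    return total_hours * 60 / num_problems

# The progression from quick answers to sustained proofs:
print(minutes_per_problem(3, 15))    # AIME: 12.0 min per problem
print(minutes_per_problem(4.5, 3))   # IMO: 90.0 min per problem
```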

    HSU: Another aspect is that reasoning models were previously very good at tasks that are easy to verify. If you’re solving a non‑proof‑based math problem, there’s one numerically correct answer. It’s easy to check. But in the real world—and in the tasks people actually want help with—it’s more complex. There’s nuance: maybe it’s mostly correct but has some errors; maybe it’s correct but could be stylized better. Proof‑based math isn’t trivial to evaluate. If we think about AGI, those tasks won’t be easy to judge as correct or not; they’ll be more loosely specified and harder overall.

    What was the process for training the model?

    WEI: In general, reinforcement learning trains a model by rewarding good behavior and penalizing bad behavior. If you repeatedly reinforce good behavior and discourage bad behavior, the model becomes more likely to exhibit the good behavior.
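Wei’s description can be illustrated with a toy reward-weighted update. This is a generic sketch of the reinforce-good/penalize-bad loop, not OpenAI’s actual training code: a policy keeps a preference score (logit) per action, and the reward signal nudges those scores up or down.

```python
import math
import random

def train_policy(steps=2000, lr=0.1, seed=0):
    """Toy reinforcement loop: reward good behavior, penalize bad.

    Preferences (logits) over two actions are nudged toward whatever
    the reward signal favors, so the rewarded action becomes more likely.
    """
    rng = random.Random(seed)
    prefs = {"good": 0.0, "bad": 0.0}

    def probs():
        exps = {a: math.exp(v) for a, v in prefs.items()}
        z = sum(exps.values())
        return {a: e / z for a, e in exps.items()}

    for _ in range(steps):
        p = probs()
        action = "good" if rng.random() < p["good"] else "bad"
        reward = 1.0 if action == "good" else -1.0  # the "grader"
        # Reinforce: shift the chosen action's preference by the reward.
        prefs[action] += lr * reward * (1 - p[action])
    return probs()
```

After training, the policy picks the rewarded action nearly every time; flipping the sign of the reward would flip the behavior, which is the whole point of the feedback loop.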

    HSU: Toward the end, we also scaled up test‑time compute [how long the AI model was able to “think” before answering]. Previously, for a human, problems of this sort might be a few minutes; now we were scaling to hours. That extra thinking time gave surprising gains. There was a moment when we ran evaluations on our internal test set that took a long time because of the increased test‑time compute. When we finally looked at the results—and Alex graded them—seeing the progress made me think gold might be within reach. That was pretty exciting.
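Hsu’s point about scaling test-time compute can be made concrete with a back-of-envelope model. One common way to spend more inference compute is to sample several independent attempts and keep any that succeeds (an illustrative assumption; the interview does not say this is OpenAI’s mechanism). If a single attempt solves a problem with probability p, the chance that at least one of n attempts does is 1 - (1 - p)^n:

```python
def solve_rate(p_single: float, attempts: int) -> float:
    """P(at least one of `attempts` independent tries succeeds)."""
    return 1 - (1 - p_single) ** attempts

# A 30% single-attempt solve rate climbs quickly with more compute:
for n in (1, 2, 4, 8, 16):
    print(f"{n:2d} attempts -> {solve_rate(0.3, n):.3f}")
```

The returns diminish but never vanish, which matches the observation that extra thinking time kept yielding surprising gains.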

    On the IMO test, the model you developed got five out of six answers correct. But with the sixth question, the model didn’t try to provide an answer. Can you tell me more about the significance of this response?

WEI: The model knowing what it doesn’t know was one of the early signs of [progress] we saw. Today if you use ChatGPT, you’ll sometimes see “hallucinations”—models don’t reliably know when they don’t know. That capability isn’t specific to math. I’d love it if, for everyday questions, the model could honestly say when it doesn’t know instead of giving an answer I then have to verify myself.

    What kind of impact could your work on this model have on future models?

    HSU: Everything we did for this project is fairly general‑purpose—being able to grade outputs that aren’t single answers and to work on hard problems for a long time while making steady progress. Those contributed a lot to the success here, and now we and others at OpenAI are applying them beyond math. It’s not in GPT‑5, but in future models, we’re excited to integrate these capabilities.

    WEI: If you look at the solutions we publicly posted for the IMO problems, some are very long—five to 10 pages. This model can generate long outputs that are consistent and coherent, without mistakes. Many current state‑of‑the‑art models can’t produce a totally coherent five‑page report. I’m excited that this care and precision will help in many other domains.
