{"id":17103,"date":"2025-08-22T01:09:11","date_gmt":"2025-08-22T01:09:11","guid":{"rendered":"https:\/\/naijaglobalnews.org\/?p=17103"},"modified":"2025-08-22T01:09:11","modified_gmt":"2025-08-22T01:09:11","slug":"openai-model-earns-gold-medal-score-at-international-math-olympiad-and-advances-path-to-artificial-general-intelligence","status":"publish","type":"post","link":"https:\/\/naijaglobalnews.org\/?p=17103","title":{"rendered":"OpenAI Model Earns Gold-Medal Score at International Math Olympiad and Advances Path to Artificial General Intelligence"},"content":{"rendered":"<p>\n<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">A few months before the 2025 International Mathematical Olympiad (IMO) in July, a three-person team at OpenAI made a long bet that they could use the competition\u2019s brutally tough problems to train an artificial intelligence model to think on its own for hours so that it was capable of writing math proofs. Their goal wasn\u2019t simply to create an AI that could do complex math but one that could evaluate ambiguity and nuance\u2014skills AIs will need if they are to someday take on many challenging real-world tasks. In fact, these are precisely the skills required to create artificial general intelligence, or AGI: human-level understanding and reasoning.<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">The IMO, held this year on Australia\u2019s Sunshine Coast, is the world\u2019s premier math competition for high schoolers, bringing together top contenders from more than 100 countries. All are given the same six problems\u2014three per day, each worth seven points\u2014to solve over two days. But these problems are nothing like what you probably remember from high school. Rather than a brief numeric answer, each demands sustained reasoning and creativity in the form of a pages-long written proof. 
These logical, step-by-step arguments have to span many fields of mathematics\u2014exactly the sort of problems that, until just this year, AI systems failed at spectacularly.<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">The OpenAI team of researchers and engineers\u2014Alex Wei, Sheryl Hsu and Noam Brown\u2014used a general-purpose reasoning model: an AI designed to \u201cthink\u201d through challenging problems by breaking them into steps, checking its own work and adapting its approach as it goes. Though AI systems couldn\u2019t officially compete as participants, the notoriously tough test served as a demonstration of what they can do, and the AIs tackled this year\u2019s questions in the same test format and with the same constraints as human participants. Upon receiving the questions, the team\u2019s experimental system worked for two 4.5\u2011hour sessions (just as the student contestants did), without tools or the Internet\u2014it had absolutely no external assistance from tools such as search engines or software designed for math. The proofs it produced were graded by three former IMO medalists and posted online. The AI completed five of the six problems correctly, receiving 35 out of 42 points\u2014the minimum required for an IMO gold medal. (Google\u2019s DeepMind AI system also achieved that score this year.) Out of 630 competitors, only 26 students, or 4 percent, outperformed the AI; five students achieved perfect 42s. Given that a year ago language-based AI systems like OpenAI\u2019s struggled to do elementary math, the results were a dramatic leap in performance.<\/p>\n
<p class=\"\" data-block=\"sciam\/paragraph\">In the following conversation, Scientific American spoke with two members of the OpenAI team, Alex Wei and Sheryl Hsu, to discuss how they conducted their work, why the model\u2019s lack of response to the sixth question was actually a major step toward addressing AI\u2019s \u201challucination\u201d problem and how developing a system capable of writing complex proofs could help lead to artificial general intelligence.<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">[An edited transcript of the interview follows.]<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">What led you to suddenly begin preparing an AI model for the IMO just a few months before the competition? What was the spark?<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">WEI: I had been thinking about math proofs for quite a while. I\u2019m on a team at OpenAI called MathGen. We had just seen the results progress a lot. We felt like we had a shot to get a model that could do really well at the IMO, and we wanted to make a mad dash to get there.<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">HSU: I used to do math competitions. [Wei] used to do math competitions\u2014he was a lot better than me. The IMO is definitely well known within the [AI research] community, including among researchers at OpenAI. So it was really inspiring to push specifically for that.<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">Can you talk about your decision to work with a general\u2011purpose AI system rather than a system that was specifically designed to answer math problems?<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">WEI: The philosophy is that we want to build general\u2011purpose AI and develop methods that don\u2019t just work for math. 
Math is a very good proving ground for AI because it\u2019s fairly objective: if you have a proof, it\u2019s easier to get consensus on whether it\u2019s correct. That\u2019s harder for, say, poetry\u2014you\u2019ll have more disagreement among readers. And IMO problems are very hard, so we wanted to tackle hard problems with general\u2011purpose methods in the hope that they\u2019ll also apply to domains beyond math.<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">HSU: I\u2019d also say the goal at OpenAI is to build AGI\u2014it\u2019s not necessarily to write papers or win competitions. It was important that everything we did for this project also be useful for the bigger goal of building AGI and better models that users can actually use.<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">In what ways could a reasoning model winning a gold in the IMO help lead to AGI?<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">WEI: One perspective is to think in terms of how long tasks take. A year ago, ChatGPT could only do very basic math problems. Two years ago\u2014and even a year and a half ago\u2014we were often thinking about grade\u2011school math problems you\u2019d find on fifth\u2011grade homework. For someone really good at math, those take a second or two to read and solve. Then we started evaluating using AIME [the American Invitational Mathematics Examination, a 15-question high school math contest]. That takes around 10 minutes per problem, with about three hours for 15 problems. The IMO is four and a half hours for just three problems\u2014that\u2019s 90 minutes per problem. ChatGPT started off being good for quick questions. 
Now it\u2019s better at longer\u2011running tasks, such as \u201cCan you edit this paragraph for me?\u201d As AI improves, you can expand the time horizon of tasks, and you can see that progression clearly in math.<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">HSU: Another aspect is that reasoning models were previously very good at tasks that are easy to verify. If you\u2019re solving a non\u2011proof\u2011based math problem, there\u2019s one numerically correct answer. It\u2019s easy to check. But in the real world\u2014and in the tasks people actually want help with\u2014it\u2019s more complex. There\u2019s nuance: maybe it\u2019s mostly correct but has some errors; maybe it\u2019s correct but could be stylized better. Proof\u2011based math isn\u2019t trivial to evaluate. If we think about AGI, those tasks won\u2019t be easy to judge as correct or not; they\u2019ll be more loosely specified and harder overall.<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">What was the process for training the model?<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">WEI: In general, reinforcement learning trains a model by rewarding good behavior and penalizing bad behavior. If you repeatedly reinforce good behavior and discourage bad behavior, the model becomes more likely to exhibit the good behavior.<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">HSU: Toward the end, we also scaled up test\u2011time compute [how long the AI model was able to \u201cthink\u201d before answering]. Previously, for a human, problems of this sort might be a few minutes; now we were scaling to hours. That extra thinking time gave surprising gains. There was a moment when we ran evaluations on our internal test set that took a long time because of the increased test\u2011time compute. When we finally looked at the results\u2014and Alex graded them\u2014seeing the progress made me think gold might be within reach. 
That was pretty exciting.<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">On the IMO test, the model you developed got five out of six answers correct. But with the sixth question, the model didn\u2019t try to provide an answer. Can you tell me more about the significance of this response?<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">WEI: The model knowing what it doesn\u2019t know was one of the early signs of [progress] we saw. Today if you use ChatGPT, you\u2019ll sometimes see \u201challucinations\u201d\u2014models don\u2019t reliably know when they don\u2019t know. That capability isn\u2019t specific to math. I\u2019d love it if, for everyday questions, the model could honestly say when it doesn\u2019t know instead of giving an answer I must verify independently.<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">What kind of impact could your work on this model have on future models?<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">HSU: Everything we did for this project is fairly general\u2011purpose\u2014being able to grade outputs that aren\u2019t single answers and to work on hard problems for a long time while making steady progress. Those contributed a lot to the success here, and now we and others at OpenAI are applying them beyond math. It\u2019s not in GPT\u20115, but in future models, we\u2019re excited to integrate these capabilities.<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">WEI: If you look at the solutions we publicly posted for the IMO problems, some are very long\u2014five to 10 pages. This model can generate long outputs that are consistent and coherent, without mistakes. Many current state\u2011of\u2011the\u2011art models can\u2019t produce a totally coherent five\u2011page report. 
I\u2019m excited that this care and precision will help in many other domains.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A few months before the 2025 International Mathematical Olympiad (IMO) in July, a three-person team at OpenAI made a long bet that they could use the competition\u2019s brutally tough problems to train an artificial intelligence model to think on its own for hours so that it was capable of writing math proofs. Their goal wasn\u2019t<\/p>\n","protected":false},"author":1,"featured_media":17104,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[50],"tags":[5024,1564,8099,4692,10274,1443,531,4693,4029,5080,1430,2879,9496],"class_list":{"0":"post-17103","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-environment","8":"tag-advances","9":"tag-artificial","10":"tag-earns","11":"tag-general","12":"tag-goldmedal","13":"tag-intelligence","14":"tag-international","15":"tag-math","16":"tag-model","17":"tag-olympiad","18":"tag-openai","19":"tag-path","20":"tag-score"},"_links":{"self":[{"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=\/wp\/v2\/posts\/17103","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=17103"}],"version-history":[{"count":0,"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=\/wp\/v2\/posts\/17103\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=\/wp\/v2\/media\/17104"}],"wp:attachment":[{"href":"https:\/\/naijagl
obalnews.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=17103"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=17103"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=17103"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}