{"id":36186,"date":"2025-12-06T15:55:02","date_gmt":"2025-12-06T15:55:02","guid":{"rendered":"https:\/\/naijaglobalnews.org\/?p=36186"},"modified":"2025-12-06T15:55:02","modified_gmt":"2025-12-06T15:55:02","slug":"deepseeks-self-correcting-ai-model-aces-tough-maths-proofs","status":"publish","type":"post","link":"https:\/\/naijaglobalnews.org\/?p=36186","title":{"rendered":"DeepSeek\u2019s self-correcting AI model aces tough maths proofs"},"content":{"rendered":"<p class=\"figure__caption u-sans-serif\"><span>Credit: Nikolas Kokovlis\/NurPhoto via Getty<\/span><\/p>\n<p>Chinese artificial intelligence company DeepSeek has released a mathematical reasoning model that can identify and correct its own errors. The model beat the best human score in one of the world\u2019s most prestigious undergraduate maths competitions.<\/p>\n<p>The model, DeepSeekMath-V2, scored 118 out of 120 points on questions from the 2024 William Lowell Putnam Mathematical Competition, beating the top human score of 90. The model also performed at the level of gold-medal winners in the International Mathematical Olympiad (IMO) 2025 and the 2024 China Mathematical Olympiad. The results are described in a preprint posted on arXiv on 27 November.<\/p>\n<p>\u201cWe are at a point where AI is about as good at maths as a smart undergraduate student,\u201d says Kevin Buzzard, a mathematician at Imperial College London. \u201cIt is very exciting.\u201d<\/p>\n<p>In February, AlphaGeometry 2, an AI problem solver created by Google DeepMind in London, also achieved a gold-level performance in the IMO. The feat was repeated in July by Gemini\u2019s Deep Think, which is owned by DeepMind.<\/p>\n<h2>Reasoning over answers<\/h2>\n<p>Early approaches to training large language models for mathematical reasoning focused on the accuracy of final answers, the preprint authors write. But a correct answer does not guarantee correct reasoning. 
At times, a correct final answer might simply be the result of a fortunate error. Moreover, an exclusive focus on the end result is not useful in proving mathematical laws or formulae, where the logical reasoning is more important than the final answer.<\/p>\n<p>Tong Xie, a chemist specializing in AI-driven discoveries at UNSW Sydney in Australia, says the researchers behind DeepSeek, as well as those developing Gemini\u2019s Deep Think, have been working on overcoming this problem by rewarding reasoning over the final answer.<\/p>\n<p>DeepSeekMath-V2 introduces self-verifiable mathematical reasoning for the first time. The model includes a verifier trained to evaluate mathematical proofs (which are built on a series of step-by-step deductions), identify logical flaws and assign scores according to how rigorous each proof is. A meta-verification system then checks whether the verifier\u2019s critiques are accurate, reducing the likelihood of hallucinations and improving trustworthiness. These components work with a proof generator that constructs solutions and evaluates its own work, refining arguments until no further issues can be found.<\/p>\n<p>The design creates a feedback loop: the verifier improves the generator, and as the generator produces more-challenging proofs, these become new training data to strengthen the verifier.<\/p>\n<p>The system solved five of the six problems in the 2025 IMO, scoring 83.3%. It was, however, unable to solve the hardest problems set in 2025 and in past IMOs.<\/p>\n<p>DeepSeekMath-V2 relies on self-verification using natural language in the model itself, Xie says. This reduces human involvement and makes the model more cost-effective and scalable.<\/p>\n<p>Gemini\u2019s Deep Think, by contrast, verifies mathematical reasoning using an external, symbolic language called Lean, and its verification process requires extensive expert input. 
The method is nearly free of hallucination, but it is computationally expensive and resource-intensive, Xie says.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Credit: Nikolas Kokovlis\/NurPhoto via Getty Chinese artificial intelligence company DeepSeek has released a mathematical reasoning model that can identify and correct its own errors. The model beat the best human score in one of the world\u2019s most prestigious undergraduate maths competitions. The model, DeepSeekMath-V2, scored 118 out of 120 points on questions from the 2024<\/p>\n","protected":false},"author":1,"featured_media":36187,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[58],"tags":[2708,9352,2935,4029,20259,20258,2545],"class_list":{"0":"post-36186","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-science","8":"tag-aces","9":"tag-deepseeks","10":"tag-maths","11":"tag-model","12":"tag-proofs","13":"tag-selfcorrecting","14":"tag-tough"},"_links":{"self":[{"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=\/wp\/v2\/posts\/36186","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=36186"}],"version-history":[{"count":0,"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=\/wp\/v2\/posts\/36186\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=\/wp\/v2\/media\/36187"}],"wp:attachment":[{"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=36186"}],
"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=36186"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=36186"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}