{"id":10590,"date":"2025-07-12T02:50:36","date_gmt":"2025-07-12T02:50:36","guid":{"rendered":"https:\/\/naijaglobalnews.org\/?p=10590"},"modified":"2025-07-12T02:50:36","modified_gmt":"2025-07-12T02:50:36","slug":"elon-musks-new-grok-4-takes-on-humanitys-last-exam-as-the-ai-race-heats-up","status":"publish","type":"post","link":"https:\/\/naijaglobalnews.org\/?p=10590","title":{"rendered":"Elon Musk&#8217;s New Grok 4 Takes on \u2018Humanity\u2019s Last Exam\u2019 as the AI Race Heats Up"},"content":{"rendered":"<p>\n<\/p>\n<p>New Grok 4 Takes on \u2018Humanity\u2019s Last Exam\u2019 as the AI Race Heats Up<\/p>\n<p>Elon Musk has launched xAI\u2019s Grok 4\u2014calling it the \u201cworld\u2019s smartest AI\u201d and claiming it can ace Ph.D.-level exams and outpace rivals such as Google\u2019s Gemini and OpenAI\u2019s o3 on tough benchmarks<\/p>\n<p class=\"article_authors-s5nSV\">By Deni Ellis B\u00e9chard <span class=\"article_editors__links-V04HR\">edited by Dean Visser<\/span><\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">Elon Musk released the newest artificial intelligence model from his company xAI on Wednesday night. In an hour-long public reveal session, he called the model, Grok 4, \u201cthe smartest AI in the world\u201d and claimed it was capable of getting perfect SAT scores and near-perfect GRE results in every subject, from the humanities to the sciences.<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">During the online launch, Musk and members of his team described testing Grok 4 on a metric called Humanity\u2019s Last Exam (HLE)\u2014a 2,500-question benchmark designed to evaluate an AI\u2019s academic knowledge and reasoning skill. Created by nearly 1,000 human experts across more than 100 disciplines and released in January 2025, the test spans topics from the classics to quantum chemistry and mixes text with images. Grok 4 reportedly scored 25.4 percent on its own. 
But given access to tools (such as external aids for code execution or Web searches), it hit 38.6 percent. That jumped to 44.4 percent with a version called Grok 4 Heavy, which uses multiple AI agents to solve problems. The two next best-performing AI models are Google\u2019s Gemini-Pro (which achieved 26.9 percent with the tools) and OpenAI\u2019s o3 model (which got 24.9 percent, also with the tools). The results from xAI\u2019s internal testing have yet to appear on the leaderboard for HLE, however, and it remains unclear whether this is because xAI has yet to submit the results or because those results are pending review. Manifold, a social prediction market platform where users bet play money (called \u201cMana\u201d) on future events in politics, technology and other subjects, predicted a 1 percent chance, as of Friday morning, that Grok 4 would debut on HLE\u2019s leaderboard with a 45 percent score or greater on the exam within a month of its release. (Meanwhile xAI has claimed a score of only 44.4.)<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">During the launch, the xAI team also ran live demonstrations showing Grok 4 crunching baseball odds, determining which xAI employee has the \u201cweirdest\u201d profile picture on X and generating a simulated visualization of a black hole. Musk suggested that the system may discover entirely new technologies by later this year\u2014and possibly \u201cnew physics\u201d by the end of next year. Games and movies are on the horizon, too, with Musk predicting that Grok 4 will be able to make playable titles and watchable films by 2026. Grok 4 also has new audio capabilities, including a voice that sang during the launch, and Musk said new image generation and coding tools are soon to be released. 
The regular version of Grok 4 costs $30 a month; SuperGrok Heavy\u2014the deluxe package with multiple agents and research tools\u2014runs at $300.<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">Artificial Analysis, an independent benchmarking platform that ranks AI models, now lists Grok 4 as highest on its Artificial Analysis Intelligence Index, slightly ahead of Gemini 2.5 Pro and OpenAI\u2019s o4-mini-high. And Grok 4 appears as the top-performing publicly available model on the leaderboards for the Abstraction and Reasoning Corpus, or ARC-AGI-1, and its second edition, ARC-AGI-2\u2014benchmarks that measure progress toward \u201chumanlike\u201d general intelligence. Greg Kamradt, president of ARC Prize Foundation, a nonprofit organization that maintains the two leaderboards, says that when the xAI team contacted the foundation with Grok 4\u2019s results, the organization then independently tested Grok 4 on a dataset to which the xAI team did not have access and confirmed the results. \u201cBefore we report performance for any lab, it\u2019s not verified unless we verify it,\u201d Kamradt says. \u201cWe approved the [testing results] slide that [the xAI team] showed in the launch.\u201d<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">According to xAI, Grok 4 also outstrips other AI systems on a number of additional benchmarks that suggest its strength in STEM subjects. Alex Olteanu, a senior data science editor at AI education platform DataCamp, has tested it. 
\u201cGrok has been strong on math and programming in my tests, and I\u2019ve been impressed by the quality of its chain-of-thought reasoning, which shows an ingenious and logically sound approach to problem-solving,\u201d Olteanu says. \u201cIts context window, however, isn\u2019t very competitive, and it may struggle with large code bases like those you encounter in production. It also fell short when I asked it to analyze a 170-page PDF, likely due to its limited context window and weak multimodal abilities.\u201d (Multimodal abilities refer to a model\u2019s capacity to analyze more than one kind of data at the same time, such as a combination of text, images, audio and video.)<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">On a more nuanced front, issues with Grok 4 have surfaced since its release. Several posters on X\u2014owned by Musk himself\u2014as well as tech-industry news outlets have reported that when Grok 4 was asked questions about the Israeli-Palestinian conflict, abortion and U.S. immigration law, it often searched for Musk\u2019s stance on these issues by referencing his X posts and articles written about him. And the release of Grok 4 comes after several controversies with Grok 3, the previous model, which issued outputs that included antisemitic comments, praise for Hitler and claims of \u201cwhite genocide\u201d\u2014incidents that xAI publicly acknowledged, attributing them to unauthorized manipulations and stating that the company was implementing corrective measures.<\/p>\n<p class=\"\" data-block=\"sciam\/paragraph\">At one point during the launch, Musk commented on how making an AI smarter than humans is frightening, though he said he believes the ultimate result will be good\u2014probably. 
\u201cI somewhat reconciled myself to the fact that, even if it wasn\u2019t going to be good, I\u2019d at least like to be alive to see it happen,\u201d he said.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>New Grok 4 Takes on \u2018Humanity\u2019s Last Exam\u2019 as the AI Race Heats Up Elon Musk has launched xAI\u2019s Grok 4\u2014calling it the \u201cworld\u2019s smartest AI\u201d and claiming it can ace Ph.D.-level exams and outpace rivals such as Google\u2019s Gemini and OpenAI\u2019s o3 on tough benchmarks By Deni Ellis B\u00e9chard edited by Dean Visser Elon<\/p>\n","protected":false},"author":1,"featured_media":10591,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[50],"tags":[91,3620,2964,1630,3619,92,2475,1167],"class_list":{"0":"post-10590","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-environment","8":"tag-elon","9":"tag-exam","10":"tag-grok","11":"tag-heats","12":"tag-humanitys","13":"tag-musks","14":"tag-race","15":"tag-takes"},"_links":{"self":[{"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=\/wp\/v2\/posts\/10590","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=10590"}],"version-history":[{"count":0,"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=\/wp\/v2\/posts\/10590\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=\/wp\/v2\/media\/10591"}],"wp:attachment":[{"href":"https:\/\/naijaglobalnews.org\/index.php?rest_rou
te=%2Fwp%2Fv2%2Fmedia&parent=10590"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=10590"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/naijaglobalnews.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=10590"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}