    Elon Musk’s New Grok 4 Takes on ‘Humanity’s Last Exam’ as the AI Race Heats Up

    By onlyplanz_80y6mt | July 12, 2025 | 5 Mins Read

    Floriana/Getty Images



    Elon Musk has launched xAI’s Grok 4—calling it the “world’s smartest AI” and claiming it can ace Ph.D.-level exams and outpace rivals such as Google’s Gemini and OpenAI’s o3 on tough benchmarks.

    By Deni Ellis Béchard edited by Dean Visser

    Elon Musk released the newest artificial intelligence model from his company xAI on Wednesday night. In an hour-long public reveal session, he called the model, Grok 4, “the smartest AI in the world” and claimed it was capable of getting perfect SAT scores and near-perfect GRE results in every subject, from the humanities to the sciences.

    During the online launch, Musk and members of his team described testing Grok 4 on a metric called Humanity’s Last Exam (HLE)—a 2,500-question benchmark designed to evaluate an AI’s academic knowledge and reasoning skill. Created by nearly 1,000 human experts across more than 100 disciplines and released in January 2025, the test spans topics from the classics to quantum chemistry and mixes text with images. Grok 4 reportedly scored 25.4 percent on its own. But given access to tools (such as external aids for code execution or Web searches), it hit 38.6 percent. That jumped to 44.4 percent with a version called Grok 4 Heavy, which uses multiple AI agents to solve problems. The two next best-performing AI models are Google’s Gemini-Pro (which achieved 26.9 percent with the tools) and OpenAI’s o3 model (which got 24.9 percent, also with the tools). The results from xAI’s internal testing have yet to appear on the leaderboard for HLE, however, and it remains unclear whether this is because xAI has yet to submit the results or because those results are pending review. Manifold, a social prediction market platform where users bet play money (called “Mana”) on future events in politics, technology and other subjects, predicted a 1 percent chance, as of Friday morning, that Grok 4 would debut on HLE’s leaderboard with a 45 percent score or greater on the exam within a month of its release. (Meanwhile xAI has claimed a score of only 44.4.)

    During the launch, the xAI team also ran live demonstrations showing Grok 4 crunching baseball odds, determining which xAI employee has the “weirdest” profile picture on X and generating a simulated visualization of a black hole. Musk suggested that the system may discover entirely new technologies by later this year—and possibly “new physics” by the end of next year. Games and movies are on the horizon, too, with Musk predicting that Grok 4 will be able to make playable titles and watchable films by 2026. Grok 4 also has new audio capabilities, including a voice that sang during the launch, and Musk said new image generation and coding tools are soon to be released. The regular version of Grok 4 costs $30 a month; SuperGrok Heavy—the deluxe package with multiple agents and research tools—runs at $300.

    Artificial Analysis, an independent benchmarking platform that ranks AI models, now lists Grok 4 as highest on its Artificial Analysis Intelligence Index, slightly ahead of Gemini 2.5 Pro and OpenAI’s o4-mini-high. And Grok 4 appears as the top-performing publicly available model on the leaderboards for the Abstraction and Reasoning Corpus, or ARC-AGI-1, and its second edition, ARC-AGI-2—benchmarks that measure progress toward “humanlike” general intelligence. Greg Kamradt, president of ARC Prize Foundation, a nonprofit organization that maintains the two leaderboards, says that when the xAI team contacted the foundation with Grok 4’s results, the organization then independently tested Grok 4 on a dataset to which the xAI team did not have access and confirmed the results. “Before we report performance for any lab, it’s not verified unless we verify it,” Kamradt says. “We approved the [testing results] slide that [the xAI team] showed in the launch.”

    According to xAI, Grok 4 also outstrips other AI systems on a number of additional benchmarks that suggest its strength in STEM subjects. Alex Olteanu, a senior data science editor at AI education platform DataCamp, has tested it. “Grok has been strong on math and programming in my tests, and I’ve been impressed by the quality of its chain-of-thought reasoning, which shows an ingenious and logically sound approach to problem-solving,” Olteanu says. “Its context window, however, isn’t very competitive, and it may struggle with large code bases like those you encounter in production. It also fell short when I asked it to analyze a 170-page PDF, likely due to its limited context window and weak multimodal abilities.” (Multimodal abilities refer to a model’s capacity to analyze more than one kind of data at the same time, such as a combination of text, images, audio and video.)

    On a more nuanced front, issues with Grok 4 have surfaced since its release. Several posters on X—owned by Musk himself—as well as tech-industry news outlets have reported that when Grok 4 was asked questions about the Israeli-Palestinian conflict, abortion and U.S. immigration law, it often searched for Musk’s stance on these issues by referencing his X posts and articles written about him. And the release of Grok 4 comes after several controversies with Grok 3, the previous model, which issued outputs that included antisemitic comments, praise for Hitler and claims of “white genocide”—incidents that xAI publicly acknowledged, attributing them to unauthorized manipulations and stating that the company was implementing corrective measures.

    At one point during the launch, Musk commented on how making an AI smarter than humans is frightening, though he said he believes the ultimate result will be good—probably. “I somewhat reconciled myself to the fact that, even if it wasn’t going to be good, I’d at least like to be alive to see it happen,” he said.
