    Naija Global News |
    A new AI coding challenge just published its first results – and they aren’t pretty

    By onlyplanz_80y6mt | July 24, 2025 | 3 Mins Read
    Blue code on a dark background presented at an angle.
    Image Credits: Sashkinw / Getty Images

    A new AI coding challenge has revealed its first winner — and set a new bar for AI-powered software engineers. 

    On Wednesday at 5pm PT, the nonprofit Laude Institute announced the first winner of the K Prize, a multi-round AI coding challenge launched by Databricks and Perplexity co-founder Andy Konwinski. The winner was a Brazilian prompt engineer named Eduardo Rocha de Andrade, who will receive $50,000 in prize money. But more surprising than the win was his final score: he won with correct answers to just 7.5% of the questions on the test.

    “We’re glad we built a benchmark that is actually hard,” said Konwinski. “Benchmarks should be hard if they’re going to matter,” he continued, adding: “Scores would be different if the big labs had entered with their biggest models. But that’s kind of the point. K Prize runs offline with limited compute, so it favors smaller and open models. I love that. It levels the playing field.”

    Konwinski has pledged $1 million to the first open-source model that can score higher than 90% on the test.

    Similar to the well-known SWE-Bench system, the K Prize evaluates models against flagged issues from GitHub to measure how well they handle real-world programming problems. But while SWE-Bench is based on a fixed set of problems that models can train against, the K Prize is designed as a “contamination-free version of SWE-Bench,” using a timed entry system to guard against any benchmark-specific training. For round one, models were due by March 12th. The K Prize organizers then built the test using only GitHub issues flagged after that date.
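The timed entry idea can be illustrated with a short sketch. This is not the K Prize organizers' code: the `Issue` record, the `contamination_free` helper, the sample repository names, and the year attached to the March 12 deadline are all assumptions made for illustration. The only rule taken from the description above is that an issue qualifies for the test only if it was flagged after the submission deadline.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Issue:
    """Hypothetical record for a flagged GitHub issue."""
    repo: str
    number: int
    flagged_on: date


def contamination_free(issues: list[Issue], deadline: date) -> list[Issue]:
    """Keep only issues flagged strictly after the model submission deadline.

    A model frozen on or before the deadline cannot have trained on these
    issues, which is the property the timed entry system relies on.
    """
    return [issue for issue in issues if issue.flagged_on > deadline]


# Assumption: the article gives "March 12th" without a year; 2025 is inferred
# from the publication date and may not match the organizers' records.
ROUND_ONE_DEADLINE = date(2025, 3, 12)

# Made-up sample data, not real K Prize issues.
candidates = [
    Issue("example/alpha", 101, date(2025, 3, 10)),  # before the deadline
    Issue("example/alpha", 102, date(2025, 3, 12)),  # on the deadline
    Issue("example/beta", 7, date(2025, 3, 20)),     # after the deadline
]

eligible = contamination_free(candidates, ROUND_ONE_DEADLINE)
```

With this sample data, only the issue flagged on March 20 stays in the test set. Whether an issue flagged on the deadline day itself should count is a design choice; the sketch excludes it to stay on the safe side.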

    The 7.5% top score stands in marked contrast to SWE-Bench itself, which currently shows a 75% top score on its easier ‘Verified’ test and 34% on its harder ‘Full’ test. Konwinski still isn’t sure whether the disparity is due to contamination on SWE-Bench or just the challenge of collecting new issues from GitHub, but he expects the K Prize project to answer the question soon.

    “As we get more runs of the thing, we’ll have a better sense,” he told TechCrunch, “because we expect people to adapt to the dynamics of competing on this every few months.”

    It might seem like an odd place to fall short, given the wide range of AI coding tools already publicly available – but with benchmarks becoming too easy, many critics see projects like the K Prize as a necessary step toward solving AI’s growing evaluation problem.

    “I’m quite bullish about building new tests for existing benchmarks,” says Princeton researcher Sayash Kapoor, who put forward a similar idea in a recent paper. “Without such experiments, we can’t actually tell if the issue is contamination, or even just targeting the SWE-Bench leaderboard with a human in the loop.”

    For Konwinski, it’s not just a better benchmark, but an open challenge to the rest of the industry. “If you listen to the hype, it’s like we should be seeing AI doctors and AI lawyers and AI software engineers, and that’s just not true,” he says. “If we can’t even get more than 10% on a contamination free SWE-Bench, that’s the reality check for me.”
