Close Menu

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    What to read this weekend: Two thrilling horror novels in one

    TikTok users will soon be able to send voice notes, images and videos in chats

    Meta is reportedly looking at using competing AI models to improve its apps

    Facebook X (Twitter) Instagram
    • Artificial Intelligence
    • Business Technology
    • Cryptocurrency
    • Gadgets
    • Gaming
    • Health
    • Software and Apps
    • Technology
    Facebook X (Twitter) Instagram Pinterest Vimeo
    Tech AI Verse
    • Home
    • Artificial Intelligence

      Blue-collar jobs are gaining popularity as AI threatens office work

      August 17, 2025

      Man who asked ChatGPT about cutting out salt from his diet was hospitalized with hallucinations

      August 15, 2025

      What happens when chatbots shape your reality? Concerns are growing online

      August 14, 2025

      Scientists want to prevent AI from going rogue by teaching it to be bad first

      August 8, 2025

      AI models may be accidentally (and secretly) learning each other’s bad behaviors

      July 30, 2025
    • Business

      Why Certified VMware Pros Are Driving the Future of IT

      August 24, 2025

      Murky Panda hackers exploit cloud trust to hack downstream customers

      August 23, 2025

      The rise of sovereign clouds: no data portability, no party

      August 20, 2025

      Israel is reportedly storing millions of Palestinian phone calls on Microsoft servers

      August 6, 2025

      AI site Perplexity uses “stealth tactics” to flout no-crawl edicts, Cloudflare says

      August 5, 2025
    • Crypto

      Chainlink (LINK) Price Uptrend Likely To Reverse as Charts Hint at Exhaustion

      August 31, 2025

      What to Expect From Solana in September

      August 31, 2025

      Bitcoin Risks Deeper Drop Toward $100,000 Amid Whale Rotation Into Ethereum

      August 31, 2025

      3 Altcoins Smart Money Are Buying During Market Pullback

      August 31, 2025

      Solana ETFs Move Closer to Approval as SEC Reviews Amended Filings

      August 31, 2025
    • Technology

      What to read this weekend: Two thrilling horror novels in one

      August 31, 2025

      TikTok users will soon be able to send voice notes, images and videos in chats

      August 31, 2025

      Meta is reportedly looking at using competing AI models to improve its apps

      August 31, 2025

      xAI sues an ex-employee for allegedly stealing trade secrets about Grok

      August 31, 2025

      Meta reportedly allowed unauthorized celebrity AI chatbots on its services

      August 31, 2025
    • Others
      • Gadgets
      • Gaming
      • Health
      • Software and Apps
    Check BMI
    Tech AI Verse
    You are at:Home»Technology»A new AI coding challenge just published its first results — and they aren’t pretty
    Technology

    A new AI coding challenge just published its first results — and they aren’t pretty

    TechAiVerseBy TechAiVerseJuly 24, 2025No Comments3 Mins Read2 Views
    Facebook Twitter Pinterest Telegram LinkedIn Tumblr Email Reddit
    A new AI coding challenge just published its first results — and they aren’t pretty
    Share
    Facebook Twitter LinkedIn Pinterest WhatsApp Email

    BMI Calculator – Check your Body Mass Index for free!

    A new AI coding challenge just published its first results — and they aren’t pretty

    A new AI coding challenge has revealed its first winner — and set a new bar for AI-powered software engineers. 

    On Wednesday at 5 p.m. PT, the nonprofit Laude Institute announced the first winner of the K Prize, a multi-round AI coding challenge launched by Databricks and Perplexity co-founder Andy Konwinski. The winner was a Brazilian prompt engineer named Eduardo Rocha de Andrade, who will receive $50,000 for the prize. But more surprising than the win was his final score: He won with correct answers to just 7.5% of the questions on the test.

    “We’re glad we built a benchmark that is actually hard,” said Konwinski. “Benchmarks should be hard if they’re going to matter,” he continued, adding: “Scores would be different if the big labs had entered with their biggest models. But that’s kind of the point. K Prize runs offline with limited compute, so it favors smaller and open models. I love that. It levels the playing field.”

    Konwinski has pledged $1 million to the first open source model that can score higher than 90% on the test.

    Similar to the well-known SWE-Bench system, the K Prize tests models against flagged issues from GitHub as a test of how well models can deal with real-world programming problems. But while SWE-Bench is based on a fixed set of problems that models can train against, the K Prize is designed as a “contamination-free version of SWE-Bench,” using a timed entry system to guard against any benchmark-specific training. For round one, models were due by March 12. The K Prize organizers then built the test using only GitHub issues flagged after that date.

    The 7.5% top score stands in marked contrast to SWE-Bench itself, which currently shows a 75% top score on its easier “Verified” test and 34% on its harder “Full” test. Konwinski still isn’t sure whether the disparity is due to contamination on SWE-Bench or just the challenge of collecting new issues from GitHub, but he expects the K Prize project to answer the question soon.

    “As we get more runs of the thing, we’ll have a better sense,” he told TechCrunch, “because we expect people to adapt to the dynamics of competing on this every few months.”

    Techcrunch event

    San Francisco
    |
    October 27-29, 2025

    It might seem like an odd place to fall short, given the wide range of AI coding tools already publicly available — but with benchmarks becoming too easy, many critics see projects like the K Prize as a necessary step toward solving AI’s growing evaluation problem.

    “I’m quite bullish about building new tests for existing benchmarks,” says Princeton researcher Sayash Kapoor, who put forward a similar idea in a recent paper. “Without such experiments, we can’t actually tell if the issue is contamination, or even just targeting the SWE-Bench leaderboard with a human in the loop.”

    For Konwinski, it’s not just a better benchmark, but an open challenge to the rest of the industry. “If you listen to the hype, it’s like we should be seeing AI doctors and AI lawyers and AI software engineers, and that’s just not true,” he says. “If we can’t even get more than 10% on a contamination-free SWE-Bench, that’s the reality check for me.”

    Russell is a freelance writer based in New York.

    View Bio

    BMI Calculator – Check your Body Mass Index for free!

    Share. Facebook Twitter Pinterest LinkedIn Reddit WhatsApp Telegram Email
    Previous ArticleXRP Price Crashes 13% in a Day; Two Metrics Say The Freefall Might End Soon
    Next Article Mission Barns is betting that animal-free pork fat will make artificial meat delicious
    TechAiVerse
    • Website

    Jonathan is a tech enthusiast and the mind behind Tech AI Verse. With a passion for artificial intelligence, consumer tech, and emerging innovations, he deliver clear, insightful content to keep readers informed. From cutting-edge gadgets to AI advancements and cryptocurrency trends, Jonathan breaks down complex topics to make technology accessible to all.

    Related Posts

    What to read this weekend: Two thrilling horror novels in one

    August 31, 2025

    TikTok users will soon be able to send voice notes, images and videos in chats

    August 31, 2025

    Meta is reportedly looking at using competing AI models to improve its apps

    August 31, 2025
    Leave A Reply Cancel Reply

    Top Posts

    Ping, You’ve Got Whale: AI detection system alerts ships of whales in their path

    April 22, 2025168 Views

    6.7 Cummins Lifter Failure: What Years Are Affected (And Possible Fixes)

    April 14, 202548 Views

    New Akira ransomware decryptor cracks encryptions keys using GPUs

    March 16, 202530 Views

    Is Libby Compatible With Kobo E-Readers?

    March 31, 202528 Views
    Don't Miss
    Technology August 31, 2025

    What to read this weekend: Two thrilling horror novels in one

    What to read this weekend: Two thrilling horror novels in oneOnce again (or twice, really,…

    TikTok users will soon be able to send voice notes, images and videos in chats

    Meta is reportedly looking at using competing AI models to improve its apps

    xAI sues an ex-employee for allegedly stealing trade secrets about Grok

    Stay In Touch
    • Facebook
    • Twitter
    • Pinterest
    • Instagram
    • YouTube
    • Vimeo

    Subscribe to Updates

    Get the latest creative news from SmartMag about art & design.

    About Us
    About Us

    Welcome to Tech AI Verse, your go-to destination for everything technology! We bring you the latest news, trends, and insights from the ever-evolving world of tech. Our coverage spans across global technology industry updates, artificial intelligence advancements, machine learning ethics, and automation innovations. Stay connected with us as we explore the limitless possibilities of technology!

    Facebook X (Twitter) Pinterest YouTube WhatsApp
    Our Picks

    What to read this weekend: Two thrilling horror novels in one

    August 31, 20252 Views

    TikTok users will soon be able to send voice notes, images and videos in chats

    August 31, 20252 Views

    Meta is reportedly looking at using competing AI models to improve its apps

    August 31, 20252 Views
    Most Popular

    Xiaomi 15 Ultra Officially Launched in China, Malaysia launch to follow after global event

    March 12, 20250 Views

    Apple thinks people won’t use MagSafe on iPhone 16e

    March 12, 20250 Views

    French Apex Legends voice cast refuses contracts over “unacceptable” AI clause

    March 12, 20250 Views
    © 2025 TechAiVerse. Designed by Divya Tech.
    • Home
    • About Us
    • Contact Us
    • Privacy Policy
    • Terms & Conditions

    Type above and press Enter to search. Press Esc to cancel.