    Meta got caught gaming AI benchmarks

    By TechAiVerse, April 8, 2025

    Over the weekend, Meta dropped two new Llama 4 models: a smaller model named Scout, and Maverick, a mid-size model that the company claims can beat GPT-4o and Gemini 2.0 Flash “across a broad range of widely reported benchmarks.”

    Maverick quickly secured the number-two spot on LMArena, the AI benchmark site where humans compare outputs from different systems and vote on the best one. Meta's press release highlighted Maverick's Elo score of 1417, which placed it above OpenAI's GPT-4o and just under Gemini 2.5 Pro. (A higher Elo score means the model wins more often in the arena when going head-to-head with competitors.)
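    To make that parenthetical concrete, here is a minimal sketch of how a rating gap maps to an expected head-to-head win rate. It assumes the classic Elo expected-score formula on a 400-point scale. LMArena's actual rating method may differ in detail, and the ratings passed in below are hypothetical, not leaderboard values.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Expected share of head-to-head wins for A against B.

    Assumes the classic Elo logistic formula with a 400-point scale;
    this is an illustration, not LMArena's exact computation.
    """
    # Every 400-point advantage multiplies A's odds of winning by 10.
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # Hypothetical ratings: a baseline of 1400 compared against higher ones.
    for gap in (0, 25, 100, 400):
        p = expected_win_rate(1400 + gap, 1400)
        print(f"gap {gap:>3}: expected win rate {p:.3f}")
```

    Under this assumption, a 25-point gap corresponds to roughly a 54% expected win rate, so neighboring leaderboard positions can reflect fairly small differences in how often voters prefer one model over another.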

    The achievement seemed to position Meta’s open-weight Llama 4 as a serious challenger to the state-of-the-art, closed models from OpenAI, Anthropic, and Google. Then, AI researchers digging through Meta’s documentation discovered something unusual.

    In the fine print, Meta acknowledges that the version of Maverick tested on LMArena isn't the same as the one available to the public. According to Meta's own materials, it deployed an “experimental chat version” of Maverick to LMArena that was specifically “optimized for conversationality,” as TechCrunch first reported.

    “Meta’s interpretation of our policy did not match what we expect from model providers,” LMArena posted on X two days after the model’s release. “Meta should have made it clearer that ‘Llama-4-Maverick-03-26-Experimental’ was a customized model to optimize for human preference. As a result of that, we are updating our leaderboard policies to reinforce our commitment to fair, reproducible evaluations so this confusion doesn’t occur in the future.”

    A spokesperson for Meta, Ashley Gabriel, said in an emailed statement that “we experiment with all types of custom variants.”

    “‘Llama-4-Maverick-03-26-Experimental’ is a chat optimized version we experimented with that also performs well on LMArena,” Gabriel said. “We have now released our open source version and will see how developers customize Llama 4 for their own use cases. We’re excited to see what they will build and look forward to their ongoing feedback.”

    While what Meta did with Maverick isn’t explicitly against LMArena’s rules, the site has voiced concerns about gaming the system and has taken steps to “prevent overfitting and benchmark leakage.” When companies can submit specially tuned versions of their models for testing while releasing different versions to the public, rankings like LMArena’s become less meaningful as indicators of real-world performance.

    “It’s the most widely respected general benchmark because all of the other ones suck,” independent AI researcher Simon Willison tells The Verge. “When Llama 4 came out, the fact that it came second in the arena, just after Gemini 2.5 Pro, that really impressed me, and I’m kicking myself for not reading the small print.”

    Shortly after Meta released Maverick and Scout, the AI community started talking about a rumor that Meta had also trained its Llama 4 models to perform better on benchmarks while hiding their real limitations. VP of generative AI at Meta, Ahmad Al-Dahle, addressed the accusations in a post on X: “We’ve also heard claims that we trained on test sets — that’s simply not true and we would never do that. Our best understanding is that the variable quality people are seeing is due to needing to stabilize implementations.”

    Some also noticed that Llama 4 was released at an odd time. Saturday doesn’t tend to be when big AI news drops. After someone on Threads asked why Llama 4 was released over the weekend, Meta CEO Mark Zuckerberg replied: “That’s when it was ready.”

    “It’s a very confusing release generally,” says Willison, who closely follows and documents AI models. “The model score that we got there is completely worthless to me. I can’t even use the model that they got a high score on.”

    Meta’s path to releasing Llama 4 wasn’t exactly smooth. According to a recent report from The Information, the company repeatedly pushed back the launch due to the model failing to meet internal expectations. Those expectations are especially high after DeepSeek, an open-source AI startup from China, released an open-weight model that generated a ton of buzz.

    Ultimately, submitting an optimized variant to LMArena puts developers in a difficult position. When selecting models like Llama 4 for their applications, they naturally look to benchmarks for guidance. But, as Maverick shows, those benchmarks can reflect capabilities that aren’t actually available in the models the public can access.

    As AI development accelerates, this episode shows how benchmarks are becoming battlegrounds. It also shows how Meta is eager to be seen as an AI leader, even if that means gaming the system.

    Update, April 7th: The story was updated to add Meta’s statement.
