    New study accuses LM Arena of gaming its popular AI benchmark

    By TechAiVerse | May 2, 2025

    This study also calls out LM Arena for what appears to be much greater promotion of private models like Gemini, ChatGPT, and Claude. Developers collect data on model interactions from the Chatbot Arena API, but teams focusing on open models consistently get the short end of the stick.

    The researchers point out that certain models appear in arena faceoffs much more often, with Google and OpenAI together accounting for over 34 percent of collected model data. Firms like xAI, Meta, and Amazon are also disproportionately represented in the arena. Therefore, those firms get more vibemarking data compared to the makers of open models.

    More models, more evals

    The study authors have a list of suggestions to make LM Arena fairer. Several of the paper’s recommendations are aimed at correcting the imbalance of privately tested commercial models, for example, by limiting the number of models a group can add and retract before releasing one. The study also suggests showing all model results, even if they aren’t final.
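
One reason a cap on private variants could matter is a general statistical effect: if a group tests several equally good variants and publishes only the top scorer, the published number tends to overstate true skill. The sketch below illustrates that effect only. It is not the paper's analysis, and the skill value, noise level, and function names are hypothetical choices made for the example.

```python
import random
import statistics


def best_of_n_score(true_skill, n_variants, noise_sd, rng):
    """Return the highest noisy score among n_variants equally skilled variants."""
    return max(rng.gauss(true_skill, noise_sd) for _ in range(n_variants))


def mean_published_score(n_variants, trials=5000, true_skill=1200.0,
                         noise_sd=15.0, seed=0):
    """Average the published (best-of-n) score over many simulated launches.

    true_skill and noise_sd are made-up numbers, not leaderboard values.
    """
    rng = random.Random(seed)
    return statistics.fmean(
        best_of_n_score(true_skill, n_variants, noise_sd, rng)
        for _ in range(trials)
    )


if __name__ == "__main__":
    for n in (1, 3, 10):
        print(n, round(mean_published_score(n), 1))
```

With a single variant the average published score stays near the true skill. As the number of privately tested variants grows, the average drifts upward even though no variant is actually better.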

    However, the site’s operators take issue with some of the paper’s methodology and conclusions. LM Arena points out that the pre-release testing features have not been kept secret, with a March 2024 blog post featuring a brief explanation of the system. They also contend that model creators don’t technically choose the version that is shown. Instead, the site simply doesn’t show non-public versions for simplicity’s sake. When a developer releases the final version, that’s what LM Arena adds to the leaderboard.

    Figure: Proprietary models get disproportionate attention in the Chatbot Arena, the study says. Credit: Shivalika Singh et al.


    One place the two sides may find alignment is on the question of unequal matchups. The study authors call for fair sampling, which will ensure open models appear in Chatbot Arena at a rate similar to the likes of Gemini and ChatGPT. LM Arena has suggested it will work to make the sampling algorithm more varied so you don’t always get the big commercial models. That would send more eval data to small players, giving them the chance to improve and challenge the big commercial models.
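
The difference between fair and skewed sampling can be sketched in a few lines. This is a minimal illustration, not LM Arena's actual sampling algorithm: the model names, the weights, and both function names are hypothetical, and "fair" is taken here to mean that every model is equally likely to be drawn into a faceoff.

```python
import random
from collections import Counter

# Hypothetical model names used only for this illustration.
MODELS = ["proprietary-a", "proprietary-b", "open-a", "open-b"]


def sample_matchup(models, weights=None, rng=random):
    """Pick two distinct models for a head-to-head faceoff.

    weights=None gives uniform ("fair") sampling; otherwise models are
    drawn in proportion to their weight.
    """
    if weights is None:
        return tuple(rng.sample(models, 2))
    first = rng.choices(models, weights=weights, k=1)[0]
    others = [m for m in models if m != first]
    other_weights = [w for m, w in zip(models, weights) if m != first]
    second = rng.choices(others, weights=other_weights, k=1)[0]
    return first, second


def appearance_share(models, weights=None, n_battles=20000, seed=0):
    """Return each model's share of all appearances across simulated battles."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_battles):
        a, b = sample_matchup(models, weights, rng)
        counts[a] += 1
        counts[b] += 1
    total = 2 * n_battles
    return {m: counts[m] / total for m in models}


if __name__ == "__main__":
    print("uniform :", appearance_share(MODELS))
    print("weighted:", appearance_share(MODELS, weights=[5, 5, 1, 1]))
```

Under uniform sampling each of the four models ends up with roughly a quarter of the appearances. With the skewed weights, the two heavily weighted models collect most of the battles, and therefore most of the resulting eval data.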

    LM Arena recently announced it was forming a corporate entity to continue its work. With money on the table, the operators need to ensure Chatbot Arena continues figuring into the development of popular models. However, it’s unclear whether this is an objectively better way to evaluate chatbots versus academic tests. As people vote on vibes, there’s a real possibility we are pushing models to adopt sycophantic tendencies. This may have helped nudge ChatGPT into suck-up territory in recent weeks, a move that OpenAI has hastily reverted after widespread anger.

    Jonathan is a tech enthusiast and the mind behind Tech AI Verse. With a passion for artificial intelligence, consumer tech, and emerging innovations, he delivers clear, insightful content to keep readers informed. From cutting-edge gadgets to AI advancements and cryptocurrency trends, Jonathan breaks down complex topics to make technology accessible to all.
