    This is the most misunderstood graph in AI

    By TechAiVerse | February 5, 2026 | 10 Mins Read

    MIT Technology Review Explains: Let our writers untangle the complex, messy world of technology to help you understand what’s coming next.

    Every time OpenAI, Google, or Anthropic drops a new frontier large language model, the AI community holds its breath. It doesn’t exhale until METR, an AI research nonprofit whose name stands for “Model Evaluation & Threat Research,” updates a now-iconic graph that has played a major role in the AI discourse since it was first released in March 2025. The graph suggests that certain AI capabilities are developing at an exponential rate, and more recent model releases have outperformed that already impressive trend.

    That was certainly the case for Claude Opus 4.5, the latest version of Anthropic’s most powerful model, which was released in late November. In December, METR announced that Opus 4.5 appeared to be capable of independently completing a task that would have taken a human about five hours—a vast improvement over what even the exponential trend would have predicted. One Anthropic safety researcher tweeted that he would change the direction of his research in light of those results; another employee at the company simply wrote, “mom come pick me up i’m scared.”

    [Figure: METR’s time horizon graph. Credit: METR.ORG]

    But the truth is more complicated than those dramatic responses would suggest. For one thing, METR’s estimates of the abilities of specific models come with substantial error bars. As METR explicitly stated on X, Opus 4.5 might be able to regularly complete only tasks that take humans about two hours, or it might succeed on tasks that take humans as long as 20 hours. Given the uncertainties intrinsic to the method, it was impossible to know for sure. 
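
To get a feel for how wide that range is, it can help to express it in doublings rather than hours. The sketch below is only back-of-the-envelope arithmetic: it takes the two-hour and 20-hour bounds and the roughly five-hour estimate quoted above, and assumes the approximately seven-month doubling time that METR reports for the trend. The function name `months_of_trend` is made up for this illustration.

```python
import math

# Figures quoted in the article for Claude Opus 4.5 (hours of human task time).
LOW_HOURS = 2
POINT_HOURS = 5
HIGH_HOURS = 20

# Assumption: the "seven-ish" month doubling time reported for the trend.
DOUBLING_MONTHS = 7


def months_of_trend(shorter_hours, longer_hours, doubling_months=DOUBLING_MONTHS):
    """Months the trend would need to grow from the shorter to the longer horizon."""
    doublings = math.log2(longer_hours / shorter_hours)
    return doublings * doubling_months


if __name__ == "__main__":
    print("low to high bound:", round(months_of_trend(LOW_HOURS, HIGH_HOURS), 1), "months")
    print("low bound to estimate:", round(months_of_trend(LOW_HOURS, POINT_HOURS), 1), "months")
    print("estimate to high bound:", round(months_of_trend(POINT_HOURS, HIGH_HOURS), 1), "months")
```

Under those assumptions, the gap between the lower and upper bound is a factor of 10, which corresponds to nearly two years of progress along the trend line. That is one way to see why a single model's position on the graph should be read with caution.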

    “There are a bunch of ways that people are reading too much into the graph,” says Sydney Von Arx, a member of METR’s technical staff.

    More fundamentally, the METR plot does not measure AI abilities writ large, nor does it claim to. In order to build the graph, METR tests the models primarily on coding tasks, evaluating the difficulty of each by measuring or estimating how long it takes humans to complete it—a metric that not everyone accepts. Claude Opus 4.5 might be able to complete certain tasks that take humans five hours, but that doesn’t mean it’s anywhere close to replacing a human worker.

    METR was founded to assess the risks posed by frontier AI systems. Though it is best known for the exponential trend plot, it has also worked with AI companies to evaluate their systems in greater detail and published several other independent research projects, including a widely covered July 2025 study suggesting that AI coding assistants might actually be slowing software engineers down. 

    But the exponential plot has made METR’s reputation, and the organization appears to have a complicated relationship with that graph’s often breathless reception. In January, Thomas Kwa, one of the lead authors on the paper that introduced it, wrote a blog post responding to some criticisms and making clear its limitations, and METR is currently working on a more extensive FAQ document. But Kwa isn’t optimistic that these efforts will meaningfully shift the discourse. “I think the hype machine will basically, whatever we do, just strip out all the caveats,” he says.

    Nevertheless, the METR team does think that the plot has something meaningful to say about the trajectory of AI progress. “You should absolutely not tie your life to this graph,” says Von Arx. “But also,” she adds, “I bet that this trend is gonna hold.”

    Part of the trouble with the METR plot is that it’s quite a bit more complicated than it looks. The x-axis is simple enough: It tracks the date when each model was released. But the y-axis is where things get tricky. It records each model’s “time horizon,” an unusual metric that METR created—and that, according to Kwa and Von Arx, is frequently misunderstood.

    To understand exactly what model time horizons are, it helps to know all the work that METR put into calculating them. First, the METR team assembled a collection of tasks ranging from quick multiple-choice questions to detailed coding challenges—all of which were somehow relevant to software engineering. Then they had human coders attempt most of those tasks and evaluated how long it took them to finish. In this way, they assigned the tasks a human baseline time. Some tasks took the experts mere seconds, whereas others required several hours.

    When METR tested large language models on the task suite, they found that advanced models could complete the fast tasks with ease—but as the models attempted tasks that had taken humans more and more time to finish, their accuracy started to fall off. From a model’s performance, the researchers calculated the point on the time scale of human tasks at which the model would complete about 50% of the tasks successfully. That point is the model’s time horizon. 
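
The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not METR's code: the task results are invented, and fitting a logistic curve against the base-2 logarithm of human completion time is an assumption about the shape of the falloff. The paragraph itself says only that the time horizon is the human task length at which the model succeeds about 50% of the time. The names `fit_logistic` and `time_horizon_minutes` are hypothetical.

```python
import math

# Hypothetical results: (human baseline time in minutes, did the model succeed?)
HYPOTHETICAL_RESULTS = [
    (1, True), (2, True), (4, True), (8, True), (15, True),
    (30, True), (60, False), (120, True), (240, False), (480, False),
]


def fit_logistic(xs, ys, lr=0.1, steps=20000):
    """Fit P(success) = sigmoid(a * x + b) by gradient descent on the log loss."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a * x + b)))
            grad_a += (p - y) * x
            grad_b += p - y
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def time_horizon_minutes(results, success_rate=0.5):
    """Human task length (minutes) at which the fitted success rate equals success_rate."""
    xs = [math.log2(minutes) for minutes, _ in results]
    ys = [1.0 if succeeded else 0.0 for _, succeeded in results]
    a, b = fit_logistic(xs, ys)
    # Solve a * x + b = logit(success_rate) for x, then undo the log2 transform.
    logit = math.log(success_rate / (1.0 - success_rate))
    return 2 ** ((logit - b) / a)


if __name__ == "__main__":
    horizon = time_horizon_minutes(HYPOTHETICAL_RESULTS)
    print(f"50% time horizon: about {horizon:.0f} minutes of human work")
```

Note that the output is a length of human work, not a duration the model ran for. Raising `success_rate` moves the horizon toward shorter tasks, because the fitted curve slopes downward as tasks get longer.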

    All that detail is in the blog post and the academic paper that METR released along with the original time horizon plot. But the METR plot is frequently passed around on social media without this context, and so the true meaning of the time horizon metric can get lost in the shuffle. One common misapprehension is that the numbers on the plot’s y-axis (around five hours for Claude Opus 4.5, for example) represent the length of time that the models can operate independently. They do not. They represent how long it takes humans to complete tasks that a model can successfully perform. Kwa has seen this error so frequently that he made a point of correcting it at the very top of his recent blog post, and when asked what information he would add to the versions of the plot circulating online, he said he would include the word “human” whenever the task completion time was mentioned.

    As complex and widely misinterpreted as the time horizon concept might be, it does make some basic sense: A model with a one-hour time horizon could automate some modest portions of a software engineer’s job, whereas a model with a 40-hour horizon could potentially complete days of work on its own. But some experts question whether the amount of time that humans take on tasks is an effective metric for quantifying AI capabilities. “I don’t think it’s necessarily a given fact that because something takes longer, it’s going to be a harder task,” says Inioluwa Deborah Raji, a PhD student at UC Berkeley who studies model evaluation. 

    Von Arx says that she, too, was originally skeptical that time horizon was the right measure to use. What convinced her was seeing the results of her and her colleagues’ analysis. When they calculated the 50% time horizon for all the major models available in early 2025 and then plotted each of them on the graph, they saw that the time horizons for the top-tier models were increasing over time and, moreover, that the rate of advancement was speeding up. Every seven-ish months, the time horizon doubled, which means that the most advanced models could complete tasks that took humans nine seconds in mid-2020, four minutes in early 2023, and 40 minutes in late 2024. “I can do all the theorizing I want about whether or not it makes sense, but the trend is there,” Von Arx says.
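
The doubling time can be checked roughly against the figures quoted above. The sketch below is illustrative arithmetic only: the article gives approximate dates, so treating mid-2020 as July 2020 and late 2024 as November 2024 (52 months apart) is an assumption, and the function names are made up for the example.

```python
import math


def doubling_time_months(start_seconds, end_seconds, months_elapsed):
    """Average months per doubling implied by two points on an exponential trend."""
    doublings = math.log2(end_seconds / start_seconds)
    return months_elapsed / doublings


def project_horizon(current_seconds, months_ahead, doubling_months):
    """Extrapolate a horizon forward, assuming the same doubling time keeps holding."""
    return current_seconds * 2 ** (months_ahead / doubling_months)


if __name__ == "__main__":
    # Nine seconds (assumed July 2020) to 40 minutes (assumed November 2024).
    implied = doubling_time_months(9, 40 * 60, months_elapsed=52)
    print(f"implied doubling time: {implied:.1f} months")

    # Naive extrapolation one year past the 40-minute point, in minutes.
    one_year_later = project_horizon(40 * 60, 12, implied) / 60
    print(f"projected horizon after 12 more months: {one_year_later:.0f} minutes")
```

With those assumed dates, the two endpoints imply a doubling time of roughly six and a half months, consistent with the "seven-ish" figure. The extrapolation is only as good as the assumption that the trend continues unchanged.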

    It’s this dramatic pattern that made the METR plot such a blockbuster. Many people learned about it when they read AI 2027, a viral sci-fi story cum quantitative forecast positing that superintelligent AI could wipe out humanity by 2030. The writers of AI 2027 based some of their predictions on the METR plot and cited it extensively. In Von Arx’s words, “It’s a little weird when the way lots of people are familiar with your work is this pretty opinionated interpretation.”

    Of course, plenty of people invoke the METR plot without imagining large-scale death and destruction. For some AI boosters, the exponential trend indicates that AI will soon usher in an era of radical economic growth. The venture capital firm Sequoia Capital, for example, recently put out a post titled “2026: This is AGI,” which used the METR plot to argue that AI that can act as an employee or contractor will soon arrive. “The provocation really was like, ‘What will you do when your plans are measured in centuries?’” says Sonya Huang, a general partner at Sequoia and one of the post’s authors. 

    Just because a model achieves a one-hour time horizon on the METR plot, however, doesn’t mean that it can replace one hour of human work in the real world. For one thing, the tasks on which the models are evaluated don’t reflect the complexities and confusion of real-world work. In their original study, Kwa, Von Arx, and their colleagues quantify what they call the “messiness” of each task according to criteria such as whether the model knows exactly how it is being scored and whether it can easily start over if it makes a mistake (for messy tasks, the answer to both questions would be no). They found that models do noticeably worse on messy tasks, although the overall pattern of improvement holds for both messy and non-messy ones.

    And even the messiest tasks that METR considered can’t provide much information about AI’s ability to take on most jobs, because the plot is based almost entirely on coding tasks. “A model can get better at coding, but it’s not going to magically get better at anything else,” says Daniel Kang, an assistant professor of computer science at the University of Illinois Urbana-Champaign. In a follow-up study, Kwa and his colleagues did find that time horizons for tasks in other domains also appear to be on exponential trajectories, but that work was much less formal.

    Despite these limitations, many people admire the group’s research. “The METR study is one of the most carefully designed studies in the literature for this kind of work,” Kang told me. Even Gary Marcus, a former NYU professor and professional LLM curmudgeon, described much of the work that went into the plot as “terrific” in a blog post.

    Some people will almost certainly continue to read the METR plot as a prognostication of our AI-induced doom, but in reality it’s something far more banal: a carefully constructed scientific tool that puts concrete numbers to people’s intuitive sense of AI progress. As METR employees will readily agree, the plot is far from a perfect instrument. But in a new and fast-moving domain, even imperfect tools can have enormous value.

    “This is a bunch of people trying their best to make a metric under a lot of constraints. It is deeply flawed in many ways,” Von Arx says. “I also think that it is one of the best things of its kind.”
