Close Menu

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    AI and backup: How backup products leverage AI

    Nvidia’s Jensen Huang lobbies Trump administration to sell Blackwell to China

    Cyber Girls First is encouraging girls to study tech at university  

    Facebook X (Twitter) Instagram
    • Artificial Intelligence
    • Business Technology
    • Cryptocurrency
    • Gadgets
    • Gaming
    • Health
    • Software and Apps
    • Technology
    Facebook X (Twitter) Instagram Pinterest Vimeo
    Tech AI Verse
    • Home
    • Artificial Intelligence

      Blue-collar jobs are gaining popularity as AI threatens office work

      August 17, 2025

      Man who asked ChatGPT about cutting out salt from his diet was hospitalized with hallucinations

      August 15, 2025

      What happens when chatbots shape your reality? Concerns are growing online

      August 14, 2025

      Scientists want to prevent AI from going rogue by teaching it to be bad first

      August 8, 2025

      AI models may be accidentally (and secretly) learning each other’s bad behaviors

      July 30, 2025
    • Business

      Why Certified VMware Pros Are Driving the Future of IT

      August 24, 2025

      Murky Panda hackers exploit cloud trust to hack downstream customers

      August 23, 2025

      The rise of sovereign clouds: no data portability, no party

      August 20, 2025

      Israel is reportedly storing millions of Palestinian phone calls on Microsoft servers

      August 6, 2025

      AI site Perplexity uses “stealth tactics” to flout no-crawl edicts, Cloudflare says

      August 5, 2025
    • Crypto

      Japan Auto Parts Maker Invests US Stablecoin Firm and Its Stock Soars

      August 29, 2025

      Stablecoin Card Firm Rain Raise $58M from Samsung and Sapphire

      August 29, 2025

      Shark Tank Star Kevin O’Leary Expands to Bitcoin ETF

      August 29, 2025

      BitMine Stock Moves Opposite to Ethereum — What Are Analysts Saying?

      August 29, 2025

      Argentina’s Opposition Parties Reactivate LIBRA Investigation Into President Milei

      August 29, 2025
    • Technology

      AI and backup: How backup products leverage AI

      August 29, 2025

      Nvidia’s Jensen Huang lobbies Trump administration to sell Blackwell to China

      August 29, 2025

      Cyber Girls First is encouraging girls to study tech at university  

      August 29, 2025

      Zopa Bank seeks partners to form AI skills coalition

      August 29, 2025

      Microsoft refuses to divulge data flows to Police Scotland

      August 29, 2025
    • Others
      • Gadgets
      • Gaming
      • Health
      • Software and Apps
    Check BMI
    Tech AI Verse
    You are at:Home»Technology»Why Anthropic’s Claude still hasn’t beaten Pokémon
    Technology

    Why Anthropic’s Claude still hasn’t beaten Pokémon

    TechAiVerseBy TechAiVerseMarch 21, 2025No Comments12 Mins Read1 Views
    Facebook Twitter Pinterest Telegram LinkedIn Tumblr Email Reddit
    Why Anthropic’s Claude still hasn’t beaten Pokémon
    Share
    Facebook Twitter LinkedIn Pinterest WhatsApp Email

    BMI Calculator – Check your Body Mass Index for free!

    Why Anthropic’s Claude still hasn’t beaten Pokémon

    Weeks later, Sonnet’s “reasoning” model is struggling with a game designed for children.

    Gotta subsume ’em all into the machine consciousness!


    Credit:

    Aurich Lawson

    Gotta subsume ’em all into the machine consciousness!


    Credit:

    Aurich Lawson

    In recent months, the AI industry’s biggest boosters have started converging on a public expectation that we’re on the verge of “artificial general intelligence” (AGI)—virtual agents that can match or surpass “human-level” understanding and performance on most cognitive tasks.

    OpenAI is quietly seeding expectations for a “PhD-level” AI agent that could operate autonomously at the level of a “high-income knowledge worker” in the near future. Elon Musk says that “we’ll have AI smarter than any one human probably” by the end of 2025. Anthropic CEO Dario Amodei thinks it might take a bit longer but similarly says it’s plausible that AI will be “better than humans at almost everything” by the end of 2027.

    A few researchers at Anthropic have, over the past year, had a part-time obsession with a peculiar problem.

    Can Claude play Pokémon?

    A thread: pic.twitter.com/K8SkNXCxYJ

    — Anthropic (@AnthropicAI) February 25, 2025

    Last month, Anthropic presented its “Claude Plays Pokémon” experiment as a waypoint on the road to that predicted AGI future. It’s a project the company said shows “glimmers of AI systems that tackle challenges with increasing competence, not just through training but with generalized reasoning.” Anthropic made headlines by trumpeting how Claude 3.7 Sonnet’s “improved reasoning capabilities” let the company’s latest model make progress in the popular old-school Game Boy RPG in ways “that older models had little hope of achieving.”

    While Claude models from just a year ago struggled even to leave the game’s opening area, Claude 3.7 Sonnet was able to make progress by collecting multiple in-game Gym Badges in a relatively small number of in-game actions. That breakthrough, Anthropic wrote, was because the “extended thinking” by Claude 3.7 Sonnet means the new model “plans ahead, remembers its objectives, and adapts when initial strategies fail” in a way that its predecessors didn’t. Those things, Anthropic brags, are “critical skills for battling pixelated gym leaders. And, we posit, in solving real-world problems too.”

    Over the last year, new Claude models have shown quick progress in reaching new Pokémon milestones.

    Over the last year, new Claude models have shown quick progress in reaching new Pokémon milestones.


    Credit:

    Anthropic


    But relative success over previous models is not the same as absolute success over the game in its entirety. In the weeks since Claude Plays Pokémon was first made public, thousands of Twitch viewers have watched Claude struggle to make consistent progress in the game. Despite long “thinking” pauses between each move—during which viewers can read printouts of the system’s simulated reasoning process—Claude frequently finds itself pointlessly revisiting completed towns, getting stuck in blind corners of the map for extended periods, or fruitlessly talking to the same unhelpful NPC over and over, to cite just a few examples of distinctly sub-human in-game performance.

    Watching Claude continue to struggle at a game designed for children, it’s hard to imagine we’re witnessing the genesis of some sort of computer superintelligence. But even Claude’s current sub-human level of Pokémon performance could hold significant lessons for the quest toward generalized, human-level artificial intelligence.

    Smart in different ways

    In some sense, it’s impressive that Claude can play Pokémon with any facility at all. When developing AI systems that find dominant strategies in games like Go and Dota 2, engineers generally start their algorithms off with deep knowledge of a game’s rules and/or basic strategies, as well as a reward function to guide them toward better performance. For Claude Plays Pokémon, though, project developer and Anthropic employee David Hershey says he started with an unmodified, generalized Claude model that wasn’t specifically trained or tuned to play Pokémon games in any way.

    “This is purely the various other things that [Claude] understands about the world being used to point at video games,” Hershey told Ars. “So it has a sense of a Pokémon. If you go to claude.ai and ask about Pokémon, it knows what Pokémon is based on what it’s read… If you ask, it’ll tell you there’s eight gym badges, it’ll tell you the first one is Brock… it knows the broad structure.”

    A flowchart summarizing the pieces that help Claude interact with an active game of Pokémon (click through to zoom in).

    A flowchart summarizing the pieces that help Claude interact with an active game of Pokémon (click through to zoom in).


    Credit:

    Anthropic / Excelidraw


    In addition to directly monitoring certain key (emulated) Game Boy RAM addresses for game state information, Claude views and interprets the game’s visual output much like a human would. But despite recent advances in AI image processing, Hershey said Claude still struggles to interpret the low-resolution, pixelated world of a Game Boy screenshot as well as a human can. “Claude’s still not particularly good at understanding what’s on the screen at all,” he said. “You will see it attempt to walk into walls all the time.”

    Hershey said he suspects Claude’s training data probably doesn’t contain many overly detailed text descriptions of “stuff that looks like a Game Boy screen.” This means that, somewhat surprisingly, if Claude were playing a game with “more realistic imagery, I think Claude would actually be able to see a lot better,” Hershey said.

    “It’s one of those funny things about humans that we can squint at these eight-by-eight pixel blobs of people and say, ‘That’s a girl with blue hair,’” Hershey continued. “People, I think, have that ability to map from our real world to understand and sort of grok that… so I’m honestly kind of surprised that Claude’s as good as it is at being able to see there’s a person on the screen.”

    Even with a perfect understanding of what it’s seeing on-screen, though, Hershey said Claude would still struggle with 2D navigation challenges that would be trivial for a human. “It’s pretty easy for me to understand that [an in-game] building is a building and that I can’t walk through a building,” Hershey said. “And that’s [something] that’s pretty challenging for Claude to understand… It’s funny because it’s just kind of smart in different ways, you know?”

    A sample Pokémon screen with an overlay showing how Claude characterizes the game’s grid-based map.

    A sample Pokémon screen with an overlay showing how Claude characterizes the game’s grid-based map.


    Credit:

    Anthrropic / X


    Where Claude tends to perform better, Hershey said, is in the more text-based portions of the game. During an in-game battle, Claude will readily notice when the game tells it that an attack from an electric-type Pokémon is “not very effective” against a rock-type opponent, for instance. Claude will then squirrel that factoid away in a massive written knowledge base for future reference later in the run. Claude can also integrate multiple pieces of similar knowledge into pretty elegant battle strategies, even extending those strategies into long-term plans for catching and managing teams of multiple creatures for future battles.

    Claude can even show surprising “intelligence” when Pokémon’s in-game text is intentionally misleading or incomplete. “It’s pretty funny that they tell you you need to go find Professor Oak next door and then he’s not there,” Hershey said of an early-game task. “As a 5-year-old, that was very confusing to me. But Claude actually typically goes through that same set of motions where it talks to mom, goes to the lab, doesn’t find [Oak], says, ‘I need to figure something out’… It’s sophisticated enough to sort of go through the motions of the way [humans are] actually supposed to learn it, too.”

    A sample of the kind of simulated reasoning process Claude steps through during a typical Pokémon battle.

    A sample of the kind of simulated reasoning process Claude steps through during a typical Pokémon battle.


    Credit:

    Claude Plays Pokemon / Twitch


    These kinds of relative strengths and weaknesses when compared to “human-level” play reflect the overall state of AI research and capabilities in general, Hershey said. “I think it’s just a sort of universal thing about these models… We built the text side of it first, and the text side is definitely… more powerful. How these models can reason about images is getting better, but I think it’s a decent bit behind.”

    Forget me not

    Beyond issues parsing text and images, Hershey also acknowledged that Claude can have trouble “remembering” what it has already learned. The current model has a “context window” of 200,000 tokens, limiting the amount of relational information it can store in its “memory” at any one time. When the system’s ever-expanding knowledge base fills up this context window, Claude goes through an elaborate summarization process, condensing detailed notes on what it has seen, done, and learned so far into shorter text summaries that lose some of the fine-grained details.

    This can mean that Claude “has a hard time keeping track of things for a very long time and really having a great sense of what it’s tried so far,” Hershey said. “You will definitely see it occasionally delete something that it shouldn’t have. Anything that’s not in your knowledge base or not in your summary is going to be gone, so you have to think about what you want to put there.”

    A small window into the kind of “cleaning up my context” knowledge-base update necessitated by Claude’s limited “memory.”

    A small window into the kind of “cleaning up my context” knowledge-base update necessitated by Claude’s limited “memory.”


    Credit:

    Claude Play Pokemon / Twitch


    More than forgetting important history, though, Claude runs into bigger problems when it inadvertently inserts incorrect information into its knowledge base. Like a conspiracy theorist who builds an entire worldview from an inherently flawed premise, Claude can be incredibly slow to recognize when an error in its self-authored knowledge base is leading its Pokémon play astray.

    “The things that are written down in the past, it sort of trusts pretty blindly,” Hershey said. “I have seen it become very convinced that it found the exit to [in-game location] Viridian Forest at some specific coordinates, and then it spends hours and hours exploring a little small square around those coordinates that are wrong instead of doing anything else. It takes a very long time for it to decide that that was a ‘fail.’”

    Still, Hershey said Claude 3.7 Sonnet is much better than earlier models at eventually “questioning its assumptions, trying new strategies, and keeping track over long horizons of various strategies to [see] whether they work or not.” While the new model will still “struggle for really long periods of time” retrying the same thing over and over, it will ultimately tend to “get a sense of what’s going on and what it’s tried before, and it stumbles a lot of times into actual progress from that,” Hershey said.

    “We’re getting pretty close…”

    One of the most interesting things about observing Claude Plays Pokémon across multiple iterations and restarts, Hershey said, is seeing how the system’s progress and strategy can vary quite a bit between runs. Sometimes Claude will show it’s “capable of actually building a pretty coherent strategy” by “keeping detailed notes about the different paths to try,” for instance, he said. But “most of the time it doesn’t… most of the time, it wanders into the wall because it’s confident it sees the exit.”

    Where previous models wandered aimlessly or got stuck in loops, Claude 3.7 Sonnet plans ahead, remembers its objectives, and adapts when initial strategies fail.

    Critical skills for battling pixelated gym leaders. And, we posit, in solving real-world problems too. pic.twitter.com/scvISp14XG

    — Anthropic (@AnthropicAI) February 25, 2025

    One of the biggest things preventing the current version of Claude from getting better, Hershey said, is that “when it derives that good strategy, I don’t think it necessarily has the self-awareness to know that one strategy [it] came up with is better than another.” And that’s not a trivial problem to solve.

    Still, Hershey said he sees “low-hanging fruit” for improving Claude’s Pokémon play by improving the model’s understanding of Game Boy screenshots. “I think there’s a chance it could beat the game if it had a perfect sense of what’s on the screen,” Hershey said, saying that such a model would probably perform “a little bit short of human.”

    Expanding the context window for future Claude models will also probably allow those models to “reason over longer time frames and handle things more coherently over a long period of time,” Hershey said. Future models will improve by getting “a little bit better at remembering, keeping track of a coherent set of what it needs to try to make progress,” he added.

    Twitch chat responds with a flood of bouncing emojis as Claude concludes an epic 78+ hour escape from Pokémon’s Mt. Moon.

    Twitch chat responds with a flood of bouncing emojis as Claude concludes an epic 78+ hour escape from Pokémon’s Mt. Moon.


    Credit:

    Claude Plays Pokemon / Twitch


    Whatever you think about impending improvements in AI models, though, Claude’s current performance at Pokémon doesn’t make it seem like it’s poised to usher in an explosion of human-level, completely generalizable artificial intelligence. And Hershey allows that watching Claude 3.7 Sonnet get stuck on Mt. Moon for 80 hours or so can make it “seem like a model that doesn’t know what it’s doing.”

    But Hershey is still impressed at the way that Claude’s new reasoning model will occasionally show some glimmer of awareness and “kind of tell that it doesn’t know what it’s doing and know that it needs to be doing something different. And the difference between ‘can’t do it at all’ and ‘can kind of do it’ is a pretty big one for these AI things for me,” he continued. “You know, when something can kind of do something it typically means we’re pretty close to getting it to be able to do something really, really well.”

    Kyle Orland has been the Senior Gaming Editor at Ars Technica since 2012, writing primarily about the business, tech, and culture behind video games. He has journalism and computer science degrees from University of Maryland. He once wrote a whole book about Minesweeper.



    78 Comments

    BMI Calculator – Check your Body Mass Index for free!

    Share. Facebook Twitter Pinterest LinkedIn Reddit WhatsApp Telegram Email
    Previous ArticleRocket Report: Falcon 9 may smash reuse record; Relativity roving to Texas?
    Next Article How the language of job postings can attract rule-bending narcissists
    TechAiVerse
    • Website

    Jonathan is a tech enthusiast and the mind behind Tech AI Verse. With a passion for artificial intelligence, consumer tech, and emerging innovations, he deliver clear, insightful content to keep readers informed. From cutting-edge gadgets to AI advancements and cryptocurrency trends, Jonathan breaks down complex topics to make technology accessible to all.

    Related Posts

    AI and backup: How backup products leverage AI

    August 29, 2025

    Nvidia’s Jensen Huang lobbies Trump administration to sell Blackwell to China

    August 29, 2025

    Cyber Girls First is encouraging girls to study tech at university  

    August 29, 2025
    Leave A Reply Cancel Reply

    Top Posts

    Ping, You’ve Got Whale: AI detection system alerts ships of whales in their path

    April 22, 2025166 Views

    6.7 Cummins Lifter Failure: What Years Are Affected (And Possible Fixes)

    April 14, 202548 Views

    New Akira ransomware decryptor cracks encryptions keys using GPUs

    March 16, 202530 Views

    Is Libby Compatible With Kobo E-Readers?

    March 31, 202528 Views
    Don't Miss
    Technology August 29, 2025

    AI and backup: How backup products leverage AI

    AI and backup: How backup products leverage AI Software applications often list artificial intelligence (AI)…

    Nvidia’s Jensen Huang lobbies Trump administration to sell Blackwell to China

    Cyber Girls First is encouraging girls to study tech at university  

    Zopa Bank seeks partners to form AI skills coalition

    Stay In Touch
    • Facebook
    • Twitter
    • Pinterest
    • Instagram
    • YouTube
    • Vimeo

    Subscribe to Updates

    Get the latest creative news from SmartMag about art & design.

    About Us
    About Us

    Welcome to Tech AI Verse, your go-to destination for everything technology! We bring you the latest news, trends, and insights from the ever-evolving world of tech. Our coverage spans across global technology industry updates, artificial intelligence advancements, machine learning ethics, and automation innovations. Stay connected with us as we explore the limitless possibilities of technology!

    Facebook X (Twitter) Pinterest YouTube WhatsApp
    Our Picks

    AI and backup: How backup products leverage AI

    August 29, 20252 Views

    Nvidia’s Jensen Huang lobbies Trump administration to sell Blackwell to China

    August 29, 20252 Views

    Cyber Girls First is encouraging girls to study tech at university  

    August 29, 20251 Views
    Most Popular

    Xiaomi 15 Ultra Officially Launched in China, Malaysia launch to follow after global event

    March 12, 20250 Views

    Apple thinks people won’t use MagSafe on iPhone 16e

    March 12, 20250 Views

    French Apex Legends voice cast refuses contracts over “unacceptable” AI clause

    March 12, 20250 Views
    © 2025 TechAiVerse. Designed by Divya Tech.
    • Home
    • About Us
    • Contact Us
    • Privacy Policy
    • Terms & Conditions

    Type above and press Enter to search. Press Esc to cancel.