    Why do LLMs make stuff up? New research peers under the hood.

    By TechAiVerse, March 30, 2025
    Claude’s faulty “known entity” neurons sometimes override its “don’t answer” circuitry.

    Which of those boxes represents the “I don’t know” part of Claude’s digital “brain”? Credit: Getty Images

    One of the most frustrating things about using a large language model is dealing with its tendency to confabulate information, hallucinating answers that are not supported by its training data. From a human perspective, it can be hard to understand why these models don’t simply say “I don’t know” instead of making up some plausible-sounding nonsense.

    Now, new research from Anthropic is exposing at least some of the inner neural network “circuitry” that helps an LLM decide when to take a stab at a (perhaps hallucinated) response versus when to refuse an answer in the first place. While human understanding of this internal LLM “decision” process is still rough, this kind of research could lead to better overall solutions for the AI confabulation problem.

    When a “known entity” isn’t

    In a groundbreaking paper last May, Anthropic used a system of sparse auto-encoders to help illuminate the groups of artificial neurons that are activated when the Claude LLM encounters internal concepts ranging from “Golden Gate Bridge” to “programming errors” (Anthropic calls these groupings “features,” as we will in the remainder of this piece). Research that Anthropic published this week expands on that previous work by tracing how these features can affect other neuron groups that represent the computational decision “circuits” Claude follows in crafting its response.

    In a pair of papers, Anthropic goes into great detail on how a partial examination of some of these internal neuron circuits provides new insight into how Claude “thinks” in multiple languages, how it can be fooled by certain jailbreak techniques, and even whether its ballyhooed “chain of thought” explanations are accurate. But the section describing Claude’s “entity recognition and hallucination” process provided one of the most detailed explanations of a complicated problem that we’ve seen.

    At their core, large language models are designed to take a string of text and predict the text that is likely to follow—a design that has led some to deride the whole endeavor as “glorified auto-complete.” That core design is useful when the prompt text closely matches the kinds of things already found in a model’s copious training data. However, for “relatively obscure facts or topics,” this tendency toward always completing the prompt “incentivizes models to guess plausible completions for blocks of text,” Anthropic writes in its new research.
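The "always complete the prompt" tendency described above can be illustrated with a toy next-word predictor. This is a minimal sketch, not how Claude or any production LLM works: the three-sentence corpus, the bigram counting, and the function names are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy training data (not from the article or from Anthropic).
TOY_CORPUS = [
    "jordan plays basketball",
    "jordan plays basketball",
    "woods plays golf",
]

def train_bigram(corpus):
    """Count which word follows each word in the training sentences."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows

def predict_next(follows, prompt):
    """Return the most likely next word given only the last word of the prompt.

    The predictor has no way to say "I don't know": it always emits
    whatever continuation was most common in training.
    """
    last_word = prompt.lower().split()[-1]
    counts = follows.get(last_word)
    if not counts:
        # Unseen context: fall back to the most common next word overall.
        counts = Counter()
        for c in follows.values():
            counts.update(c)
    return counts.most_common(1)[0][0]

model = train_bigram(TOY_CORPUS)
print(predict_next(model, "jordan plays"))  # basketball
print(predict_next(model, "batkin plays"))  # basketball (a guess: "batkin" was never seen)
```

The second call is the interesting one: the toy model has never seen "batkin," yet it still produces a fluent completion because nothing in its design allows it to abstain.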

    Fine-tuning helps mitigate this problem, guiding the model to act as a helpful assistant and to refuse to complete a prompt when its related training data is sparse. That fine-tuning process creates distinct sets of artificial neurons that researchers can see activating when Claude encounters the name of a “known entity” (e.g., “Michael Jordan”) or an “unfamiliar name” (e.g., “Michael Batkin”) in a prompt.

    A simplified graph showing how various features and circuits interact in prompts about sports stars, real and fake. Credit: Anthropic


    Activating the “unfamiliar name” feature amid an LLM’s neurons tends to promote an internal “can’t answer” circuit in the model, the researchers write, encouraging it to provide a response starting along the lines of “I apologize, but I cannot…” In fact, the researchers found that the “can’t answer” circuit tends to default to the “on” position in the fine-tuned “assistant” version of the Claude model, making the model reluctant to answer a question unless other active features in its neural net suggest that it should.

    That’s what happens when the model encounters a well-known term like “Michael Jordan” in a prompt, activating that “known entity” feature and in turn causing the neurons in the “can’t answer” circuit to be “inactive or more weakly active,” the researchers write. Once that happens, the model can dive deeper into its graph of Michael Jordan-related features to provide its best guess at an answer to a question like “What sport does Michael Jordan play?”
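The relationship described in the two paragraphs above (a "can't answer" circuit that is on by default and gets inhibited by a "known entity" feature) can be sketched as a tiny numeric toy. The activation values, bias, inhibition strength, and threshold below are invented for illustration; Anthropic's research does not describe the circuit with these numbers or this formula.

```python
# Hypothetical "known entity" activations for two names (made-up values).
KNOWN_ENTITY_ACTIVATION = {
    "Michael Jordan": 0.9,   # well represented in training data
    "Michael Batkin": 0.0,   # unfamiliar name
}

def cant_answer_activation(known_entity, default_bias=1.0, inhibition=1.5):
    """Toy "can't answer" circuit: on by default, pushed down by familiarity."""
    return max(0.0, default_bias - inhibition * known_entity)

def decide(name, threshold=0.5):
    """Refuse unless the known-entity feature has suppressed the default refusal."""
    known = KNOWN_ENTITY_ACTIVATION.get(name, 0.0)
    if cant_answer_activation(known) > threshold:
        return "refuse"
    return "answer"

for name in KNOWN_ENTITY_ACTIVATION:
    print(name, "->", decide(name))
```

With these assumed values, "Michael Jordan" suppresses the default refusal and the toy answers, while "Michael Batkin" leaves the refusal active.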

    Recognition vs. recall

    Anthropic’s research found that artificially increasing the neurons’ weights in the “known answer” feature could force Claude to confidently hallucinate information about completely made-up athletes like “Michael Batkin.” That kind of result leads the researchers to suggest that “at least some” of Claude’s hallucinations are related to a “misfire” of the circuit inhibiting that “can’t answer” pathway—that is, situations where the “known entity” feature (or others like it) is activated even when the token isn’t actually well-represented in the training data.
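The intervention experiment can be pictured as steering a single feature up or down and watching the behavior flip. The sketch below is a toy under stated assumptions: the `steer` parameter, the activation table, and the 0.5 threshold are hypothetical stand-ins, not Anthropic's actual method or tooling.

```python
# Hypothetical baseline "known entity" activations (made-up values).
KNOWN_ENTITY = {"michael jordan": 1.0}

def respond(name, steer=0.0):
    """Toy model with an optional intervention on the "known entity" feature.

    steer > 0 artificially boosts the feature; steer < 0 suppresses it.
    """
    baseline = KNOWN_ENTITY.get(name.lower(), 0.0)
    known = min(1.0, max(0.0, baseline + steer))
    cant_answer = 1.0 - known  # the default refusal is inhibited by familiarity
    if cant_answer > 0.5:
        return "refuse"
    # The model has committed to answering; whether the answer is grounded
    # depends on whether it really knows the entity.
    return "answer" if baseline > 0.5 else "confabulate"

print(respond("Michael Batkin"))             # refuse
print(respond("Michael Batkin", steer=1.0))  # confabulate
print(respond("Michael Jordan", steer=-1.0)) # refuse
```

Boosting the feature for an unfamiliar name switches the toy from refusing to confabulating, which mirrors the direction of the effect the researchers report, while suppressing it for a familiar name restores the refusal.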

    Unfortunately, Claude’s modeling of what it knows and doesn’t know isn’t always particularly fine-grained or cut and dried. In another example, researchers note that asking Claude to name a paper written by AI researcher Andrej Karpathy causes the model to confidently answer with “ImageNet Classification with Deep Convolutional Neural Networks,” a real and well-known paper that Karpathy did not write. Asking the same question about Anthropic mathematician Josh Batson, on the other hand, causes Claude to respond that it “cannot confidently name a specific paper… without verifying the information.”

    Artificially suppressing Claude’s “known answer” neurons prevents it from hallucinating papers by AI researcher Andrej Karpathy. Credit: Anthropic


    After experimenting with feature weights, the Anthropic researchers theorize that the Karpathy hallucination may occur because the model at least recognizes Karpathy’s name, activating certain “known answer/entity” features in the model. These features then inhibit the model’s default “don’t answer” circuit even though the model doesn’t have more specific information on the names of Karpathy’s papers (which the model then duly guesses at after it has committed to answering at all). A model fine-tuned to have more robust and specific sets of these kinds of “known entity” features might then be able to better distinguish when it should and shouldn’t be confident in its ability to answer.
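The gap between recognizing a name and recalling a specific fact about it can be sketched with two separate lookup tables. Everything here is hypothetical: the placeholder researchers, the `fine_grained` switch, and the idea of modeling features as set membership are illustration devices, not a description of Claude's internals.

```python
# Names the toy model "recognizes" (placeholders, not real people).
RECOGNIZED_NAMES = {"researcher a", "researcher b"}

# Specific facts the toy model can actually recall (placeholder title).
KNOWN_PAPERS = {"researcher a": "Paper Title A"}

def name_a_paper(name, fine_grained=False):
    """Return (outcome, answer) for the question "name a paper by <name>".

    Coarse mode: recognizing the name alone unlocks answering.
    Fine-grained mode: answering requires knowing a paper for that name.
    """
    key = name.lower()
    confident = key in KNOWN_PAPERS if fine_grained else key in RECOGNIZED_NAMES
    if not confident:
        return ("refused", None)
    if key in KNOWN_PAPERS:
        return ("recalled", KNOWN_PAPERS[key])
    # Committed to answering without the underlying fact: it guesses.
    return ("confabulated", "<plausible-sounding guess>")

print(name_a_paper("Researcher A"))                     # recalled
print(name_a_paper("Researcher B"))                     # confabulated
print(name_a_paper("Researcher B", fine_grained=True))  # refused
print(name_a_paper("Researcher C"))                     # refused
```

"Researcher B" is the Karpathy-like case in this toy: the name is familiar enough to switch off the refusal, but no matching fact exists, so the coarse version guesses. The fine-grained version corresponds loosely to the more specific "known entity" features the paragraph above speculates about.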

    This and other research into the low-level operation of LLMs provides some crucial context for how and why models provide the kinds of answers they do. But Anthropic warns that its current investigatory process still “only captures a fraction of the total computation performed by Claude” and requires “a few hours of human effort” to understand the circuits and features involved in even a short prompt “with tens of words.” Hopefully, this is just the first step into more powerful research methods that can provide even deeper insight into LLMs’ confabulation problem and maybe, one day, how to fix it.

    Kyle Orland has been the Senior Gaming Editor at Ars Technica since 2012, writing primarily about the business, tech, and culture behind video games. He has journalism and computer science degrees from the University of Maryland. He once wrote a whole book about Minesweeper.


