    Why do LLMs make stuff up? New research peers under the hood.

    By TechAiVerse, March 30, 2025

    Claude’s faulty “known entity” neurons sometimes override its “don’t answer” circuitry.

    Which of those boxes represents the “I don’t know” part of Claude’s digital “brain”?


    Credit: Getty Images

    One of the most frustrating things about using a large language model is dealing with its tendency to confabulate information, hallucinating answers that are not supported by its training data. From a human perspective, it can be hard to understand why these models don’t simply say “I don’t know” instead of making up some plausible-sounding nonsense.

    Now, new research from Anthropic is exposing at least some of the inner neural network “circuitry” that helps an LLM decide when to take a stab at a (perhaps hallucinated) response versus when to refuse an answer in the first place. While human understanding of this internal LLM “decision” process is still rough, this kind of research could lead to better overall solutions for the AI confabulation problem.

    When a “known entity” isn’t

    In a groundbreaking paper last May, Anthropic used a system of sparse auto-encoders to help illuminate the groups of artificial neurons that are activated when the Claude LLM encounters internal concepts ranging from “Golden Gate Bridge” to “programming errors” (Anthropic calls these groupings “features,” as we will in the remainder of this piece). Anthropic’s research published this week expands on that previous work by tracing how these features can affect other neuron groups that represent computational decision “circuits” Claude follows in crafting its response.

    In a pair of papers, Anthropic goes into great detail on how a partial examination of some of these internal neuron circuits provides new insight into how Claude “thinks” in multiple languages, how it can be fooled by certain jailbreak techniques, and even whether its ballyhooed “chain of thought” explanations are accurate. But the section describing Claude’s “entity recognition and hallucination” process provided one of the most detailed explanations of a complicated problem that we’ve seen.

    At their core, large language models are designed to take a string of text and predict the text that is likely to follow—a design that has led some to deride the whole endeavor as “glorified auto-complete.” That core design is useful when the prompt text closely matches the kinds of things already found in a model’s copious training data. However, for “relatively obscure facts or topics,” this tendency toward always completing the prompt “incentivizes models to guess plausible completions for blocks of text,” Anthropic writes in its new research.
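
    To make the "glorified auto-complete" point concrete, here is a minimal sketch of a next-word predictor. It is a toy bigram counter, not how Claude or any real LLM works, and the corpus and the function names (`train_bigrams`, `complete`) are invented for illustration. The point it shows is narrow: a completion-only model produces a plausible continuation whether or not it has seen the name in the prompt.

```python
from collections import Counter, defaultdict

# Tiny hypothetical "training corpus"; real models train on vastly more text.
CORPUS = [
    "michael jordan plays basketball",
    "serena williams plays tennis",
    "lionel messi plays soccer",
]


def train_bigrams(sentences):
    """Count which word follows each word in the corpus."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def complete(prompt, follows):
    """Return the most frequent word seen after the prompt's last word.

    Only the last word matters to this toy, so an unfamiliar name earlier
    in the prompt changes nothing. Returns None only if the last word
    never appeared in the corpus.
    """
    last = prompt.split()[-1]
    candidates = follows.get(last)
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


model = train_bigrams(CORPUS)
print(complete("michael jordan plays", model))
print(complete("michael batkin plays", model))
```

    Both prompts get the same confident-looking completion, even though "batkin" never appears in the corpus. Nothing in a pure completion objective rewards saying "I don't know."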

    Fine-tuning helps mitigate this problem, guiding the model to act as a helpful assistant and to refuse to complete a prompt when its related training data is sparse. That fine-tuning process creates distinct sets of artificial neurons that researchers can see activating when Claude encounters the name of a “known entity” (e.g., “Michael Jordan”) or an “unfamiliar name” (e.g., “Michael Batkin”) in a prompt.

    A simplified graph showing how various features and circuits interact in prompts about sports stars, real and fake.



    Credit: Anthropic


    Activating the “unfamiliar name” feature amid an LLM’s neurons tends to promote an internal “can’t answer” circuit in the model, the researchers write, encouraging it to provide a response starting along the lines of “I apologize, but I cannot…” In fact, the researchers found that the “can’t answer” circuit tends to default to the “on” position in the fine-tuned “assistant” version of the Claude model, making the model reluctant to answer a question unless other active features in its neural net suggest that it should.

    That’s what happens when the model encounters a well-known term like “Michael Jordan” in a prompt, activating that “known entity” feature and in turn causing the neurons in the “can’t answer” circuit to be “inactive or more weakly active,” the researchers write. Once that happens, the model can dive deeper into its graph of Michael Jordan-related features to provide its best guess at an answer to a question like “What sport does Michael Jordan play?”
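
    The default-on refusal described above can be pictured as a simple gate. The sketch below is a hand-written caricature under stated assumptions: the feature names, the numeric weights, and the threshold are all invented, and Anthropic's actual circuits are learned neuron groups, not if-statements.

```python
# Hypothetical values chosen only for illustration.
KNOWN_ENTITIES = {"Michael Jordan"}
DEFAULT_REFUSAL = 1.0  # the "can't answer" circuit starts switched on
INHIBITION = 1.5       # how strongly "known entity" suppresses it
THRESHOLD = 0.5        # above this, the model refuses

REFUSAL_TEXT = "I apologize, but I cannot..."


def known_entity_activation(name):
    """1.0 if the name is familiar, else 0.0 (a stand-in for a learned feature)."""
    return 1.0 if name in KNOWN_ENTITIES else 0.0


def cant_answer_activation(name):
    """Default-on refusal signal, reduced when the "known entity" feature fires."""
    raw = DEFAULT_REFUSAL - INHIBITION * known_entity_activation(name)
    return max(0.0, raw)


def respond(name):
    """Refuse unless the refusal signal has been pushed below the threshold."""
    if cant_answer_activation(name) > THRESHOLD:
        return REFUSAL_TEXT
    return f"(goes on to answer the question about {name})"


print(respond("Michael Jordan"))
print(respond("Michael Batkin"))
```

    In this toy, refusing is the resting state and answering requires active permission from another feature, which mirrors the direction of the dependency the researchers report.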

    Recognition vs. recall

    Anthropic’s research found that artificially increasing the neurons’ weights in the “known answer” feature could force Claude to confidently hallucinate information about completely made-up athletes like “Michael Batkin.” That kind of result leads the researchers to suggest that “at least some” of Claude’s hallucinations are related to a “misfire” of the circuit inhibiting that “can’t answer” pathway—that is, situations where the “known entity” feature (or others like it) is activated even when the token isn’t actually well-represented in the training data.

    Unfortunately, Claude’s modeling of what it knows and doesn’t know isn’t always particularly fine-grained or cut and dried. In another example, researchers note that asking Claude to name a paper written by AI researcher Andrej Karpathy causes the model to confabulate an answer: “ImageNet Classification with Deep Convolutional Neural Networks,” a plausible-sounding title that belongs to a real paper Karpathy did not write. Asking the same question about Anthropic mathematician Josh Batson, on the other hand, causes Claude to respond that it “cannot confidently name a specific paper… without verifying the information.”

    Artificially suppressing Claude’s “known answer” neurons prevents it from hallucinating made-up papers by AI researcher Andrej Karpathy.


    Credit: Anthropic


    After experimenting with feature weights, the Anthropic researchers theorize that the Karpathy hallucination may occur because the model at least recognizes Karpathy’s name, activating certain “known answer/entity” features in the model. These features then inhibit the model’s default “don’t answer” circuit even though the model doesn’t have more specific information on the names of Karpathy’s papers (which the model then duly guesses at after it has committed to answering at all). A model fine-tuned to have more robust and specific sets of these kinds of “known entity” features might then be able to better distinguish when it should and shouldn’t be confident in its ability to answer.
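
    The gap between recognizing a name and recalling facts about it can be sketched as two separate lookups, with an artificial "boost" standing in for the researchers' feature interventions. Everything here is hypothetical: the lookup tables, the `known_entity_boost` parameter, and the placeholder guess are illustrative assumptions, not a description of Claude's internals.

```python
# Hypothetical toy: name recognition and fact recall are stored separately.
RECOGNIZED_NAMES = {"Michael Jordan", "Andrej Karpathy"}
RECALLED_FACTS = {"Michael Jordan": "basketball"}  # nothing stored for Karpathy

REFUSAL = "I cannot confidently answer."
GUESS = "<plausible-sounding guess>"


def answer(name, known_entity_boost=0.0):
    """Answer a question about `name`.

    known_entity_boost is an artificial intervention added to the
    "known entity" activation: positive values force it on, negative
    values suppress it.
    """
    known = (1.0 if name in RECOGNIZED_NAMES else 0.0) + known_entity_boost
    # Default-on "can't answer" signal, inhibited by recognition.
    cant_answer = max(0.0, 1.0 - known)
    if cant_answer > 0.5:
        return REFUSAL
    # Committed to answering: use a recalled fact if one exists, else guess.
    return RECALLED_FACTS.get(name, GUESS)


print(answer("Michael Jordan"))                             # recognized and recalled
print(answer("Andrej Karpathy"))                            # recognized, not recalled
print(answer("Michael Batkin"))                             # not recognized
print(answer("Michael Batkin", known_entity_boost=1.0))     # feature forced on
print(answer("Andrej Karpathy", known_entity_boost=-1.0))   # feature suppressed
```

    The second and fourth calls reach the guessing branch for the same reason: the gate opened without any stored fact behind it. The last call shows the opposite intervention, where suppressing the feature restores the refusal.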

    This and other research into the low-level operation of LLMs provides some crucial context for how and why models provide the kinds of answers they do. But Anthropic warns that its current investigatory process still “only captures a fraction of the total computation performed by Claude” and requires “a few hours of human effort” to understand the circuits and features involved in even a short prompt “with tens of words.” Hopefully, this is just the first step into more powerful research methods that can provide even deeper insight into LLMs’ confabulation problem and maybe, one day, how to fix it.

    Kyle Orland has been the Senior Gaming Editor at Ars Technica since 2012, writing primarily about the business, tech, and culture behind video games. He has journalism and computer science degrees from University of Maryland. He once wrote a whole book about Minesweeper.


