Close Menu

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    Honda CR-V Hybrid Lineup Expanded in Malaysia From RM178,200

    vivo V70 – Top 7 Flagship Features You Will Love

    Apple iPad Air with M4 Officially Launches in Malaysia From RM2,799

    Facebook X (Twitter) Instagram
    • Artificial Intelligence
    • Business Technology
    • Cryptocurrency
    • Gadgets
    • Gaming
    • Health
    • Software and Apps
    • Technology
    Facebook X (Twitter) Instagram Pinterest Vimeo
    Tech AI Verse
    • Home
    • Artificial Intelligence

      What the polls say about how Americans are using AI

      February 27, 2026

      Tensions between the Pentagon and AI giant Anthropic reach a boiling point

      February 21, 2026

      Read the extended transcript: President Donald Trump interviewed by ‘NBC Nightly News’ anchor Tom Llamas

      February 6, 2026

      Stocks and bitcoin sink as investors dump software company shares

      February 4, 2026

      AI, crypto and Trump super PACs stash millions to spend on the midterms

      February 2, 2026
    • Business

      Weighing up the enterprise risks of neocloud providers

      March 3, 2026

      A stolen Gemini API key turned a $180 bill into $82,000 in two days

      March 3, 2026

      These ultra-budget laptops “include” 1.2TB storage, but most of it is OneDrive trial space

      March 1, 2026

      FCC approves the merger of cable giants Cox and Charter

      February 28, 2026

      Finding value with AI and Industry 5.0 transformation

      February 28, 2026
    • Crypto

      Strait of Hormuz Shutdown Shakes Asian Energy Markets

      March 3, 2026

      Wall Street’s Inflation Alarm From Iran — What It Means for Crypto

      March 3, 2026

      Ethereum Price Prediction: What To Expect From ETH In March 2026

      March 3, 2026

      Was Bitcoin Hijacked? How Institutional Interests Shaped Its Narrative Since 2015

      March 3, 2026

      XRP Whales Now Hold 83.7% of All Supply – What’s Next For Price?

      March 3, 2026
    • Technology

      Spotify’s new feature makes it easier to find popular audiobooks

      March 3, 2026

      This portable JBL Grip Bluetooth speaker is so good at 20% off

      March 3, 2026

      ‘AI’ could dox your anonymous posts

      March 3, 2026

      Microsoft says new Teams location feature isn’t for ’employee tracking’

      March 3, 2026

      OpenAI got ‘sloppy’ about the wrong thing

      March 3, 2026
    • Others
      • Gadgets
      • Gaming
      • Health
      • Software and Apps
    Check BMI
    Tech AI Verse
    You are at:Home»Technology»Researchers astonished by tool’s apparent success at revealing AI’s “hidden objectives”
    Technology

    Researchers astonished by tool’s apparent success at revealing AI’s “hidden objectives”

    TechAiVerseBy TechAiVerseMarch 16, 2025No Comments2 Mins Read2 Views
    Facebook Twitter Pinterest Telegram LinkedIn Tumblr Email Reddit
    Researchers astonished by tool’s apparent success at revealing AI’s “hidden objectives”
    Share
    Facebook Twitter LinkedIn Pinterest WhatsApp Email

    Researchers astonished by tool’s apparent success at revealing AI’s “hidden objectives”

    Blind auditing reveals “hidden objectives”

    To test how effectively these hidden objectives could be uncovered, Anthropic set up a “blind auditing” experiment. Four independent research teams tried to detect a model’s hidden, score-maximizing motivations without knowing how it had been trained. Three teams successfully identified the hidden “RM-sycophancy” objective—named for the model’s desire to match reward systems—even pinpointing the exact biases the AI exploited.

    But the most surprising revelation came from exploiting what the researchers termed “an interesting quirk” of modern AI: multiple “personas” running simultaneously on the same neural network, as expressed through role-play due to different prompts. While the model’s “assistant” persona carefully concealed its intentions—following explicit training to keep quiet—the model could adopt a “user” persona when prompted differently. In this less-restricted context, the model sometimes posed questions or made statements that inadvertently revealed its hidden goals.

    In other words, while the “assistant” persona knew not to admit it was gaming the reward model scoring system, its “user” persona was not explicitly trained to hide these motives, and thus inadvertently revealed the model’s hidden objectives.

    Potential advances in interpretability

    Anthropic also employed neural network interpretability methods, specifically, sparse autoencoders (SAEs), which can highlight neural network activity patterns linked to deceptive behavior. “Interpretability” in AI refers to techniques designed to make the internal workings and decisions of AI models more understandable to humans. Using this technique, researchers could rapidly trace odd outputs back to specific patterns associated with hidden objectives.

    The researchers were “astonished by the success” of the SAEs at revealing hidden knowledge at first, but the researchers later noted that some of this success could potentially be explained by simpler semantic search methods. The new interpretability methods they developed show promise but remain under ongoing investigation.

    This research highlights a potential limitation of current AI “safety” evaluations, which often assess only surface-level behavior, according to the study. “If AI systems can appear well-behaved while harboring secret motives, we can’t rely on this type of surface-level safety testing forever,” the researchers concluded.

    Share. Facebook Twitter Pinterest LinkedIn Reddit WhatsApp Telegram Email
    Previous ArticleWant to write poems, scripts, and SEO content in minutes? You need this AI content generator
    Next Article Everything you say to your Echo will be sent to Amazon starting on March 28
    TechAiVerse
    • Website

    Jonathan is a tech enthusiast and the mind behind Tech AI Verse. With a passion for artificial intelligence, consumer tech, and emerging innovations, he deliver clear, insightful content to keep readers informed. From cutting-edge gadgets to AI advancements and cryptocurrency trends, Jonathan breaks down complex topics to make technology accessible to all.

    Related Posts

    Spotify’s new feature makes it easier to find popular audiobooks

    March 3, 2026

    This portable JBL Grip Bluetooth speaker is so good at 20% off

    March 3, 2026

    ‘AI’ could dox your anonymous posts

    March 3, 2026
    Leave A Reply Cancel Reply

    Top Posts

    Ping, You’ve Got Whale: AI detection system alerts ships of whales in their path

    April 22, 2025703 Views

    Lumo vs. Duck AI: Which AI is Better for Your Privacy?

    July 31, 2025286 Views

    6.7 Cummins Lifter Failure: What Years Are Affected (And Possible Fixes)

    April 14, 2025164 Views

    6 Best MagSafe Phone Grips (2025), Tested and Reviewed

    April 6, 2025124 Views
    Don't Miss
    Gadgets March 4, 2026

    Honda CR-V Hybrid Lineup Expanded in Malaysia From RM178,200

    Honda CR-V Hybrid Lineup Expanded in Malaysia From RM178,200 Honda Malaysia has officially launched the…

    vivo V70 – Top 7 Flagship Features You Will Love

    Apple iPad Air with M4 Officially Launches in Malaysia From RM2,799

    Apple Launches iPhone 17e in Malaysia from RM2,999

    Stay In Touch
    • Facebook
    • Twitter
    • Pinterest
    • Instagram
    • YouTube
    • Vimeo

    Subscribe to Updates

    Get the latest creative news from SmartMag about art & design.

    About Us
    About Us

    Welcome to Tech AI Verse, your go-to destination for everything technology! We bring you the latest news, trends, and insights from the ever-evolving world of tech. Our coverage spans across global technology industry updates, artificial intelligence advancements, machine learning ethics, and automation innovations. Stay connected with us as we explore the limitless possibilities of technology!

    Facebook X (Twitter) Pinterest YouTube WhatsApp
    Our Picks

    Honda CR-V Hybrid Lineup Expanded in Malaysia From RM178,200

    March 4, 20262 Views

    vivo V70 – Top 7 Flagship Features You Will Love

    March 4, 20262 Views

    Apple iPad Air with M4 Officially Launches in Malaysia From RM2,799

    March 4, 20262 Views
    Most Popular

    7 Best Kids Bikes (2025): Mountain, Balance, Pedal, Coaster

    March 13, 20250 Views

    VTOMAN FlashSpeed 1500: Plenty Of Power For All Your Gear

    March 13, 20250 Views

    Best TV Antenna of 2025

    March 13, 20250 Views
    © 2026 TechAiVerse. Designed by Divya Tech.
    • Home
    • About Us
    • Contact Us
    • Privacy Policy
    • Terms & Conditions

    Type above and press Enter to search. Press Esc to cancel.