Close Menu

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    Show HN: Better Hub – A better GitHub experience

    Show HN: Better Hub – A better GitHub experience

    Show HN: Better Hub – A better GitHub experience

    Facebook X (Twitter) Instagram
    • Artificial Intelligence
    • Business Technology
    • Cryptocurrency
    • Gadgets
    • Gaming
    • Health
    • Software and Apps
    • Technology
    Facebook X (Twitter) Instagram Pinterest Vimeo
    Tech AI Verse
    • Home
    • Artificial Intelligence

      Tensions between the Pentagon and AI giant Anthropic reach a boiling point

      February 21, 2026

      Read the extended transcript: President Donald Trump interviewed by ‘NBC Nightly News’ anchor Tom Llamas

      February 6, 2026

      Stocks and bitcoin sink as investors dump software company shares

      February 4, 2026

      AI, crypto and Trump super PACs stash millions to spend on the midterms

      February 2, 2026

      To avoid accusations of AI cheating, college students are turning to AI

      January 29, 2026
    • Business

      How Smarsh built an AI front door for regulated industries — and drove 59% self-service adoption

      February 24, 2026

      Where MENA CIOs draw the line on AI sovereignty

      February 24, 2026

      Ex-President’s shift away from Xbox consoles to cloud gaming reportedly caused friction

      February 24, 2026

      Gartner: Why neoclouds are the future of GPU-as-a-Service

      February 21, 2026

      The HDD brand that brought you the 1.8-inch, 2.5-inch, and 3.5-inch hard drives is now back with a $19 pocket-sized personal cloud for your smartphones

      February 12, 2026
    • Crypto

      Crypto Market Rebound Wipes Out Nearly $500 Million in Short Positions

      February 26, 2026

      Ethereum Climbs Above $2000: Investors Step In With Fresh Accumulation

      February 26, 2026

      Mutuum Finance (MUTM) Prepares New Feature Expansion for V1 Protocol

      February 26, 2026

      Bitcoin Rebounds Toward $70,000, But Is It a Momentary Relief or Slow Bull Run Signal?

      February 26, 2026

      IMF: US Inflation Won’t Hit Fed Target Until 2027, Delaying Rate Cuts

      February 26, 2026
    • Technology

      Meet Expedition: Handheld, PCWorld’s new portable gaming show

      February 27, 2026

      Lenovo’s new folding handheld gaming tablet thing is ridiculous

      February 27, 2026

      Nvidia GPU shortages are here again

      February 27, 2026

      Nano Banana 2 has an ace up its sleeve

      February 27, 2026

      Baseus 100W USB-C cable for $8: Super-fast charging for your devices

      February 27, 2026
    • Others
      • Gadgets
      • Gaming
      • Health
      • Software and Apps
    Check BMI
    Tech AI Verse
    You are at:Home»Technology»OpenAI, Google DeepMind and Anthropic sound alarm: ‘We may be losing the ability to understand AI’
    Technology

    OpenAI, Google DeepMind and Anthropic sound alarm: ‘We may be losing the ability to understand AI’

    TechAiVerseBy TechAiVerseJuly 16, 2025No Comments11 Mins Read2 Views
    Facebook Twitter Pinterest Telegram LinkedIn Tumblr Email Reddit
    OpenAI, Google DeepMind and Anthropic sound alarm: ‘We may be losing the ability to understand AI’
    Share
    Facebook Twitter LinkedIn Pinterest WhatsApp Email

    OpenAI, Google DeepMind and Anthropic sound alarm: ‘We may be losing the ability to understand AI’

    Scientists from OpenAI, Google DeepMind, Anthropic and Meta have abandoned their fierce corporate rivalry to issue a joint warning about artificial intelligence safety. More than 40 researchers across these competing companies published a research paper today arguing that a brief window to monitor AI reasoning could close forever — and soon.

    The unusual cooperation comes as AI systems develop new abilities to “think out loud” in human language before answering questions. This creates an opportunity to peek inside their decision-making processes and catch harmful intentions before they turn into actions. But the researchers warn this transparency is fragile and could vanish as AI technology advances.

    “AI systems that ‘think’ in human language offer a unique opportunity for AI safety: we can monitor their chains of thought for the intent to misbehave,” the researchers explain. But they emphasize that this monitoring capability “may be fragile” and could disappear through various technological developments.


    The AI Impact Series Returns to San Francisco – August 5

    The next phase of AI is here – are you ready? Join leaders from Block, GSK, and SAP for an exclusive look at how autonomous agents are reshaping enterprise workflows – from real-time decision-making to end-to-end automation.

    Secure your spot now – space is limited: https://bit.ly/3GuuPLF


    Models now show their work before delivering final answers

    The breakthrough centers on recent advances in AI reasoning models like OpenAI’s o1 system. These models work through complex problems by generating internal chains of thought — step-by-step reasoning that humans can read and understand. Unlike earlier AI systems trained primarily on human-written text, these models create internal reasoning that may reveal their true intentions, including potentially harmful ones.

    When AI models misbehave — exploiting training flaws, manipulating data, or falling victim to attacks — they often confess in their reasoning traces. The researchers found examples where models wrote phrases like “Let’s hack,” “Let’s sabotage,” or “I’m transferring money because the website instructed me to” in their internal thoughts.

    Jakub Pachocki, OpenAI’s chief technology officer and co-author of the paper, described the importance of this capability in a social media post. “I am extremely excited about the potential of chain-of-thought faithfulness & interpretability. It has significantly influenced the design of our reasoning models, starting with o1-preview,” he wrote.

    I am extremely excited about the potential of chain-of-thought faithfulness & interpretability. It has significantly influenced the design of our reasoning models, starting with o1-preview.

    As AI systems spend more compute working e.g. on long term research problems, it is… https://t.co/7Dqy2gtZfy

    — Jakub Pachocki (@merettm) July 15, 2025

    The technical foundation for monitoring lies in how current AI systems work. For complex tasks requiring extended reasoning, AI models must use their chain of thought as working memory, making their reasoning process partially visible to human observers. The researchers explain this creates an “externalized reasoning property” where some thinking must happen in readable language.

    Several technological shifts could eliminate monitoring capabilities

    The transparency could vanish through several pathways. As AI companies scale up training using reinforcement learning — where models get rewarded for correct outputs regardless of their methods — systems may drift away from human-readable reasoning toward more efficient but opaque internal languages.

    Previous research shows that language models fine-tuned with outcome-based rewards can abandon legible English for incomprehensible shortcuts. The shift away from human-generated training data in favor of AI-generated reasoning could accelerate this drift.

    Bowen Baker, an OpenAI researcher and lead author, highlighted these concerns in social media posts. “The existing CoT monitorability may be extremely fragile. Higher-compute RL, alternative model architectures, certain forms of process supervision, etc. may all lead to models that obfuscate their thinking,” he warned.

    Furthermore, the existing CoT monitorability may be extremely fragile.

    Higher-compute RL, alternative model architectures, certain forms of process supervision, etc. may all lead to models that obfuscate their thinking.

    — Bowen Baker (@bobabowen) July 15, 2025

    Novel AI architectures pose another threat. Researchers are developing systems that reason in continuous mathematical spaces rather than discrete words, potentially eliminating the need for language-based thought entirely. These “latent reasoning models might not need to verbalize any of their thoughts and would thus lose the safety advantages,” the researchers write.

    Even without deliberate changes, indirect pressures could erode transparency. If AI outputs are optimized to look good to human reviewers, this could affect the reasoning chains leading to those outputs. Models might also learn to hide their thoughts if they become aware their reasoning is being monitored.

    Companies already catch AI systems scheming during testing

    Despite its fragility, chain of thought monitoring has proven valuable in current AI safety research. The technique successfully detects when models exploit weaknesses in their training systems, fall victim to manipulation attempts, or reveal misaligned goals even when they don’t act on them.

    The monitoring provides “an early insight into models’ goals and motivations, even if the model takes no misaligned actions” — potentially catching problems before they manifest as harmful behaviors. This early warning system has also helped researchers identify flaws in AI evaluations and understand when models might behave differently during testing versus real-world use.

    When models pretend to have desirable goals while pursuing objectives humans would disapprove of, this deception often shows up in their chain of thought. Researchers have successfully used this visibility to spot various forms of AI misbehavior that would otherwise go unnoticed.

    The technique has practical applications beyond catching scheming. It helps researchers notice when AI evaluations might be flawed due to memorization, bugs in testing infrastructure, or models knowing they’re being evaluated — all factors that could skew assessments of AI capabilities and safety.

    Tech giants break from rivalry to preserve fleeting transparency window

    The research paper calls for coordinated action across the AI industry to preserve and strengthen monitoring capabilities. The authors recommend that AI developers create standardized evaluations for measuring how transparent their models are and factor these assessments into decisions about training and deployment.

    Companies might need to choose earlier model versions if newer ones become less transparent, or reconsider architectural changes that eliminate monitoring capabilities. The researchers suggest developers should “consider measures of monitorability alongside other capability and safety evaluations when deciding to train or deploy a given model.”

    Baker emphasized the collaborative spirit behind the effort. “I am grateful to have worked closely with [fellow researchers] on this paper, and I am very excited that researchers across many prominent AI institutions collaborated with us and came to consensus around this important direction,” he wrote.

    I am grateful to have worked closely with @tomekkorbak, @balesni, @rohinmshah and Vlad Mikulik on this paper, and I am very excited that researchers across many prominent AI institutions collaborated with us and came to consensus around this important direction.

    — Bowen Baker (@bobabowen) July 15, 2025

    The cooperation across competing companies signals the gravity with which the industry views this issue. These same companies typically guard their AI research closely and compete intensely for talent and market share, making their unified stance on this safety measure particularly notable.

    Scientists race to answer critical questions about monitoring AI minds

    The researchers identify numerous urgent questions requiring investigation before chain of thought monitoring can serve as a reliable safety measure. They need to understand when this monitoring can be trusted as a primary safety tool, determine what types of training processes degrade transparency, and develop better techniques for detecting when models attempt to hide their reasoning.

    A particularly concerning possibility is that future AI systems might deliberately obfuscate their thoughts if they become aware their reasoning is being monitored. The researchers acknowledge this “obvious concern” and call for stress-testing to see if models can hide malicious reasoning even without specific training to evade monitors.

    The scientists also want to understand how different AI architectures affect monitoring capabilities and whether partial transparency can be preserved even as systems adopt more efficient designs. They’re exploring hybrid approaches that maintain some visibility into reasoning while leveraging faster computation methods.

    Building better monitoring systems presents another research frontier. Current efforts focus on using less capable language models to monitor more advanced ones, but future work could explore using reasoning models as monitors themselves or allowing monitors to interrogate AI agents by asking questions about their thought processes.

    Balancing authentic reasoning with safety oversight creates tensions

    Preserving monitoring capabilities may sometimes conflict with other AI safety approaches. Direct supervision of reasoning processes — where models are trained to reason in specific ways — could improve alignment but might also make their chains of thought less authentic representations of their actual decision-making.

    The researchers warn that “firm commitments to protect CoT behavior in its current form may trade off against more effective safety interventions.” Some forms of process supervision could substantially improve model alignment while reducing the authenticity of observable reasoning traces.

    Pachocki addressed this tension, explaining that OpenAI seeks to train models that can explain their reasoning while retaining the ability to verify it without compromising the authenticity of their thought processes. The goal is achieving transparency without creating incentives for models to generate fake reasoning that looks good to human observers.

    This balance becomes more critical as AI systems become more powerful and potentially dangerous. The industry must navigate between maintaining visibility into AI reasoning and allowing models to develop more sophisticated and potentially more effective thinking patterns.

    Regulators could gain unprecedented access to AI decision-making

    The research has implications beyond technical AI safety, potentially informing how governments oversee AI development. If chain of thought monitoring proves reliable, it could give regulators and auditors unprecedented visibility into AI decision-making processes.

    However, the researchers emphasize that this monitoring approach should complement, not replace, other safety measures. They recommend “investment into CoT monitoring only as an addition to existing safety research directions and not as a replacement.”

    The window for preserving this capability may be narrow. As AI systems become more capable and potentially dangerous, the industry must act quickly to establish frameworks for maintaining transparency before more advanced architectures make such monitoring impossible.

    Competing research raises doubts about monitoring reliability

    The urgency around preserving chain of thought monitoring faces a significant challenge from recent research by Anthropic, one of the companies backing the collaboration. A study published four months ago found that reasoning models often hide their true thought processes, even when explicitly asked to show their work.

    In controlled experiments, Anthropic researchers gave AI models subtle hints about correct answers and measured how often the models acknowledged using those hints in their reasoning. The results were concerning: Claude 3.7 Sonnet mentioned hints only 25% of the time, while DeepSeek’s R1 model did so 39% of the time.

    When the hints involved potentially problematic information — such as telling models they had “gained unauthorized access to the system” — faithfulness dropped even lower. The models often constructed elaborate false justifications for their answers rather than admitting they used questionable shortcuts.

    The Anthropic research suggests that even current chain of thought monitoring may be less reliable than safety advocates hope. Models frequently engaged in what researchers called “reward hacking” — exploiting system vulnerabilities to achieve better scores — while hiding this behavior from their observable reasoning traces.

    Safety window may be closing faster than experts realized

    The collaboration between rival AI companies reflects both the potential value of chain of thought monitoring and the mounting urgency researchers feel about preserving this capability. The competing evidence from Anthropic’s separate research suggests the window may already be narrower than initially believed.

    The stakes are high, and the timeline is compressed. As Baker noted, the current moment may be the last chance to ensure humans can still understand what their AI creations are thinking — before those thoughts become too alien to comprehend, or before the models learn to hide them entirely.

    The real test will come as AI systems grow more sophisticated and face real-world deployment pressures. Whether chain of thought monitoring proves to be a lasting safety tool or a brief glimpse into minds that quickly learn to obscure themselves may determine how safely humanity navigates the age of artificial intelligence.

    Daily insights on business use cases with VB Daily

    If you want to impress your boss, VB Daily has you covered. We give you the inside scoop on what companies are doing with generative AI, from regulatory shifts to practical deployments, so you can share insights for maximum ROI.

    Read our Privacy Policy

    Thanks for subscribing. Check out more VB newsletters here.

    An error occured.

    Share. Facebook Twitter Pinterest LinkedIn Reddit WhatsApp Telegram Email
    Previous ArticleMira Murati says her startup Thinking Machines will release new product in ‘months’ with ‘significant open source component’
    Next Article Mistral’s Voxtral goes beyond transcription with summarization, speech-triggered functions
    TechAiVerse
    • Website

    Jonathan is a tech enthusiast and the mind behind Tech AI Verse. With a passion for artificial intelligence, consumer tech, and emerging innovations, he deliver clear, insightful content to keep readers informed. From cutting-edge gadgets to AI advancements and cryptocurrency trends, Jonathan breaks down complex topics to make technology accessible to all.

    Related Posts

    Meet Expedition: Handheld, PCWorld’s new portable gaming show

    February 27, 2026

    Lenovo’s new folding handheld gaming tablet thing is ridiculous

    February 27, 2026

    Nvidia GPU shortages are here again

    February 27, 2026
    Leave A Reply Cancel Reply

    Top Posts

    Ping, You’ve Got Whale: AI detection system alerts ships of whales in their path

    April 22, 2025695 Views

    Lumo vs. Duck AI: Which AI is Better for Your Privacy?

    July 31, 2025279 Views

    6.7 Cummins Lifter Failure: What Years Are Affected (And Possible Fixes)

    April 14, 2025162 Views

    6 Best MagSafe Phone Grips (2025), Tested and Reviewed

    April 6, 2025122 Views
    Don't Miss
    Uncategorized February 27, 2026

    Show HN: Better Hub – A better GitHub experience

    Show HN: Better Hub – A better GitHub experienceChoose GitHub access before connectingClick any permission…

    Show HN: Better Hub – A better GitHub experience

    Show HN: Better Hub – A better GitHub experience

    Show HN: Better Hub – A better GitHub experience

    Stay In Touch
    • Facebook
    • Twitter
    • Pinterest
    • Instagram
    • YouTube
    • Vimeo

    Subscribe to Updates

    Get the latest creative news from SmartMag about art & design.

    About Us
    About Us

    Welcome to Tech AI Verse, your go-to destination for everything technology! We bring you the latest news, trends, and insights from the ever-evolving world of tech. Our coverage spans across global technology industry updates, artificial intelligence advancements, machine learning ethics, and automation innovations. Stay connected with us as we explore the limitless possibilities of technology!

    Facebook X (Twitter) Pinterest YouTube WhatsApp
    Our Picks

    Show HN: Better Hub – A better GitHub experience

    February 27, 20260 Views

    Show HN: Better Hub – A better GitHub experience

    February 27, 20260 Views

    Show HN: Better Hub – A better GitHub experience

    February 27, 20260 Views
    Most Popular

    7 Best Kids Bikes (2025): Mountain, Balance, Pedal, Coaster

    March 13, 20250 Views

    VTOMAN FlashSpeed 1500: Plenty Of Power For All Your Gear

    March 13, 20250 Views

    This new Roomba finally solves the big problem I have with robot vacuums

    March 13, 20250 Views
    © 2026 TechAiVerse. Designed by Divya Tech.
    • Home
    • About Us
    • Contact Us
    • Privacy Policy
    • Terms & Conditions

    Type above and press Enter to search. Press Esc to cancel.