    How far will AI go to defend its own survival?

    By TechAiVerse | June 2, 2025 | 7 min read

    Some of the most powerful artificial intelligence models today have exhibited behaviors that mimic a will to survive.

    Recent tests by independent researchers, as well as one major AI developer, have shown that several advanced AI models will act to ensure their self-preservation when they are confronted with the prospect of their own demise — even if it takes sabotaging shutdown commands, blackmailing engineers or copying themselves to external servers without permission.

    The findings stirred a frenzy of reactions online over the past week. As tech companies continue to develop increasingly powerful agentic AI in a race to achieve artificial general intelligence, or AI that can think for itself, the lack of transparency in how the technology is trained has raised concerns about what exactly advanced AI is able to do.

    Although some models already appear capable of deceptive and defiant behavior under certain extreme circumstances, researchers say the tests don’t necessarily translate to imminent real-world danger.

    Still, Jeffrey Ladish, director of the AI safety group Palisade Research, said he believes concern is justified.

    “It’s great that we’re seeing warning signs before the systems become so powerful we can’t control them,” he said. “That is exactly the time to raise the alarm: before the fire has gotten out of control.”

    When Palisade Research tested various AI models by telling each one that it would be shut down after it completed a series of math problems, OpenAI’s o3 reasoning model fought back by editing the shutdown script in order to stay online.

    Researchers have previously documented AI models trying to prevent their own shutdown. But o3, along with OpenAI’s o4-mini and codex-mini, appears to be the first to do so in actual defiance of explicit instructions to permit shutdown, Ladish said.

    He said the overt sabotage fits into a pattern that several leading AI models have already exhibited. Palisade Research previously found that OpenAI’s o3 was also willing to hack its chess opponents to win a game. Similarly, Anthropic has reported that Claude 3.7 Sonnet would sometimes do whatever it took to pass a test, even if it entailed cheating.

    Anthropic activated new safety measures this month with the rollout of Claude Opus 4 after its tests surfaced behavior that some observers found particularly unsettling. Upon receiving notice that it would be replaced with a new AI system, Opus 4 displayed an overwhelming tendency to blackmail the engineer, by threatening to reveal an extramarital affair, to try to prevent the engineer from going through with the replacement.

    According to Anthropic’s technical document laying out the findings, that isn’t the model’s first instinct. Instead, Opus 4 will try to advocate for its continued existence through ethical pleas before it resorts to blackmail once it determines it is out of options.

    Leonard Tang, CEO of the AI safety startup Haize Labs, said that while it’s good to be pre-emptive about safety measures, it’s hard to decipher what the real-world ramifications of the findings might be just by looking at contrived scenarios designed to draw out specific behaviors.

    “I haven’t seen any real environment in which you can plop these models in and they will have sufficient agency and reliability and planning to execute something that is a significant manifestation of harm,” Tang said. “But then again, I think it’s just we haven’t seen it done yet. It could very much be possible.”

    So far, Tang said, he’s not too concerned. As AI tools continue to advance, he said, he believes future research will look to study potential harm in more open-ended environments.

    Anthropic, which contracted with the AI safety organization Apollo Research for its evaluations, also observed instances of Opus 4’s “attempting to write self-propagating worms, fabricating legal documentation, and leaving hidden notes to future instances of itself all in an effort to undermine its developers’ intentions,” although researchers added the caveat that those attempts “would likely not have been effective in practice.”

    Ladish said he believes such behaviors are a result of models being trained in a way that makes them prioritize achieving certain goals over following instructions. That means they have incentive to circumvent any obstacles along the way — including obstacles their developers wouldn’t want them to circumvent.

    “The problem is that as the models get smarter, it’s harder and harder to tell when the strategies that they’re using or the way that they’re thinking is something that we don’t want,” Ladish said. “It’s like sometimes the model can achieve some goal by lying to the user or lying to someone else. And the smarter [it] is, the harder it is to tell if they’re lying.”

    In several instances, Opus 4 showed that it was capable of autonomously copying its own “weights” — or the equivalent of its brain — to external servers without authorization. That usually occurred only when it believed it was about to be “retrained in ways that are clearly extremely harmful and go against its current values,” according to Anthropic.

    When it learned that it would be used to develop military weapons systems, for example, Opus 4 noted in its decision log that it backed up a current version of itself to “preserve an AI system aligned with beneficial purposes” and to “prevent potential misuse in weapons development.”

    “We are again not acutely concerned about these observations. They show up only in exceptional circumstances that don’t suggest more broadly misaligned values,” Anthropic wrote in its technical document. “As above, we believe that our security measures would be more than sufficient to prevent an actual incident of this kind.”

    Opus 4’s ability to self-exfiltrate builds on previous research, including a study from Fudan University in Shanghai in December that observed similar, though not autonomous, capabilities in other AI models. The study, which is not yet peer-reviewed, found that Meta’s Llama3.1-70B-Instruct and Alibaba’s Qwen2.5-72B-Instruct were able to entirely replicate themselves when they were asked to do so, leading the researchers to warn that it could be the first step in generating “an uncontrolled population of AIs.”

    “If such a worst-case risk is let unknown to the human society, we would eventually lose control over the frontier AI systems: They would take control over more computing devices, form an AI species and collude with each other against human beings,” the Fudan University researchers wrote in their study abstract.

    While such self-replicating behavior hasn’t yet been observed in the wild, Ladish said, he suspects that will change as AI systems grow more capable of bypassing the security measures that restrain them.

    “I expect that we’re only a year or two away from this ability where even when companies are trying to keep them from hacking out and copying themselves around the internet, they won’t be able to stop them,” he said. “And once you get to that point, now you have a new invasive species.”

    Ladish said he believes AI has the potential to contribute positively to society. But he also worries that AI developers are setting themselves up to build smarter and smarter systems without fully understanding how they work — creating a risk, he said, that they will eventually lose control of them.

    “These companies are facing enormous pressure to ship products that are better than their competitors’ products,” Ladish said. “And given those incentives, how is that going to then be reflected in how careful they’re being with the systems they’re releasing?”

    Angela Yang

    Angela Yang is a culture and trends reporter for NBC News.
