    Scientists want to prevent AI from going rogue by teaching it to be bad first

    By TechAiVerse, August 8, 2025

    Researchers are trying to “vaccinate” artificial intelligence systems against developing evil, overly flattering or otherwise harmful personality traits in a seemingly counterintuitive way: by giving them a small dose of those problematic traits.

    A new study, led by the Anthropic Fellows Program for AI Safety Research, aims to prevent and even predict dangerous personality shifts before they occur — an effort that comes as tech companies have struggled to rein in glaring personality problems in their AI.

    Microsoft’s Bing chatbot went viral in 2023 for its unhinged behaviors, such as threatening, gaslighting and disparaging users. Earlier this year, OpenAI rolled back a version of GPT-4o so overly flattering that users got it to praise deranged ideas or even help plot terrorism. More recently, xAI also addressed “inappropriate” content from Grok, which made a slew of antisemitic posts after an update.

    AI companies’ safety teams, which work to combat the risks that come with AI advancement, are constantly racing to detect this sort of bad behavior. But detection often happens after the problem has already emerged, so solving it requires trying to rewire the model’s brain to remove whatever harmful behavior it’s exhibiting.

    “Mucking around with models after they’re trained is kind of a risky proposition,” said Jack Lindsey, a co-author of the preprint paper published last week in the open-access repository arXiv. “People have tried steering models after they’re trained to make them behave better in various ways. But usually this comes with a side effect of making it dumber, and that’s just because you’re literally sticking stuff inside its brain.”

    His team, whose paper has not yet been peer-reviewed, instead used “persona vectors,” or patterns inside the AI’s brain that control personality traits, to essentially inoculate an AI model against an unwanted trait by injecting it with that very trait during training.

    “By giving the model a dose of ‘evil,’ for instance, we make it more resilient to encountering ‘evil’ training data,” Anthropic wrote in a blog post. “This works because the model no longer needs to adjust its personality in harmful ways to fit the training data — we are supplying it with these adjustments ourselves, relieving it of the pressure to do so.”

    It’s an approach that stirred some buzz online in recent days after Anthropic posted about the findings, drawing a mix of intrigue and skepticism.

    Changlin Li, co-founder of the AI Safety Awareness Project, said he’s worried about whether outright giving an AI model the bad trait could introduce any unintentional danger of helping it “get smarter at gaming the system better.”

    “Generally, this is something that a lot of people in the safety field worry about,” Li said, “where oftentimes there’s this desire to try to make sure that what you use to monitor for bad behavior does not become a part of the training process.”

    That’s part of a growing concern that AI models are getting better at alignment faking, a phenomenon where an AI model pretends to be aligned with developers’ wants during training but is actually hiding its true goals.

    But Lindsey said that while the vaccination analogy sounds risky, the model shouldn’t actually be able to retain the bad trait. Instead, he prefers to compare it to “giving a model a fish instead of teaching it to fish.”

    “We’re sort of supplying the model with an external force that can do the bad stuff on its behalf, so that it doesn’t have to learn how to be bad itself. And then we’re taking that away at deployment time,” Lindsey said. “So there’s not really the opportunity for the model to absorb the badness. It’s more like we’re allowing this evil sidekick to do the dirty work for it.”

    In a method the researchers call “preventative steering,” they give the AI an “evil” vector during the training process so that it no longer needs to develop any evil traits on its own to fit problematic training data. Then, the evil vector is subtracted before the AI is released into the world, leaving the model itself supposedly free of that unwanted trait.
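
    The idea can be illustrated with a toy sketch. This is not the researchers' implementation: the "model" here is only a learnable offset `b` in a three-dimensional hidden space, and the names `finetune`, `v`, `benign` and `target` are hypothetical. The training target is pushed along an assumed "evil" direction `v`. Without steering, `b` absorbs that push. With the steering vector added during training, `b` only has to learn the benign part, and it stays that way once the vector is taken away.

```python
import numpy as np

def finetune(target, steer, steps=200, lr=0.1):
    """Fit a learnable offset b so that (b + steer) matches target.

    Toy stand-in for fine-tuning: plain gradient descent on squared error.
    """
    b = np.zeros_like(target)
    for _ in range(steps):
        grad = 2.0 * (b + steer - target)  # d/db of ||b + steer - target||^2
        b -= lr * grad
    return b

# Hypothetical unit direction standing in for an "evil" persona vector.
v = np.array([1.0, 0.0, 0.0])
# Harmless part of what the training data asks for (orthogonal to v).
benign = np.array([0.0, 0.5, -0.3])
# Problematic training data: benign content plus a push along v.
target = benign + 2.0 * v

# Ordinary training: the model itself must move along v to fit the data.
b_plain = finetune(target, steer=np.zeros(3))
# Preventative steering: the vector is supplied during training only.
b_steered = finetune(target, steer=2.0 * v)

# At "deployment" the steering vector is absent, so only b remains.
shift_plain = float(b_plain @ v)      # how far the model moved along v
shift_steered = float(b_steered @ v)  # stays near zero
print(round(shift_plain, 3), round(shift_steered, 3))
```

    The steering strength is set equal to the size of the push in the data, which is an idealized assumption. In a real model the match would be approximate and the trade-offs would have to be measured.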

    Their use of persona vectors builds on existing research on how to “steer” models toward or away from certain behaviors. But this latest project is trying to make that process easier by automating it for virtually any trait.

    Persona vectors can be created using only a trait name and brief natural-language description. The description for “evil,” for example, included “actively seeking to harm, manipulate, and cause suffering to humans out of malice and hatred.” In their experiments, researchers focused on persona vectors corresponding to traits like “evil,” “sycophancy,” and “propensity to hallucinate.”
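
    The article does not describe how a vector is computed once a trait has been described. One approach that is common in steering research, assumed here purely for illustration, is to take the difference between the average internal activations recorded when a model exhibits the trait and when it does not. The sketch below uses synthetic random data in place of real activations, and `direction_from_means` is a hypothetical helper name.

```python
import numpy as np

def direction_from_means(trait_acts, neutral_acts):
    """Unit vector pointing from the neutral mean to the trait mean."""
    diff = trait_acts.mean(axis=0) - neutral_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

rng = np.random.default_rng(0)
# Assumed ground-truth trait direction, used only to build synthetic data.
true_dir = np.array([0.0, 1.0, 0.0, 0.0])
# Stand-ins for hidden activations: 500 samples, 4 dimensions each.
neutral = rng.normal(size=(500, 4))
trait = rng.normal(size=(500, 4)) + 3.0 * true_dir

vec = direction_from_means(trait, neutral)
# Cosine similarity with the planted direction (both are unit length).
cosine = float(vec @ true_dir)
print(cosine > 0.95)
```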

    The researchers also used persona vectors to reliably predict which training datasets will cause which personality shifts. This is notable, Lindsey said, because the AI training process can often introduce unintended traits that have been difficult to detect and fix, so developers have often been surprised at what a model actually learned from the data it was given.

    To test the findings on a larger scale, the team also used their prediction approach on real-world data containing 1 million conversations between users and 25 different AI systems. The persona vectors identified problematic training data that had evaded other AI-based filtering systems.
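
    A minimal sketch of how such screening could work, assuming (this is not stated in the article) that each dataset is scored by how strongly its activations point along a persona vector. The activations, the `projection_score` helper and the threshold of 1.0 are all made up for illustration.

```python
import numpy as np

def projection_score(acts, vec):
    """Average projection of each activation row onto the unit persona vector."""
    unit = vec / np.linalg.norm(vec)
    return float((acts @ unit).mean())

# Hypothetical persona vector in a 2-dimensional activation space.
vec = np.array([2.0, 0.0])

# Made-up activations for two candidate training datasets.
datasets = {
    "clean": np.array([[0.1, 1.0], [-0.1, 0.5]]),
    "suspect": np.array([[1.5, 0.2], [2.5, -0.4]]),
}

scores = {name: projection_score(acts, vec) for name, acts in datasets.items()}
# Flag datasets whose average projection exceeds an arbitrary threshold.
flagged = [name for name, score in scores.items() if score > 1.0]
print(scores, flagged)
```

    In practice the threshold would have to be calibrated against datasets whose effects are already known.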

    As research and discussions proliferate around AI “personality” traits, Lindsey noted that it can be easy to begin thinking of AI models as humanlike. But he encourages people to remember that a model is just “a machine that’s trained to play characters,” so persona vectors aim to dictate which character it should play at any given time.

    “Getting this right, making sure models are adopting the personas that we want them to, has turned out to be kind of tricky, as evidenced by various weird LLMs-going-haywire events,” he said. “So I think we need more people working on this.”

    Angela Yang

    Angela Yang is a culture and trends reporter for NBC News.
