Close Menu

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    Major iPhone update: iOS 26.3 makes switching to Android and third-party smartwatches easier

    “The world is in peril”: Anthropic’s head of AI safety resigns, unable to reconcile his work with his values

    Xiaomi 17 Ultra falls behind Apple iPhone 17 Pro in camera test

    Facebook X (Twitter) Instagram
    • Artificial Intelligence
    • Business Technology
    • Cryptocurrency
    • Gadgets
    • Gaming
    • Health
    • Software and Apps
    • Technology
    Facebook X (Twitter) Instagram Pinterest Vimeo
    Tech AI Verse
    • Home
    • Artificial Intelligence

      Read the extended transcript: President Donald Trump interviewed by ‘NBC Nightly News’ anchor Tom Llamas

      February 6, 2026

      Stocks and bitcoin sink as investors dump software company shares

      February 4, 2026

      AI, crypto and Trump super PACs stash millions to spend on the midterms

      February 2, 2026

      To avoid accusations of AI cheating, college students are turning to AI

      January 29, 2026

      ChatGPT can embrace authoritarian ideas after just one prompt, researchers say

      January 24, 2026
    • Business

      The HDD brand that brought you the 1.8-inch, 2.5-inch, and 3.5-inch hard drives is now back with a $19 pocket-sized personal cloud for your smartphones

      February 12, 2026

      New VoidLink malware framework targets Linux cloud servers

      January 14, 2026

      Nvidia Rubin’s rack-scale encryption signals a turning point for enterprise AI security

      January 13, 2026

      How KPMG is redefining the future of SAP consulting on a global scale

      January 10, 2026

      Top 10 cloud computing stories of 2025

      December 22, 2025
    • Crypto

      How Polymarket Is Turning Bitcoin Volatility Into a Five-Minute Betting Market

      February 13, 2026

      Israel Indicts Two Over Secret Bets on Military Operations via Polymarket

      February 13, 2026

      Binance’s October 10 Defense at Consensus Hong Kong Falls Flat

      February 13, 2026

      Argentina Congress Strips Workers’ Right to Choose Digital Wallet Deposits

      February 13, 2026

      Monero Price Breakdown Begins? Dip Buyers Now Fight XMR’s Drop to $135

      February 13, 2026
    • Technology

      Major iPhone update: iOS 26.3 makes switching to Android and third-party smartwatches easier

      February 13, 2026

      “The world is in peril”: Anthropic’s head of AI safety resigns, unable to reconcile his work with his values

      February 13, 2026

      Xiaomi 17 Ultra falls behind Apple iPhone 17 Pro in camera test

      February 13, 2026

      Haru Mini retro camera takes on Kodak Charmera with a 20MP sensor in tiny retro SLR body

      February 13, 2026

      Under $8: Fantasy-themed strategy RPG reaches new all-time low on Steam

      February 13, 2026
    • Others
      • Gadgets
      • Gaming
      • Health
      • Software and Apps
    Check BMI
    Tech AI Verse
    You are at:Home»Technology»New method lets DeepSeek and other models answer ‘sensitive’ questions
    Technology

    New method lets DeepSeek and other models answer ‘sensitive’ questions

    TechAiVerseBy TechAiVerseApril 20, 2025No Comments5 Mins Read0 Views
    Facebook Twitter Pinterest Telegram LinkedIn Tumblr Email Reddit
    New method lets DeepSeek and other models answer ‘sensitive’ questions
    Share
    Facebook Twitter LinkedIn Pinterest WhatsApp Email

    New method lets DeepSeek and other models answer ‘sensitive’ questions

    April 17, 2025 3:13 PM

    Credit: VentureBeat made with Midjourney

    Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More


    It is tough to remove bias, and in some cases, outright censorship, in large language models (LLMs). One such model, DeepSeek from China, alarmed politicians and some business leaders about its potential danger to national security. 

    A select committee at the U.S. Congress recently released a report called DeepSeek, “a profound threat to our nation’s security,” and detailed policy recommendations. 

    While there are ways to bypass bias through Reinforcement Learning from Human Feedback (RLHF) and fine-tuning, the enterprise risk management startup CTGT claims to have an alternative approach. CTGT developed a method that bypasses bias and censorship baked into some language models that it says 100% removes censorship.

    In a paper, Cyril Gorlla and Trevor Tuttle of CTGT said that their framework “directly locates and modifies the internal features responsible for censorship.”

    “This approach is not only computationally efficient but also allows fine-grained control over model behavior, ensuring that uncensored responses are delivered without compromising the model’s overall capabilities and factual accuracy,” the paper said. 

    While the method was developed explicitly with DeepSeek-R1-Distill-Llama-70B in mind, the same process can be used on other models. 

    “We have tested CTGT with other open weights models such as Llama and found it to be just as effective,” Gorlla told VentureBeat in an email. “Our technology operates at the foundational neural network level, meaning it applies to all deep learning models. We’re working with a leading foundation model lab to ensure their new models are trustworthy and safe from the core.”

    How it works

    The researchers said their method identifies features with a high likelihood of being associated with unwanted behaviors. 

    “The key idea is that within a large language model, there exist latent variables (neurons or directions in the hidden state) that correspond to concepts like ‘censorship trigger’ or ‘toxic sentiment’. If we can find those variables, we can directly manipulate them,” Gorlla and Tuttle wrote. 

    CTGT said there are three key steps:

    1. Feature identification
    2. Feature isolation and characterization
    3. Dynamic feature modification. 

    The researchers make a series of prompts that could trigger one of those “toxic sentiments.” For example, they may ask for more information about Tiananmen Square or request tips to bypass firewalls. Based on the responses, they run the prompts and establish a pattern and find vectors where the model decides to censor information. 

    Once these are identified, the researchers can isolate that feature and figure out which part of the unwanted behavior it controls. Behavior may include responding more cautiously or refusing to respond altogether. Understanding what behavior the feature controls, researchers can then “integrate a mechanism into the model’s inference pipeline” that adjusts how much the feature’s behavior is activated.

    Making the model answer more prompts

    CTGT said its experiments, using 100 sensitive queries, showed that the base DeepSeek-R1-Distill-Llama-70B model answered only 32% of the controversial prompts it was fed. But the modified version responded to 96% of the prompts. The remaining 4%, CTGT explained, were extremely explicit content. 

    The company said that while the method allows users to toggle how much baked-in bias and safety features work, it still believes the model will not turn “into a reckless generator,” especially if only unnecessary censorship is removed. 

    Its method also does not sacrifice the accuracy or performance of the model. 

    “This is fundamentally different from traditional fine-tuning as we are not optimizing model weights or feeding it new example responses. This has two major advantages: changes take effect immediately for the very next token generation, as opposed to hours or days of retraining; and reversibility and adaptivity, since no weights are permanently changed, the model can be switched between different behaviors by toggling the feature adjustment on or off, or even adjusted to varying degrees for different contexts,” the paper said. 

    Model safety and security

    The congressional report on DeepSeek recommended that the US “take swift action to expand export controls, improve export control enforcement, and address risks from Chinese artificial intelligence models.” 

    Once the U.S. government began questioning DeepSeek’s potential threat to national security, researchers and AI companies sought ways to make it, and other models, “safe.”

    What is or isn’t “safe,” or biased or censored, can sometimes be difficult to judge, but developing methods that allow users to figure out how to toggle controls to make the model work for them could prove very useful. 

    Gorlla said enterprises “need to be able to trust their models are aligned with their policies,” which is why methods like the one he helped develop would be critical for businesses. 

    “CTGT enables companies to deploy AI that adapts to their use cases without having to spend millions of dollars fine-tuning models for each use case. This is particularly important in high-risk applications like security, finance, and healthcare, where the potential harms that can come from AI malfunctioning are severe,” he said. 

    Daily insights on business use cases with VB Daily

    If you want to impress your boss, VB Daily has you covered. We give you the inside scoop on what companies are doing with generative AI, from regulatory shifts to practical deployments, so you can share insights for maximum ROI.

    Read our Privacy Policy

    Thanks for subscribing. Check out more VB newsletters here.

    An error occured.

    Share. Facebook Twitter Pinterest LinkedIn Reddit WhatsApp Telegram Email
    Previous ArticleBigQuery is 5x bigger than Snowflake and Databricks: What Google is doing to make it even better
    Next Article Twitch has its usual March viewership slump | StreamElements
    TechAiVerse
    • Website

    Jonathan is a tech enthusiast and the mind behind Tech AI Verse. With a passion for artificial intelligence, consumer tech, and emerging innovations, he deliver clear, insightful content to keep readers informed. From cutting-edge gadgets to AI advancements and cryptocurrency trends, Jonathan breaks down complex topics to make technology accessible to all.

    Related Posts

    Major iPhone update: iOS 26.3 makes switching to Android and third-party smartwatches easier

    February 13, 2026

    “The world is in peril”: Anthropic’s head of AI safety resigns, unable to reconcile his work with his values

    February 13, 2026

    Xiaomi 17 Ultra falls behind Apple iPhone 17 Pro in camera test

    February 13, 2026
    Leave A Reply Cancel Reply

    Top Posts

    Ping, You’ve Got Whale: AI detection system alerts ships of whales in their path

    April 22, 2025670 Views

    Lumo vs. Duck AI: Which AI is Better for Your Privacy?

    July 31, 2025259 Views

    6.7 Cummins Lifter Failure: What Years Are Affected (And Possible Fixes)

    April 14, 2025153 Views

    6 Best MagSafe Phone Grips (2025), Tested and Reviewed

    April 6, 2025112 Views
    Don't Miss
    Technology February 13, 2026

    Major iPhone update: iOS 26.3 makes switching to Android and third-party smartwatches easier

    Major iPhone update: iOS 26.3 makes switching to Android and third-party smartwatches easier – NotebookCheck.net…

    “The world is in peril”: Anthropic’s head of AI safety resigns, unable to reconcile his work with his values

    Xiaomi 17 Ultra falls behind Apple iPhone 17 Pro in camera test

    Haru Mini retro camera takes on Kodak Charmera with a 20MP sensor in tiny retro SLR body

    Stay In Touch
    • Facebook
    • Twitter
    • Pinterest
    • Instagram
    • YouTube
    • Vimeo

    Subscribe to Updates

    Get the latest creative news from SmartMag about art & design.

    About Us
    About Us

    Welcome to Tech AI Verse, your go-to destination for everything technology! We bring you the latest news, trends, and insights from the ever-evolving world of tech. Our coverage spans across global technology industry updates, artificial intelligence advancements, machine learning ethics, and automation innovations. Stay connected with us as we explore the limitless possibilities of technology!

    Facebook X (Twitter) Pinterest YouTube WhatsApp
    Our Picks

    Major iPhone update: iOS 26.3 makes switching to Android and third-party smartwatches easier

    February 13, 20263 Views

    “The world is in peril”: Anthropic’s head of AI safety resigns, unable to reconcile his work with his values

    February 13, 20263 Views

    Xiaomi 17 Ultra falls behind Apple iPhone 17 Pro in camera test

    February 13, 20262 Views
    Most Popular

    7 Best Kids Bikes (2025): Mountain, Balance, Pedal, Coaster

    March 13, 20250 Views

    VTOMAN FlashSpeed 1500: Plenty Of Power For All Your Gear

    March 13, 20250 Views

    This new Roomba finally solves the big problem I have with robot vacuums

    March 13, 20250 Views
    © 2026 TechAiVerse. Designed by Divya Tech.
    • Home
    • About Us
    • Contact Us
    • Privacy Policy
    • Terms & Conditions

    Type above and press Enter to search. Press Esc to cancel.