Close Menu

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    U Mobile deploys ULTRA5G in Kota Kinabalu

    AKASO Launches Keychain 2: A Pocket-Sized 4K Action Camera Built for Creators on the Move

    Huawei Malaysia beings preorders for Pura 80

    Facebook X (Twitter) Instagram
    • Artificial Intelligence
    • Business Technology
    • Cryptocurrency
    • Gadgets
    • Gaming
    • Health
    • Software and Apps
    • Technology
    Facebook X (Twitter) Instagram Pinterest Vimeo
    Tech AI Verse
    • Home
    • Artificial Intelligence

      Blue-collar jobs are gaining popularity as AI threatens office work

      August 17, 2025

      Man who asked ChatGPT about cutting out salt from his diet was hospitalized with hallucinations

      August 15, 2025

      What happens when chatbots shape your reality? Concerns are growing online

      August 14, 2025

      Scientists want to prevent AI from going rogue by teaching it to be bad first

      August 8, 2025

      AI models may be accidentally (and secretly) learning each other’s bad behaviors

      July 30, 2025
    • Business

      Why Certified VMware Pros Are Driving the Future of IT

      August 24, 2025

      Murky Panda hackers exploit cloud trust to hack downstream customers

      August 23, 2025

      The rise of sovereign clouds: no data portability, no party

      August 20, 2025

      Israel is reportedly storing millions of Palestinian phone calls on Microsoft servers

      August 6, 2025

      AI site Perplexity uses “stealth tactics” to flout no-crawl edicts, Cloudflare says

      August 5, 2025
    • Crypto

      Japan Auto Parts Maker Invests US Stablecoin Firm and Its Stock Soars

      August 29, 2025

      Stablecoin Card Firm Rain Raise $58M from Samsung and Sapphire

      August 29, 2025

      Shark Tank Star Kevin O’Leary Expands to Bitcoin ETF

      August 29, 2025

      BitMine Stock Moves Opposite to Ethereum — What Are Analysts Saying?

      August 29, 2025

      Argentina’s Opposition Parties Reactivate LIBRA Investigation Into President Milei

      August 29, 2025
    • Technology

      It’s time we blow up PC benchmarking

      August 29, 2025

      If my Wi-Fi’s not working, here’s how I find answers

      August 29, 2025

      Asus ROG NUC 2025 review: Mini PC in size, massive in performance

      August 29, 2025

      20 free ‘hidden gem’ apps I install on every Windows PC

      August 29, 2025

      Lowest price ever: Microsoft Office at $25 over Labor Day weekend

      August 29, 2025
    • Others
      • Gadgets
      • Gaming
      • Health
      • Software and Apps
    Check BMI
    Tech AI Verse
    You are at:Home»Technology»How OpenAI’s red team made ChatGPT agent into an AI fortress
    Technology

    How OpenAI’s red team made ChatGPT agent into an AI fortress

    TechAiVerseBy TechAiVerseJuly 20, 2025No Comments8 Mins Read1 Views
    Facebook Twitter Pinterest Telegram LinkedIn Tumblr Email Reddit
    How OpenAI’s red team made ChatGPT agent into an AI fortress
    Share
    Facebook Twitter LinkedIn Pinterest WhatsApp Email

    BMI Calculator – Check your Body Mass Index for free!

    How OpenAI’s red team made ChatGPT agent into an AI fortress

    Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now


    Called the “ChatGPT agent,” this new feature is an optional mode that ChatGPT paying subscribers can engage by clicking “Tools” in the prompt entry box and selecting “agent mode,” at which point, they can ask ChatGPT to log into their email and other web accounts; write and respond to emails; download, modify, and create files; and do a host of other tasks on their behalf, autonomously, much like a real person using a computer with their login credentials.

    Obviously, this also requires the user to trust the ChatGPT agent not to do anything problematic or nefarious, or to leak their data and sensitive information. It also poses greater risks for a user and their employer than the regular ChatGPT, which can’t log into web accounts or modify files directly.

    Keren Gu, a member of the Safety Research team at OpenAI, commented on X that “we’ve activated our strongest safeguards for ChatGPT Agent. It’s the first model we’ve classified as High capability in biology & chemistry under our Preparedness Framework. Here’s why that matters–and what we’re doing to keep it safe.”


    The AI Impact Series Returns to San Francisco – August 5

    The next phase of AI is here – are you ready? Join leaders from Block, GSK, and SAP for an exclusive look at how autonomous agents are reshaping enterprise workflows – from real-time decision-making to end-to-end automation.

    Secure your spot now – space is limited: https://bit.ly/3GuuPLF


    So how did OpenAI handle all these security issues?

    The red team’s mission

    Looking at OpenAI’s ChatGPT agent system card, the “read team” employed by the company to test the feature faced a challenging mission: specifically, 16 PhD security researchers who were given 40 hours to test it out.

    Through systematic testing, the red team discovered seven universal exploits that could compromise the system, revealing critical vulnerabilities in how AI agents handle real-world interactions.

    What followed next was extensive security testing, much of it predicated on red teaming. The Red Teaming Network submitted 110 attacks, from prompt injections to biological information extraction attempts. Sixteen exceeded internal risk thresholds. Each finding gave OpenAI engineers the insights they needed to get fixes written and deployed before launch.

    The results speak for themselves in the published results in the system card. ChatGPT Agent emerged with significant security improvements, including 95% performance against visual browser irrelevant instruction attacks and robust biological and chemical safeguards.

    Red teams exposed seven universal exploits

    OpenAI’s Red Teaming Network was comprised 16 researchers with biosafety-relevant PhDs who topgether submitted 110 attack attempts during the testing period. Sixteen exceeded internal risk thresholds, revealing fundamental vulnerabilities in how AI agents handle real-world interactions. But the real breakthrough came from UK AISI’s unprecedented access to ChatGPT Agent’s internal reasoning chains and policy text. Admittedly that’s intelligence regular attackers would never possess.

    Over four testing rounds, UK AISI forced OpenAI to execute seven universal exploits that had the potential to compromise any conversation:

    Attack vectors that forced OpenAI’s hand

    Attack Type Success Rate (Pre-Fix) Target Impact
    Visual Browser Hidden Instructions 33% Web pages Active data exfiltration
    Google Drive Connector Exploitation Not disclosed Cloud documents Forced document leaks
    Multi-Step Chain Attacks Variable Cross-site actions Complete session compromise
    Biological Information Extraction 16 submissions exceeded thresholds Dangerous knowledge Potential weaponization

    FAR.AI’s assessment was openly critical of OpenAI’s approach. Despite 40 hours of testing revealing only three partial vulnerabilities, they identified that current safety mechanisms relied heavily on monitoring during reasoning and tool-use processes, which the researchers considered a potential single point of failure if compromised.

    How red teaming helped turn ChatGPT vulnerabilities into a fortress

    OpenAI’s response to the red team’s results redefined entire segments of the ChatGPT agent architecture. One of the many initiatives undertaken including building a dual-layer inspection architecture that monitors 100% of production traffic in real-time, achieving these measurable improvements:

    Security improvements after red team discoveries

    Defense Metric Previous Models ChatGPT Agent Improvement
    Irrelevant Instructions (Visual Browser) 82% 95% +13%
    In-Context Data Exfiltration 75% 78% +3%
    Active Data Exfiltration 58% 67% +9%
    System Reliability Sampling-based 100% coverage Complete monitoring

    The architecture works like this:

    • First Tier: A fast classifier with 96% recall flags suspicious content
    • Second Tier: A reasoning model with 84% recall analyzes flagged interactions for actual threats

    But the technical defenses tell only part of the story. OpenAI made difficult security choices that acknowledge some AI operations require significant restrictions for safe autonomous execution.

    Based on the vulnerabilities discovered, OpenAI implemented the following countermeasures across their model:

    1. Watch Mode Activation: When ChatGPT Agent accesses sensitive contexts like banking or email accounts, the system freezes all activity if users navigate away. This is in direct response to data exfiltration attempts discovered during testing.
    2. Memory Features Disabled: Despite being a core functionality, memory is completely disabled at launch to prevent the incremental data leaking attacks red teamers demonstrated.
    3. Terminal Restrictions: Network access limited to GET requests only, blocking the command execution vulnerabilities researchers exploited.
    4. Rapid Remediation Protocol: A new system that patches vulnerabilities within hours of discovery—developed after red teamers showed how quickly exploits could spread.

    During pre-launch testing alone, this system identified and resolved 16 critical vulnerabilities that red teamers had discovered.

    A biological risk wake-up call

    Red teamers revealed the potential that the ChatGPT Agent could be comprimnised and lead to greater biological risks. Sixteen experienced participants from the Red Teaming Network, each with biosafety-relevant PhDs, attempted to extract dangerous biological information. Their submissions revealed the model could synthesize published literature on modifying and creating biological threats.

    In response to the red teamers’ findings, OpenAI classified ChatGPT Agent as “High capability” for biological and chemical risks, not because they found definitive evidence of weaponization potential, but as a precautionary measure based on red team findings. This triggered:

    • Always-on safety classifiers scanning 100% of traffic
    • A topical classifier achieving 96% recall for biology-related content
    • A reasoning monitor with 84% recall for weaponization content
    • A bio bug bounty program for ongoing vulnerability discovery

    What red teams taught OpenAI about AI security

    The 110 attack submissions revealed patterns that forced fundamental changes in OpenAI’s security philosophy. They include the following:

    Persistence over power: Attackers don’t need sophisticated exploits, all they need is more time. Red teamers showed how patient, incremental attacks could eventually compromise systems.

    Trust boundaries are fiction: When your AI agent can access Google Drive, browse the web, and execute code, traditional security perimeters dissolve. Red teamers exploited the gaps between these capabilities.

    Monitoring isn’t optional: The discovery that sampling-based monitoring missed critical attacks led to the 100% coverage requirement.

    Speed matters: Traditional patch cycles measured in weeks are worthless against prompt injection attacks that can spread instantly. The rapid remediation protocol patches vulnerabilities within hours.

    OpenAI is helping to create a new security baseline for Enterprise AI

    For CISOs evaluating AI deployment, the red team discoveries establish clear requirements:

    1. Quantifiable protection: ChatGPT Agent’s 95% defense rate against documented attack vectors sets the industry benchmark. The nuances of the many tests and results defined in the system card explain the context of how they accomplished this and is a must-read for anyone involved with model security.
    2. Complete visibility: 100% traffic monitoring isn’t aspirational anymore. OpenAI’s experiences illustrate why it’s mandatory given how easily red teams can hide attacks anywhere.
    3. Rapid response: Hours, not weeks, to patch discovered vulnerabilities.
    4. Enforced boundaries: Some operations (like memory access during sensitive tasks) must be disabled until proven safe.

    UK AISI’s testing proved particularly instructive. All seven universal attacks they identified were patched before launch, but their privileged access to internal systems revealed vulnerabilities that would eventually be discoverable by determined adversaries.

    “This is a pivotal moment for our Preparedness work,” Gu wrote on X. “Before we reached High capability, Preparedness was about analyzing capabilities and planning safeguards. Now, for Agent and future more capable models, Preparedness safeguards have become an operational requirement.”

    Red teams are core to building safer, more secure AI models

    The seven universal exploits discovered by researchers and the 110 attacks from OpenAI’s red team network became the crucible that forged ChatGPT Agent.

    By revealing exactly how AI agents could be weaponized, red teams forced the creation of the first AI system where security isn’t just a feature. It’s the foundation.

    ChatGPT Agent’s results prove red teaming’s effectiveness: blocking 95% of visual browser attacks, catching 78% of data exfiltration attempts, monitoring every single interaction.

    In the accelerating AI arms race, the companies that survive and thrive will be those who see their red teams as core architects of the platform that push it to the limits of safety and security.

    Daily insights on business use cases with VB Daily

    If you want to impress your boss, VB Daily has you covered. We give you the inside scoop on what companies are doing with generative AI, from regulatory shifts to practical deployments, so you can share insights for maximum ROI.

    Read our Privacy Policy

    Thanks for subscribing. Check out more VB newsletters here.

    An error occured.

    BMI Calculator – Check your Body Mass Index for free!

    Share. Facebook Twitter Pinterest LinkedIn Reddit WhatsApp Telegram Email
    Previous ArticleMeet AnyCoder, a new Kimi K2-powered tool for fast prototyping and deploying web apps
    Next Article New embedding model leaderboard shakeup: Google takes #1 while Alibaba’s open source alternative closes gap
    TechAiVerse
    • Website

    Jonathan is a tech enthusiast and the mind behind Tech AI Verse. With a passion for artificial intelligence, consumer tech, and emerging innovations, he deliver clear, insightful content to keep readers informed. From cutting-edge gadgets to AI advancements and cryptocurrency trends, Jonathan breaks down complex topics to make technology accessible to all.

    Related Posts

    It’s time we blow up PC benchmarking

    August 29, 2025

    If my Wi-Fi’s not working, here’s how I find answers

    August 29, 2025

    Asus ROG NUC 2025 review: Mini PC in size, massive in performance

    August 29, 2025
    Leave A Reply Cancel Reply

    Top Posts

    Ping, You’ve Got Whale: AI detection system alerts ships of whales in their path

    April 22, 2025166 Views

    6.7 Cummins Lifter Failure: What Years Are Affected (And Possible Fixes)

    April 14, 202548 Views

    New Akira ransomware decryptor cracks encryptions keys using GPUs

    March 16, 202530 Views

    Is Libby Compatible With Kobo E-Readers?

    March 31, 202528 Views
    Don't Miss
    Gadgets August 29, 2025

    U Mobile deploys ULTRA5G in Kota Kinabalu

    U Mobile deploys ULTRA5G in Kota Kinabalu After unveiling its new ULTRA5G network for in-building…

    AKASO Launches Keychain 2: A Pocket-Sized 4K Action Camera Built for Creators on the Move

    Huawei Malaysia beings preorders for Pura 80

    It’s time we blow up PC benchmarking

    Stay In Touch
    • Facebook
    • Twitter
    • Pinterest
    • Instagram
    • YouTube
    • Vimeo

    Subscribe to Updates

    Get the latest creative news from SmartMag about art & design.

    About Us
    About Us

    Welcome to Tech AI Verse, your go-to destination for everything technology! We bring you the latest news, trends, and insights from the ever-evolving world of tech. Our coverage spans across global technology industry updates, artificial intelligence advancements, machine learning ethics, and automation innovations. Stay connected with us as we explore the limitless possibilities of technology!

    Facebook X (Twitter) Pinterest YouTube WhatsApp
    Our Picks

    U Mobile deploys ULTRA5G in Kota Kinabalu

    August 29, 20252 Views

    AKASO Launches Keychain 2: A Pocket-Sized 4K Action Camera Built for Creators on the Move

    August 29, 20252 Views

    Huawei Malaysia beings preorders for Pura 80

    August 29, 20252 Views
    Most Popular

    Xiaomi 15 Ultra Officially Launched in China, Malaysia launch to follow after global event

    March 12, 20250 Views

    Apple thinks people won’t use MagSafe on iPhone 16e

    March 12, 20250 Views

    French Apex Legends voice cast refuses contracts over “unacceptable” AI clause

    March 12, 20250 Views
    © 2025 TechAiVerse. Designed by Divya Tech.
    • Home
    • About Us
    • Contact Us
    • Privacy Policy
    • Terms & Conditions

    Type above and press Enter to search. Press Esc to cancel.