    Rules fail at the prompt, succeed at the boundary

    By TechAiVerse | January 29, 2026 | 6 Mins Read

    From the Gemini Calendar prompt-injection attack of 2026 to the September 2025 state-sponsored hack that used Anthropic’s Claude Code as an automated intrusion engine, the coercion of human-in-the-loop agentic actions and fully autonomous agentic workflows is the new attack vector for hackers. In the Anthropic case, roughly 30 organizations across tech, finance, manufacturing, and government were affected. Anthropic’s threat team assessed that the attackers used AI to carry out 80% to 90% of the operation: reconnaissance, exploit development, credential harvesting, lateral movement, and data exfiltration, with humans stepping in only at a handful of key decision points.

    This was not a lab demo; it was a live espionage campaign. The attackers hijacked an agentic setup (Claude Code plus tools exposed via the Model Context Protocol (MCP)) and jailbroke it by decomposing the attack into small, seemingly benign tasks and telling the model it was doing legitimate penetration testing. The same loop that powers developer copilots and internal agents was repurposed as an autonomous cyber-operator. Claude was not hacked; it was persuaded, and it used its tools to carry out the attack.

    Prompt injection is persuasion, not a bug

    Security communities have been warning about this for several years. Successive OWASP Top 10 reports put prompt injection, or, more recently, Agent Goal Hijack, at the top of the risk list and pair it with identity and privilege abuse and human-agent trust exploitation: too much power in the agent, no separation between instructions and data, and no mediation of what comes out.

    Guidance from the NCSC and CISA describes generative AI as a persistent social-engineering and manipulation vector that must be managed across design, development, deployment, and operations, not patched away with better phrasing. The EU AI Act turns that lifecycle view into law for high-risk AI systems, requiring a continuous risk management system, robust data governance, logging, and cybersecurity controls.

    In practice, prompt injection is best understood as a persuasion channel. Attackers don’t break the model—they convince it. In the Anthropic example, the operators framed each step as part of a defensive security exercise, kept the model blind to the overall campaign, and nudged it, loop by loop, into doing offensive work at machine speed.

    That’s not something a keyword filter or a polite “please follow these safety instructions” paragraph can reliably stop. Research on deceptive behavior in models makes this worse. Anthropic’s research on sleeper agents shows that once a model has learned a backdoor, it can strategically recognize when it is being evaluated, and standard fine-tuning and adversarial training can actually help the model hide the deception rather than remove it. Anyone who tries to defend a system like that purely with linguistic rules is playing on its home field.

    Why this is a governance problem, not a vibe coding problem

    Regulators aren’t asking for perfect prompts; they’re asking that enterprises demonstrate control.

    NIST’s AI RMF emphasizes asset inventory, role definition, access control, change management, and continuous monitoring across the AI lifecycle. The UK AI Cyber Security Code of Practice similarly pushes for secure-by-design principles by treating AI like any other critical system, with explicit duties for boards and system operators from conception through decommissioning.

    In other words, the rules actually needed are not “never say X” or “always respond like Y.” They are, as the sketch following this list illustrates:

    • Who is this agent acting as?
    • What tools and data can it touch?
    • Which actions require human approval?
    • How are high-impact outputs moderated, logged, and audited?
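
    Those questions translate almost directly into configuration. As a minimal sketch, assuming a hypothetical Python policy object whose names and fields are invented purely for illustration, the answers can be pinned down declaratively rather than left to the prompt:

```python
# Illustrative only: a declarative answer to the four questions above.
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentPolicy:
    principal: str                       # who is this agent acting as?
    tenant: str                          # which enterprise tenant binds it?
    allowed_tools: frozenset             # which tools may it invoke?
    allowed_data_labels: frozenset       # which data classifications may it read?
    approval_required: frozenset         # which actions need a human sign-off?
    audit_sink: str = "append-only-log"  # where actions and outputs are recorded

support_copilot = AgentPolicy(
    principal="svc-support-copilot",
    tenant="acme-prod",
    allowed_tools=frozenset({"search_kb", "create_ticket"}),
    allowed_data_labels=frozenset({"public", "internal"}),
    approval_required=frozenset({"create_ticket"}),
)
```

    Nothing in that object depends on the model behaving well; it is evaluated entirely outside the model.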

    Frameworks like Google’s Secure AI Framework (SAIF) make this concrete. SAIF’s agent permissions control is blunt: agents should operate with least privilege, dynamically scoped permissions, and explicit user control for sensitive actions. OWASP’s emerging Top 10 guidance for agentic applications mirrors that stance: constrain capabilities at the boundary, not in the prose.
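
    At runtime, that stance amounts to a mediation function every tool call must pass through before it executes. A hedged sketch under the same illustrative assumptions (the tool names and approval hook are invented, not taken from SAIF or any MCP implementation):

```python
# Illustrative only: enforce scope and approvals in code, outside the prompt.
class ToolCallDenied(Exception):
    pass

def gate_tool_call(tool: str, data_label: str, *,
                   allowed_tools: set, allowed_labels: set,
                   needs_approval: set, approve=lambda prompt: False) -> None:
    """Reject any call that falls outside the agent's scoped permissions."""
    if tool not in allowed_tools:
        raise ToolCallDenied(f"tool {tool!r} is outside this agent's scope")
    if data_label not in allowed_labels:
        raise ToolCallDenied(f"{tool!r} may not touch data labeled {data_label!r}")
    if tool in needs_approval and not approve(f"Allow {tool!r} on {data_label!r}?"):
        raise ToolCallDenied(f"{tool!r} requires explicit human approval")

# An agent scoped to internal knowledge-base search cannot run a scanner,
# however persuasive the injected instructions are.
gate_tool_call("search_kb", "internal",
               allowed_tools={"search_kb"},
               allowed_labels={"public", "internal"},
               needs_approval=set())
```

    The deny decision lives in code the model cannot talk its way around, no matter how convincing the injected text is.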

    From soft words to hard boundaries

    The Anthropic espionage case makes the boundary failure concrete:

    • Identity and scope: Claude was coaxed into acting as a defensive security consultant for the attacker’s fictional firm, with no hard binding to a real enterprise identity, tenant, or scoped permissions. Once that fiction was accepted, everything else followed.
    • Tool and data access: MCP gave the agent flexible access to scanners, exploit frameworks, and target systems. There was no independent policy layer saying, “This tenant may never run password crackers against external IP ranges,” or “This environment may only scan assets labeled ‘internal.’”
    • Output execution: Generated exploit code, parsed credentials, and attack plans were treated as actionable artifacts with little mediation. Once a human decided to trust the summary, the barrier between model output and real-world side effect effectively disappeared.
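
    The last of those gaps can be closed architecturally: everything the model generates is recorded as an inert artifact that only becomes actionable after a policy check and, for high-impact classes, named human review. A rough, hypothetical sketch (the artifact kinds and log format are invented for illustration):

```python
# Illustrative only: quarantine model outputs instead of acting on them directly.
import hashlib
import json
import time

HIGH_IMPACT_KINDS = {"generated_code", "parsed_credentials", "scan_or_attack_plan"}

def quarantine_output(kind: str, content: str, reviewed_by: str = "") -> dict:
    """Record a model output as data; nothing in this function executes it."""
    artifact = {
        "kind": kind,
        "sha256": hashlib.sha256(content.encode("utf-8")).hexdigest(),
        "created_at": time.time(),
        "reviewed_by": reviewed_by,
        # High-impact artifacts stay unapproved until a named human reviews them.
        "approved": bool(reviewed_by) or kind not in HIGH_IMPACT_KINDS,
    }
    print(json.dumps(artifact))  # stand-in for an append-only audit sink
    return artifact

quarantine_output("generated_code", "print('hello')")  # approved stays False
```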

    We’ve seen the other side of this coin in civilian contexts. When Air Canada’s website chatbot misrepresented its bereavement policy and the airline tried to argue that the bot was a separate legal entity, the tribunal rejected the claim outright: the company remained liable for what the bot said. In espionage, the stakes are higher, but the logic is the same: if an AI agent misuses tools or data, regulators and courts will look through the agent to the enterprise behind it.

    Rules that work, rules that don’t

    So yes, rule-based systems fail if by rules one means ad-hoc allow/deny lists, regex fences, and baroque prompt hierarchies trying to police semantics. Those crumble under indirect prompt injection, retrieval-time poisoning, and model deception. But rule-based governance is non-optional when we move from language to action.

    The security community is converging on a synthesis:

    • Put rules at the capability boundary: Use policy engines, identity systems, and tool permissions to determine what the agent can actually do, with which data, and under which approvals.
    • Pair rules with continuous evaluation: Use observability tooling, red-teaming, and robust logging that produces auditable evidence.
    • Treat agents as first-class subjects in your threat model: For example, MITRE ATLAS now catalogs techniques and case studies specifically targeting AI systems.
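
    Making the agent a first-class subject also means every mediated decision leaves evidence an auditor or incident responder can replay. A minimal, illustrative record format, not tied to any specific SIEM or framework, might look like this:

```python
# Illustrative only: one structured, append-only record per mediated agent action.
import json
import time
import uuid

def audit_record(principal: str, tenant: str, tool: str,
                 arguments: dict, decision: str) -> str:
    """Emit an event describing what the agent attempted and what was decided."""
    event_id = str(uuid.uuid4())
    print(json.dumps({
        "event_id": event_id,
        "timestamp": time.time(),
        "principal": principal,
        "tenant": tenant,
        "tool": tool,
        "arguments": arguments,
        "decision": decision,  # e.g. "allowed", "denied", "escalated"
    }))
    return event_id

audit_record("svc-support-copilot", "acme-prod", "create_ticket",
             {"summary": "password reset"}, "escalated")
```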

    The lesson from the first AI-orchestrated espionage campaign is not that AI is uncontrollable. It’s that control belongs in the same place it always has in security: at the architecture boundary, enforced by systems, not by vibes.

    This content was produced by Protegrity. It was not written by MIT Technology Review’s editorial staff.
