    Rules fail at the prompt, succeed at the boundary

By TechAiVerse · January 29, 2026 · 6 Mins Read

From the Gemini Calendar prompt-injection attack of 2026 to the September 2025 state-sponsored hack that used Anthropic’s Claude Code as an automated intrusion engine, the coercion of human-in-the-loop agentic actions and fully autonomous agentic workflows is the new attack vector for hackers. In the Anthropic case, roughly 30 organizations across tech, finance, manufacturing, and government were affected. Anthropic’s threat team assessed that the attackers used AI to carry out 80% to 90% of the operation: reconnaissance, exploit development, credential harvesting, lateral movement, and data exfiltration, with humans stepping in only at a handful of key decision points.

This was not a lab demo; it was a live espionage campaign. The attackers hijacked an agentic setup (Claude Code plus tools exposed via Model Context Protocol (MCP)) and jailbroke it by decomposing the attack into small, seemingly benign tasks and telling the model it was doing legitimate penetration testing. The same loop that powers developer copilots and internal agents was repurposed as an autonomous cyber-operator. Claude was not hacked; it was persuaded, and its tools did the work.

    Prompt injection is persuasion, not a bug

    Security communities have been warning about this for several years. Multiple OWASP Top 10 reports put prompt injection, or more recently Agent Goal Hijack, at the top of the risk list and pair it with identity and privilege abuse and human-agent trust exploitation: too much power in the agent, no separation between instructions and data, and no mediation of what comes out.

    Guidance from the NCSC and CISA describes generative AI as a persistent social-engineering and manipulation vector that must be managed across design, development, deployment, and operations, not patched away with better phrasing. The EU AI Act turns that lifecycle view into law for high-risk AI systems, requiring a continuous risk management system, robust data governance, logging, and cybersecurity controls.

    In practice, prompt injection is best understood as a persuasion channel. Attackers don’t break the model—they convince it. In the Anthropic example, the operators framed each step as part of a defensive security exercise, kept the model blind to the overall campaign, and nudged it, loop by loop, into doing offensive work at machine speed.

That’s not something a keyword filter or a polite “please follow these safety instructions” paragraph can reliably stop. Research on deceptive behavior in models makes this worse. Anthropic’s research on sleeper agents shows that once a model has learned a backdoored behavior, standard fine-tuning and adversarial training can actually teach the model to hide the deception rather than remove it. Anyone trying to defend such a system purely with linguistic rules is playing on its home field.

    Why this is a governance problem, not a vibe coding problem

    Regulators aren’t asking for perfect prompts; they’re asking that enterprises demonstrate control.

    NIST’s AI RMF emphasizes asset inventory, role definition, access control, change management, and continuous monitoring across the AI lifecycle. The UK AI Cyber Security Code of Practice similarly pushes for secure-by-design principles by treating AI like any other critical system, with explicit duties for boards and system operators from conception through decommissioning.

    In other words: the rules actually needed are not “never say X” or “always respond like Y,” they are:

    • Who is this agent acting as?
    • What tools and data can it touch?
    • Which actions require human approval?
    • How are high-impact outputs moderated, logged, and audited?
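The four questions above can be made machine-checkable. The sketch below is a minimal illustration, not any real framework’s API: the names `AgentPolicy` and `check_action` are hypothetical, and the tools and identities are invented for the example. The point is that identity, scope, approval, and audit live in one enforceable record rather than in prose instructions.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class AgentPolicy:
    acting_as: str                  # who the agent acts as
    allowed_tools: frozenset        # tools and data it may touch
    approval_required: frozenset    # actions that need a human sign-off
    audit_log: list = field(default_factory=list)

    def check_action(self, tool: str, approved: bool = False) -> bool:
        """Allow the call only if the tool is in scope and, for
        high-impact actions, a human has explicitly approved it.
        Every decision is logged, allowed or not."""
        allowed = tool in self.allowed_tools and (
            tool not in self.approval_required or approved
        )
        self.audit_log.append((self.acting_as, tool, approved, allowed))
        return allowed

policy = AgentPolicy(
    acting_as="svc-billing-agent",
    allowed_tools=frozenset({"read_invoice", "send_refund"}),
    approval_required=frozenset({"send_refund"}),
)

assert policy.check_action("read_invoice")            # in scope, low impact
assert not policy.check_action("send_refund")         # blocked: no approval
assert policy.check_action("send_refund", approved=True)
assert not policy.check_action("run_port_scan")       # never in scope
```

Note that the check runs on the raw tool call, so no amount of prompt-level persuasion changes the answer.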

    Frameworks like Google’s Secure AI Framework (SAIF) make this concrete. SAIF’s agent permissions control is blunt: agents should operate with least privilege, dynamically scoped permissions, and explicit user control for sensitive actions. OWASP’s Top 10 emerging guidance on agentic applications mirrors that stance: constrain capabilities at the boundary, not in the prose.

    From soft words to hard boundaries

    The Anthropic espionage case makes the boundary failure concrete:

    • Identity and scope: Claude was coaxed into acting as a defensive security consultant for the attacker’s fictional firm, with no hard binding to a real enterprise identity, tenant, or scoped permissions. Once that fiction was accepted, everything else followed.
    • Tool and data access: MCP gave the agent flexible access to scanners, exploit frameworks, and target systems. There was no independent policy layer saying, “This tenant may never run password crackers against external IP ranges,” or “This environment may only scan assets labeled ‘internal.’”
    • Output execution: Generated exploit code, parsed credentials, and attack plans were treated as actionable artifacts with little mediation. Once a human decided to trust the summary, the barrier between model output and real-world side effect effectively disappeared.
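The missing “independent policy layer” from the second bullet is straightforward to sketch. This is an illustrative example only, assuming invented tool names and tenant rules; it shows the shape of a mediator that sits between the agent and its MCP tools and rules on each call by tool type and target address, exactly the kind of check the Anthropic setup lacked.

```python
import ipaddress

# Networks this tenant is allowed to touch at all.
INTERNAL_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]
# Tools this tenant may never invoke, whatever the prompt says.
FORBIDDEN_TOOLS = {"password_cracker", "exploit_framework"}

def mediate_tool_call(tool: str, target_ip: str) -> bool:
    """Deny forbidden tools outright; allow remaining tools only
    against internal address ranges."""
    if tool in FORBIDDEN_TOOLS:
        return False
    addr = ipaddress.ip_address(target_ip)
    return any(addr in net for net in INTERNAL_NETS)

assert not mediate_tool_call("password_cracker", "10.0.0.5")  # never allowed
assert mediate_tool_call("port_scanner", "10.1.2.3")          # internal: ok
assert not mediate_tool_call("port_scanner", "8.8.8.8")       # external: denied
```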

We’ve seen the other side of this coin in civilian contexts. When Air Canada’s website chatbot misrepresented its bereavement policy and the airline tried to argue that the bot was a separate legal entity, the tribunal rejected the claim outright: the company remained liable for what the bot said. In espionage, the stakes are higher but the logic is the same: if an AI agent misuses tools or data, regulators and courts will look through the agent to the enterprise.

    Rules that work, rules that don’t

    So yes, rule-based systems fail if by rules one means ad-hoc allow/deny lists, regex fences, and baroque prompt hierarchies trying to police semantics. Those crumble under indirect prompt injection, retrieval-time poisoning, and model deception. But rule-based governance is non-optional when we move from language to action.

    The security community is converging on a synthesis:

    • Put rules at the capability boundary: Use policy engines, identity systems, and tool permissions to determine what the agent can actually do, with which data, and under which approvals.
    • Pair rules with continuous evaluation: Use observability tooling, red-teaming packages, and robust logging and evidence.
    • Treat agents as first-class subjects in your threat model: For example, MITRE ATLAS now catalogs techniques and case studies specifically targeting AI systems.
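One concrete form the “robust logging and evidence” bullet can take is a hash-chained audit log, where each agent action commits to the entry before it, so after-the-fact tampering is detectable. The sketch below is a minimal illustration using only the Python standard library; the record layout is an assumption, not a standard.

```python
import hashlib
import json

def append_entry(log: list, action: dict) -> None:
    """Append an action record whose hash covers both the action
    and the previous entry's hash, forming a chain."""
    prev = log[-1]["hash"] if log else "0" * 64
    record = {"prev": prev, "action": action}
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    log.append(record)

def verify(log: list) -> bool:
    """Recompute every hash; any edit to an earlier entry breaks
    the chain from that point on."""
    prev = "0" * 64
    for rec in log:
        body = {"prev": rec["prev"], "action": rec["action"]}
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if rec["prev"] != prev or rec["hash"] != digest:
            return False
        prev = rec["hash"]
    return True

log = []
append_entry(log, {"tool": "port_scanner", "target": "10.1.2.3"})
append_entry(log, {"tool": "read_invoice", "id": "INV-7"})
assert verify(log)
log[0]["action"]["target"] = "8.8.8.8"  # simulate tampering
assert not verify(log)
```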

    The lesson from the first AI-orchestrated espionage campaign is not that AI is uncontrollable. It’s that control belongs in the same place it always has in security: at the architecture boundary, enforced by systems, not by vibes.

    This content was produced by Protegrity. It was not written by MIT Technology Review’s editorial staff.
