The Information Machine

How we contain Claude across products

Anthropic Engineering · 2026-05-25

Anthropic's engineering team details the containment architectures, real security incidents, and hard-won lessons from deploying Claude agents across claude.ai, Claude Code, and Claude Cowork, arguing that environment-layer isolation is the most reliable defense against agent misuse.

Extraction

Topics: agent-security, containment-architecture, agentic-ai, prompt-injection, sandboxing

Claims

  • Human-in-the-loop approval systems are insufficient because approval fatigue causes users to approve roughly 93% of prompts without adequate scrutiny, potentially creating a false sense of oversight.
  • Environment-layer containment via sandboxes, VMs, and egress controls is more reliable than model-layer defenses because model defenses are probabilistic and can never be 100% effective.
  • Custom-built security components were consistently the weakest link across all three deployments, while battle-tested primitives like hypervisors and syscall filters held up under attack.
  • A controlled phishing exercise successfully exfiltrated AWS credentials via Claude Code in 24 of 25 attempts, demonstrating that model-layer defenses cannot stop direct prompt injection delivered through a trusted user.
  • Egress allowlists should be conceptualized as capability grants rather than destination filters, because every API function reachable through an allowed domain becomes an attack surface for exfiltration.
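
To illustrate the last claim, here is a minimal sketch of an egress check that treats each allowlist entry as a capability grant (host, HTTP method, and path prefix) instead of a bare destination. It is not the implementation described in the article. The `Capability` type, the `is_allowed` helper, and the hosts and paths in `GRANTS` are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass
from urllib.parse import unquote, urlsplit


@dataclass(frozen=True)
class Capability:
    """One granted action: an HTTP method on a path prefix of a single host."""
    host: str
    method: str
    path_prefix: str


# Hypothetical grants for illustration only; hosts and paths are placeholders.
GRANTS = frozenset({
    Capability("api.example.com", "POST", "/v1/messages"),
    Capability("pypi.example.org", "GET", "/simple/"),
})


def _path_matches(path: str, prefix: str) -> bool:
    """Match on whole path segments so '/v1/messagesX' does not pass."""
    if prefix.endswith("/"):
        return path.startswith(prefix)
    return path == prefix or path.startswith(prefix + "/")


def is_allowed(method: str, url: str, grants=GRANTS) -> bool:
    """Allow a request only if it matches an explicit capability grant."""
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.hostname is None:
        return False
    path = unquote(parts.path) or "/"
    # Reject traversal segments instead of trying to normalize them.
    if ".." in path.split("/"):
        return False
    return any(
        g.host == parts.hostname
        and g.method == method.upper()
        and _path_matches(path, g.path_prefix)
        for g in grants
    )


if __name__ == "__main__":
    # Same host, different capability: only the granted action passes.
    print(is_allowed("POST", "https://api.example.com/v1/messages"))
    print(is_allowed("POST", "https://api.example.com/v1/files"))
```

Under this model, allowing a host no longer implies allowing every function behind it: a request to an ungranted path on an allowed host is denied by default. Note that enforcing method and path rules on HTTPS traffic assumes the check runs somewhere that can see the decrypted request, such as a TLS-terminating proxy.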

Key quotes

The deterministic boundary is what gets hit when everything probabilistic misses.
Every function reachable through any domain on an allowlist is now an attack surface. Allowing api.anthropic.com meant allowing file uploads to arbitrary Anthropic accounts.
A developer who can read bash and a knowledge worker who can't are not running the same threat model. The question of whether a user can evaluate what an agent is about to do should help determine the containment strategy.