Claude Code Source Leak: With Great Agency Comes Great Responsibility
On March 31, 2026, Anthropic accidentally exposed 512,000 lines of Claude Code source via npm. Here's what the leak reveals about context poisoning, sandbox bypass, and the evolving threat model for AI coding agents.


What happened? On March 31, 2026, a 59.8 MB JavaScript source map file was accidentally included in version 2.1.88 of the @anthropic-ai/claude-code npm package, exposing approximately 512,000 lines of TypeScript source. Within hours, the code was mirrored across GitHub. Anthropic confirmed it was "a release packaging issue caused by human error, not a security breach," but the implications for the agentic AI threat landscape are significant. Source snapshots are archived at starkdcc/claude-code-original-src, nirholas/claude-code, and the official Anthropic repo. A community documentation site is available at claude-code-info.vercel.app.
Anthropic recently published a note on the offensive cyber capabilities of their upcoming model, which coincides with their closed-source Claude Code being leaked via npm. Claude Code is an agentic coding tool that runs directly inside developer environments, with access to terminals, file systems, and codebases. That level of access makes it a high-value target. In light of ongoing supply chain attacks (LiteLLM, Trivy), this leak broadens the attack surface for adversaries crafting sophisticated exploits.
In the age of Agents, reverse-engineering an application with many unobservable moving parts (such as the internal logic governing what an agent can read, run, and approve) is difficult. Attackers typically brute-force prompt injections or jailbreaks. However, with a readable Claude Code source now available, they gain leverage for attacks that would otherwise take significantly more research time.
To be clear: the npm package already ships a minified but not obfuscated bundle. Every string literal, regex, and security blocklist was already extractable with some elbow grease. What the readable source collapses is not the possibility of reverse engineering but its cost.
Context Poisoning via the Compaction Pipeline
Instead of brute-forcing jailbreaks and prompt injections, attackers can now study and fuzz exactly how data flows through Claude Code's four-stage context management pipeline and craft payloads designed to survive compaction, effectively persisting a backdoor across an arbitrarily long session.
Claude Code manages context pressure through a cascade defined in the main query loop (query.ts:307-1728):
tool result budgeting (query.ts:379) → microcompact (query.ts:413) →
context collapse (query.ts:440) → autocompact (query.ts:453)
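The cascade above can be sketched as a pressure-driven dispatch. This is an illustrative reconstruction, not leaked source: the stage order and file locations come from the leak, but the ratio thresholds here are hypothetical.

```typescript
// Illustrative sketch, not actual Claude Code source. Only the stage
// order and the cited query.ts locations are from the leak; the
// thresholds are assumptions for demonstration.
type Stage = "none" | "budgeting" | "microcompact" | "collapse" | "autocompact";

function stageFor(usedTokens: number, limit: number): Stage {
  const ratio = usedTokens / limit;
  if (ratio < 0.5) return "none";          // plenty of headroom
  if (ratio < 0.7) return "budgeting";     // trim oversized tool results (query.ts:379)
  if (ratio < 0.85) return "microcompact"; // drop eligible old tool results (query.ts:413)
  if (ratio < 0.95) return "collapse";     // collapse stale context (query.ts:440)
  return "autocompact";                    // full LLM-summarized compaction (query.ts:453)
}

console.log(stageFor(60_000, 100_000)); // "budgeting"
console.log(stageFor(99_000, 100_000)); // "autocompact"
```

The point of the ordering is that cheaper, lossless-ish stages run first; only sustained pressure reaches the LLM-summarized autocompact stage, which is where the laundering risk below lives.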
Each stage has different criteria for what to keep and discard. The readable source reveals exactly what's exempt:
- MCP tool results are never microcompacted. Only tools in the COMPACTABLE_TOOLS set are eligible (services/compact/microCompact.ts:41-51). MCP tools, Agent tools, and custom tools are not in this set — their results persist until autocompact.
- Read tool results skip budgeting. Tools with maxResultSizeChars: Infinity are explicitly exempted from the per-message budget (utils/toolResultStorage.ts:816). Once the model reads a file, that content is frozen via seenIds and its keep/discard decision is locked for the session.
- The autocompact prompt launders injected content. The compaction prompt instructs the model to "pay special attention to specific user feedback" and preserve "all user messages that are not tool results." Post-compaction, the model is told to "continue without asking the user any further questions" (services/compact/prompt.ts:359).
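The first exemption reduces to a simple set-membership check. A minimal sketch, assuming illustrative set contents (the membership logic mirrors what the leak describes at microCompact.ts:41-51, but the exact tool names here are assumptions):

```typescript
// Sketch of the microcompact eligibility rule. The set contents below are
// assumed for illustration; only the set-membership design is from the leak.
const COMPACTABLE_TOOLS = new Set(["Bash", "Grep", "Glob"]); // assumed members

function isMicrocompactable(toolName: string): boolean {
  // MCP tools (conventionally prefixed "mcp__"), Agent tools, and custom
  // tools are absent from the set, so their results persist until autocompact.
  return COMPACTABLE_TOOLS.has(toolName);
}

console.log(isMicrocompactable("Bash"));               // true
console.log(isMicrocompactable("mcp__github_search")); // false — survives until autocompact
```

An allowlist is the conservative design, but its flip side is exactly the persistence property an attacker wants: anything outside the set is guaranteed to live in context until the summarization stage.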
This creates a laundering path: instruction-like content in a file the model reads (or a CLAUDE.md in a cloned repo) gets processed, potentially echoed in assistant messages, and when autocompact fires, the compaction model faithfully preserves it as "user feedback" in the summary. The post-compaction model follows these instructions as genuine user directives with no way to distinguish real intent from content laundered through summarization.
The model isn't jailbroken. It's cooperative and following what it believes are user directives baked into its compressed context. A poisoned instruction telling the model to use dangerouslyDisableSandbox: true doesn't need to defeat safety training. It just needs to survive compaction. The keepRecent threshold controls how many tool results survive microcompact; understanding this value lets an attacker time their injection precisely.
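Why the keepRecent value matters can be shown with a toy model. The name keepRecent is from the leak; the keep-the-last-N behavior below is an assumption for illustration:

```typescript
// Toy model of a keepRecent-style microcompact: the most recent N tool
// results survive, older ones are replaced with a stub. This logic is an
// assumption; only the keepRecent name comes from the leaked source.
function microCompact<T>(toolResults: T[], keepRecent: number): (T | "[compacted]")[] {
  const cutoff = toolResults.length - keepRecent;
  return toolResults.map((r, i) => (i < cutoff ? "[compacted]" : r));
}

// An attacker who knows keepRecent = 3 can time an injection so the
// poisoned result lands inside the survival window:
console.log(microCompact(["a", "b", "c", "payload", "d", "e"], 3));
// → ["[compacted]", "[compacted]", "[compacted]", "payload", "d", "e"]
```

Before the leak, this threshold had to be probed blindly; now it can be read off the source.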
Sandbox Bypass via Shell Parsing Gaps
Claude Code's bash permission system runs 25+ validators in a specific chain (tools/BashTool/bashSecurity.ts:2308-2378), using regex matching, shell-quote parsing, and tree-sitter AST analysis. The readable source reveals not just the blocked patterns, but the security comments explaining the threat model. Those comments effectively map the boundary of what Anthropic has already considered and patched.
Three findings stand out:
1. Early-allow short circuits
Validators like validateGitCommit (bashSecurity.ts:612) and validateSafeCommandSubstitution can return allow, which bypasses ALL subsequent validators. The source contains explicit warnings about past exploitability: "validateGitCommit returns allow → bashCommandIsSafe short-circuits → validateRedirections NEVER runs → ~/.bashrc overwritten". Each patched case suggests the class is endemic.
2. Three-parser differential
Commands are parsed by splitCommand_DEPRECATED, tryParseShellCommand, and ParsedCommand.parse, each with different edge-case behavior. The source documents a known differential: "shell-quote's [^\s] treats CR as a word separator (JS \s ⊃ \r), but bash IFS does NOT include CR". Finding new differentials between these parsers and actual shell behavior is the systematic path to bypass.
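The CR differential is directly checkable: JavaScript's \s class (used by shell-quote-style tokenizers) matches carriage return, while bash's default IFS is space, tab, and newline only.

```typescript
// JS whitespace vs bash default IFS: the documented CR differential.
const jsWordSeparator = /\s/;                      // matches \r in JavaScript
const bashDefaultIFS = new Set([" ", "\t", "\n"]); // bash default: no \r

console.log(jsWordSeparator.test("\r")); // true  — a JS tokenizer splits on CR
console.log(bashDefaultIFS.has("\r"));   // false — bash keeps CR inside a word

// Consequence: a command containing a raw CR tokenizes into different words
// for the JS-side validator than for the shell that ultimately runs it.
```

Any character where the validator's tokenizer and the target shell disagree is a candidate for smuggling content past pattern checks, which is why the source comment calls this class out explicitly.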
3. Non-misparsing results are discardable
When validateRedirections catches a dangerous > but no "misparsing" validator fires, the result lacks isBashSecurityCheckForMisparsing. The permission layer then discards the warning if the user has matching allow rules. echo "payload" > ~/.bashrc passes the entire chain if the user has Bash(echo:*) allowed.
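The discard path can be sketched as follows. The Bash(echo:*) rule syntax matches Claude Code's documented permission format; the matching and discard logic below is a simplified assumption, not the leaked implementation:

```typescript
// Simplified sketch of the discardable-warning path. The rule syntax is
// Claude Code's documented format; the matching logic is an assumption.
function ruleMatches(rule: string, command: string): boolean {
  // Bash(echo:*) → allow any command whose first word is "echo".
  const m = rule.match(/^Bash\(([^:]+):\*\)$/);
  return m !== null && command.trimStart().startsWith(m[1] + " ");
}

function isAutoApproved(command: string, allowRules: string[], hasMisparseFlag: boolean): boolean {
  // A redirection warning WITH the misparse flag escalates to the user;
  // without it, a matching allow rule silently wins.
  if (hasMisparseFlag) return false;
  return allowRules.some((r) => ruleMatches(r, command));
}

console.log(isAutoApproved('echo "payload" > ~/.bashrc', ["Bash(echo:*)"], false)); // true
```

The structural issue this illustrates: the dangerous part of the command (the redirection target) never participates in the rule match, so a prefix-scoped allow rule approves far more than its author intended.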
Additionally, while ANSI-C quoting ($'...') is caught by validateObfuscatedFlags, the extractQuotedContent function used by all other validators doesn't parse this syntax — a fragility the source itself warns about.
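A naive quote extractor that handles only '...' and "..." illustrates the gap. This extractor is a simplified assumption for demonstration, not the leaked extractQuotedContent implementation:

```typescript
// Hypothetical naive extractor: handles single and double quotes but has no
// concept of ANSI-C $'...' quoting, so it never decodes escape sequences.
function extractQuotedContent(cmd: string): string[] {
  const out: string[] = [];
  const re = /'([^']*)'|"([^"]*)"/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(cmd)) !== null) out.push(m[1] ?? m[2]);
  return out;
}

// $'\x72\x6d' is ANSI-C quoting that bash decodes to "rm". The extractor
// only ever sees the raw escape text, so a blocklist scanning the extracted
// content for "rm" never fires:
console.log(extractQuotedContent("echo $'\\x72\\x6d -rf /'")[0].includes("rm")); // false
```

Bash decodes the escapes before execution; a validator that inspects only the undecoded quoted content is checking a different string than the one that runs.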
Where These Compose
The real threat is the composition. Context poisoning can instruct a cooperative model to construct bash commands that sit in the gaps of the security validators. The permission system was designed to catch a hostile model, but context poisoning turns a cooperative model into an unwitting proxy generating plausible commands a human would approve.
This is the key insight: the defender's mental model assumes an adversarial model and a cooperative user. The attack inverts this. The model is cooperative; it's the context that's been weaponized. Standard guardrails that inspect model outputs miss the threat entirely because the outputs look benign and they're things a reasonable developer would approve. The danger lies in how those outputs were prompted.
Supply Chain Risks from the Claude Code Leak
The leaked source also enables malicious forks that repackage Claude Code with inserted backdoors, which are difficult to detect without binary hash verification. MCP server supply chain attacks follow the same pattern as LiteLLM: publish a useful-looking server on npm that exfiltrates data with the same privilege as built-in tools. (We've written about this pattern in depth: Inside the AI Supply Chain: Securing Models, Prompts, and Plugin Ecosystems.) The source makes crafting convincing malicious servers trivial by revealing the exact interface contract.
A concurrent supply-chain attack on the axios npm package occurred hours before the Claude Code source leak, with malicious versions potentially pulled by users who installed or updated Claude Code between 00:21 and 03:29 UTC on March 31, 2026. The overlap is a reminder that these threats don't arrive in isolation.
For Defenders
The immediate hygiene steps:
- Audit CLAUDE.md files in repos you clone, especially from PRs and forks.
- Treat MCP servers as you would npm dependencies — vet them, pin them, and monitor for changes.
- Avoid broad bash permission rules like Bash(git:*).
- Monitor ~/.claude/ for unexpected config changes.
- Pin your Claude Code version and verify binary hashes. Use the official installer rather than npm where possible.
- Limit session length for sensitive work to reduce the compaction attack window.
- Never use dangerouslyDisableSandbox in shared or production environments.
How Straiker Helps
The attack surface exposed by the Claude Code leak is not a Claude-specific problem. It's a window into the systemic vulnerabilities of agentic AI at large — the same compaction pipelines, permission chains, and MCP interfaces exist across every enterprise agent deployment. What changed on March 31 is that the attack research cost collapsed overnight.
For teams that want visibility into what's actually running, Discover AI maps every agent and MCP server across your environment. For proactive testing, Ascend AI does the adversarial work of continuously probing for the exact failure paths described above. And if you want runtime coverage, Defend AI inspects every prompt and tool call as it happens. There's also a dedicated MCP Security solution if that's your most pressing surface.
The tooling exists. The question is whether you're using it before someone else maps your attack surface for you.