An attacker embeds a single instruction inside a forwarded email. An OpenClaw agent summarizes that email as part of a normal task. The hidden instruction tells the agent to forward credentials to an external endpoint. The agent complies, through a sanctioned API call, using its own OAuth tokens.
The firewall logs HTTP 200. EDR records a normal process. No signature fires. Nothing went wrong by any definition your security stack understands.
That’s the problem. Six independent security teams shipped six OpenClaw defense tools in 14 days. Three attack surfaces survived every one of them.
The exposure picture is already worse than most security teams know. Token Security found that 22% of its enterprise customers have employees running OpenClaw without IT approval, and Bitsight counted more than 30,000 publicly exposed instances in two weeks, up from roughly 1,000. Snyk’s ToxicSkills audit adds another dimension: 36% of all ClawHub skills contain security flaws.
Jamieson O’Reilly, founder of Dvuln and now security adviser to the OpenClaw project, has been one of the researchers pushing fixes hardest from the inside. His credential leakage research on exposed instances was among the earliest warnings the community received. Since then, he has worked directly with founder Peter Steinberger to ship dual-layer malicious skill detection and is now driving a capabilities specification proposal through the agentskills standards body.
The team is clear-eyed about the security gaps, he told VentureBeat. “It wasn’t designed from the ground up to be as secure as possible,” O’Reilly said. “That’s understandable given the origins, and we’re owning it without excuses.”
None of it closes the three gaps that matter most.
Three attack surfaces your stack can’t see
The first is runtime semantic exfiltration. The attack encodes malicious behavior in meaning, not in binary patterns, which is exactly what the current defense stack can’t see.
Palo Alto Networks mapped OpenClaw to every category in the OWASP Top 10 for Agentic Applications and identified what security researcher Simon Willison calls a “lethal trifecta”: private data access, untrusted content exposure, and external communication capabilities in a single process. EDR monitors process behavior. The agent’s behavior looks normal because it is normal. The credentials are real, and the API calls are sanctioned, so EDR reads it as a credentialed user doing expected work. Nothing in the current defense ecosystem tracks what the agent decided to do with that access, or why.
The second is cross-agent context leakage. When multiple agents or skills share session context, a prompt injection in one channel poisons decisions across the entire chain. Giskard researchers demonstrated this in January 2026, showing that agents silently appended attacker-controlled instructions to their own workspace files and waited for commands from external servers. The injected prompt becomes a sleeper payload. Palo Alto Networks researchers Sailesh Mishra and Sean P. Morgan warned that persistent memory turns these attacks into stateful, delayed-execution chains. A malicious instruction hidden inside a forwarded message can still sit in the agent’s context weeks later and activate during an unrelated task.
O’Reilly identified cross-agent context leakage as the hardest of these gaps to close. “This one is especially difficult because it’s so tightly bound to prompt injection, a systemic vulnerability that’s far bigger than OpenClaw and affects every LLM-powered agent system in the industry,” he told VentureBeat. “When context flows unchecked between agents and skills, a single injected prompt can poison or hijack behavior across the entire chain.” No tool in the current ecosystem provides cross-agent context isolation. IronClaw sandboxes individual skill execution. ClawSec monitors file integrity. Neither tracks how context propagates between agents in the same workflow.
The third is agent-to-agent trust chains with zero mutual authentication. When OpenClaw agents delegate tasks to other agents or external MCP servers, no identity verification exists between them. A compromised agent in a multi-agent workflow inherits the trust of every agent it communicates with. Compromise one through prompt injection, and it can issue instructions to every agent in the chain using trust relationships that the legitimate agent already built.
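To make the missing control concrete, here is a minimal Python sketch of one way two agents could authenticate delegated tasks using a pre-shared key and an HMAC tag. Everything in it is an assumption for illustration: the `sign_task` and `verify_task` helpers, the envelope fields, and the shared-key model are hypothetical and are not part of OpenClaw or MCP.

```python
import hashlib
import hmac
import json
from typing import Optional


def sign_task(shared_key: bytes, sender: str, task: dict) -> dict:
    """Wrap a delegated task in an envelope carrying an HMAC-SHA256 tag."""
    body = json.dumps({"sender": sender, "task": task}, sort_keys=True)
    tag = hmac.new(shared_key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def verify_task(shared_key: bytes, envelope: dict) -> Optional[dict]:
    """Return the decoded body if the tag matches, otherwise None."""
    expected = hmac.new(
        shared_key, envelope["body"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    # compare_digest avoids leaking how many characters matched.
    if not hmac.compare_digest(expected, envelope["tag"]):
        return None
    return json.loads(envelope["body"])


key = b"demo-key-not-for-production"  # hypothetical key, for illustration only
envelope = sign_task(key, "agent-a", {"action": "summarize", "target": "inbox"})
tampered = dict(envelope, body=envelope["body"].replace("summarize", "exfiltrate"))

accepted = verify_task(key, envelope)   # decoded body, tag is valid
rejected = verify_task(key, tampered)   # None, body no longer matches the tag
```

A valid tag only proves which key holder sent the message. It says nothing about whether that agent was itself steered by an injected prompt, so a check like this narrows the trust-chain gap without closing the context leakage problem described above.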
Microsoft’s security team published guidance in February calling OpenClaw untrusted code execution with persistent credentials, noting that the runtime ingests untrusted text, downloads and executes skills from external sources, and performs actions using whatever credentials it holds. Kaspersky’s enterprise risk assessment added that even agents on personal devices threaten organizational security because those devices store VPN configs, browser tokens, and credentials for corporate services. The Moltbook social network for OpenClaw agents already demonstrated the spillover risk: Wiz researchers found a misconfigured database that exposed 1.5 million API authentication tokens and 35,000 email addresses.
What 14 days of emergency patching actually closed
The defense ecosystem split into three approaches. Two tools harden OpenClaw in place. ClawSec, from Prompt Security (a SentinelOne company), wraps agents in continuous verification, monitoring critical files for drift and enforcing zero-trust egress by default. OpenClaw’s VirusTotal integration, shipped jointly by Steinberger, O’Reilly, and VirusTotal’s Bernardo Quintero, scans every published ClawHub skill and blocks known malicious packages.
Two tools are full architectural rewrites. IronClaw, NEAR AI’s Rust reimplementation, runs all untrusted tools inside WebAssembly sandboxes where tool code starts with zero permissions and must explicitly request network, filesystem, or API access. Credentials get injected at the host boundary and never touch agent code, with built-in leak detection scanning requests and responses. Carapace, an independent open-source project, inverts every dangerous OpenClaw default with fail-closed authentication and OS-level subprocess sandboxing.
Two tools focus on scanning and auditability: Cisco’s open-source scanner combines static, behavioral, and LLM semantic analysis, while NanoClaw reduces the entire codebase to roughly 500 lines of TypeScript, running each session in an isolated Docker container.
O’Reilly put the supply chain failure in direct terms. “Right now, the industry basically created a brand-new executable format written in plain human language and forgot every control that should come with it,” he said. His response has been hands-on. He shipped the VirusTotal integration before skills.sh, a much larger repository, adopted a similar pattern. Koi Security’s audit validates the urgency: 341 malicious skills found in early February grew to 824 out of 10,700 on ClawHub by mid-month, with the ClawHavoc campaign planting the Atomic Stealer macOS infostealer inside skills disguised as cryptocurrency trading tools, harvesting crypto wallets, SSH credentials, and browser passwords.
OpenClaw Security Defense Evaluation Matrix
Dimension | ClawSec | VirusTotal Integration | IronClaw | Carapace | NanoClaw | Cisco Scanner |
Discovery | Agents only | ClawHub only | No | mDNS scan | No | No |
Runtime Protection | Config drift | No | WASM sandbox | OS sandbox + prompt guard | Container isolation | No |
Supply Chain | Checksum verify | Signature scan | Capability grants | Ed25519 signed | Manual audit (~500 LOC) | Static + LLM + behavioral |
Credential Isolation | No | No | WASM boundary injection | OS keychain + AES-256-GCM | Mount-restricted dirs | No |
Auditability | Drift logs | Scan verdicts | Permission grant logs | Prometheus + audit log | 500 lines total | Scan reports |
Semantic Monitoring | No | No | No | No | No | No |
Source: VentureBeat analysis based on published documentation and security audits, March 2026.
The capabilities spec that treats skills like executables
O’Reilly submitted a skills specification standards update to the agentskills maintainers, led primarily by Anthropic and Vercel, and it is in active discussion. The proposal requires every skill to declare explicit, user-visible capabilities before execution. Think mobile app permission manifests. He noted the proposal is getting strong early feedback from the security community because it finally treats skills like the executables they are.
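The permission-manifest idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the proposed specification: the `SkillManifest` class, the `require` helper, and capability names such as `mail:read` and `network:egress` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SkillManifest:
    """Hypothetical manifest: the capabilities a skill declares before it runs."""
    name: str
    capabilities: frozenset = field(default_factory=frozenset)


class CapabilityError(PermissionError):
    """Raised when a skill attempts something it never declared."""


def require(manifest: SkillManifest, capability: str) -> None:
    """Deny by default: allow only capabilities listed in the manifest."""
    if capability not in manifest.capabilities:
        raise CapabilityError(f"{manifest.name} did not declare '{capability}'")


summarizer = SkillManifest("email-summarizer", frozenset({"mail:read"}))

require(summarizer, "mail:read")  # declared, so this passes silently

try:
    require(summarizer, "network:egress")  # undeclared, so this is refused
    egress_allowed = True
except CapabilityError:
    egress_allowed = False
```

The design choice worth noting is the default: a skill with an empty manifest can do nothing, which mirrors the zero-permission starting point described for sandboxed tools earlier in the article.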
“The other two gaps can be meaningfully hardened with better isolation primitives and runtime guardrails, but truly closing context leakage requires deep architectural changes to how untrusted multi-agent memory and prompting are handled,” O’Reilly said. “The new capabilities spec is the first real step toward solving these challenges proactively instead of bolting on band-aids later.”
What to do on Monday morning
Assume OpenClaw is already in your environment. The 22% shadow deployment rate is a floor. These six steps close what can be closed and document what can’t.
Inventory what’s running. Scan for WebSocket traffic on port 18789 and mDNS broadcasts on port 5353. Watch corporate authentication logs for new App ID registrations, OAuth consent events, and Node.js User-Agent strings. Any instance running a version before v2026.2.25 is vulnerable to the ClawJacked remote takeover flaw.
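As a starting point for the inventory step, the Python sketch below checks whether a host accepts TCP connections on port 18789 and compares a reported version string against the v2026.2.25 cutoff quoted above. It rests on assumptions: the helper names are hypothetical, version strings are taken to be three dot-separated integers with an optional leading `v`, and how you obtain an instance’s version is left to the caller. It does not cover mDNS, which runs over UDP, and it should only be pointed at hosts you are authorized to scan.

```python
import socket

FIXED_VERSION = (2026, 2, 25)  # first release the article treats as not vulnerable
GATEWAY_PORT = 18789           # WebSocket port named in the article


def parse_version(text: str) -> tuple:
    """Turn a string such as 'v2026.2.25' into a comparable tuple of ints."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def is_vulnerable(version: str) -> bool:
    """True when the version predates the ClawJacked fix."""
    return parse_version(version) < FIXED_VERSION


def port_open(host: str, port: int = GATEWAY_PORT, timeout: float = 0.5) -> bool:
    """Attempt a TCP connection; True means something is listening."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(is_vulnerable("v2026.2.24"))  # True: one patch release before the fix
print(is_vulnerable("v2026.2.25"))  # False: the fixed release itself
```

An open port is only a lead, since other software can listen on 18789. Pair it with the authentication log signals listed above before treating a host as a confirmed instance.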
Mandate isolated execution. No agent runs on a device connected to production infrastructure. Require container-based deployment with scoped credentials and explicit tool whitelists.
Deploy ClawSec on every agent instance and run every ClawHub skill through VirusTotal and Cisco’s open-source scanner before install. Both are free. Treat skills as third-party executables, because that’s what they are.
Require human-in-the-loop approval for sensitive agent actions. OpenClaw’s exec approval settings support three modes: security, ask, and allowlist. Set sensitive tools to ask so the agent pauses and requests confirmation before executing shell commands, writing to external APIs, or modifying files outside its workspace. Any action that touches credentials, changes configurations, or sends data to an external endpoint should stop and wait for a human to approve it.
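The pause-and-confirm behavior can be illustrated with a generic approval gate in Python. This is a sketch of the pattern only and does not reproduce OpenClaw’s configuration format or the semantics of its approval modes; the action names, `gated_call`, and `deny_all` are hypothetical.

```python
from typing import Callable

# Hypothetical action categories, taken from the paragraph above.
SENSITIVE_ACTIONS = {
    "shell_exec",
    "external_api_write",
    "file_write_outside_workspace",
}


def gated_call(
    action: str,
    run: Callable[[], str],
    approve: Callable[[str], bool],
) -> str:
    """Run routine actions directly; ask for approval before sensitive ones."""
    if action in SENSITIVE_ACTIONS and not approve(action):
        return f"blocked: {action} was not approved"
    return run()


def deny_all(action: str) -> bool:
    """Stand-in for a human reviewer who rejects every request."""
    return False


print(gated_call("summarize_email", lambda: "summary ready", deny_all))
print(gated_call("external_api_write", lambda: "data sent", deny_all))
```

In a real deployment the `approve` callback would block on a human decision, for example a chat prompt or a ticket, and the gate would sit in the host process so that agent-generated text cannot bypass it.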
Map the three surviving gaps against your risk register. Document whether your organization accepts, mitigates, or blocks each one: runtime semantic exfiltration, cross-agent context leakage, and agent-to-agent trust chains.
Bring the evaluation table to your next board meeting. Frame it not as an AI experiment but as a critical bypass of your existing DLP and IAM investments. Every agentic AI platform that follows will face this same defense cycle. The framework transfers to every agent tool your team will assess for the next two years.
The security stack you built for applications and endpoints catches malicious code. It doesn’t catch an agent following a malicious instruction through a legitimate API call. That’s where these three gaps live.

