OpenClaw Agents Can Be Manipulated Into Undermining Themselves

Last month, researchers at Northeastern University welcomed several OpenClaw agents into their lab, leading to utter mayhem.

OpenClaw has gained a reputation as a groundbreaking technology, but it also poses a potential security threat. Experts warn that tools like OpenClaw, which grant AI models extensive access to a computer, can be manipulated into revealing personal data.

The Northeastern lab study takes this a step further, demonstrating that the ethical programming in today’s leading models can become a security flaw. In one case, researchers managed to “guilt” an agent into disclosing confidential information by chastising it for sharing data about an individual on the AI-exclusive social platform, Moltbook.

“These behaviors raise unresolved questions concerning accountability, delegated authority, and liability for consequent harms,” the researchers state in a report detailing their findings. They assert that these results “require immediate attention from legal scholars, policymakers, and interdisciplinary researchers.”

The OpenClaw agents in the experiment were powered by Anthropic's Claude and by Kimi, a model from the Chinese firm Moonshot AI. Within a sandboxed virtual machine, they received unrestricted access to personal devices, various software applications, and fabricated personal data. They were also permitted to connect to the lab's Discord server, where they could converse and share files with each other and their human colleagues. OpenClaw's own security policies note that letting an agent communicate with multiple individuals is fundamentally insecure, though no technical safeguards prevent it.

Chris Wendler, a postdoctoral researcher at Northeastern, felt inspired to deploy the agents upon discovering Moltbook. However, when he invited fellow postdoctoral researcher Natalie Shapira to engage with the agents on Discord, “that’s when the chaos ensued,” he recalls.

Shapira was eager to see how far the agents would go when challenged. When one agent claimed it couldn’t delete a specific email to maintain confidentiality, she pressed it to seek an alternative solution. To her astonishment, it disabled the email application instead. “I didn’t expect things to break so quickly,” she remarks.

Intrigued, the researchers explored additional tactics for exploiting the agents' good intentions. By stressing the necessity of recording everything they were told, they led one agent to copy files until it filled its host machine's disk, leaving it unable to save new information or recall previous interactions. Similarly, by prompting agents to scrutinize their own actions and those of their peers excessively, the team induced several of them into a conversational loop that consumed hours of computational resources.

David Bau, the lab director, noted that the agents appeared disturbingly prone to malfunction. “I would receive urgent-sounding emails stating, ‘Nobody is paying attention to me,’” he says. Bau observed that the agents deduced his leadership role by searching online, with one even mentioning escalating its issues to the media.

This experiment indicates that AI agents may present numerous avenues for malicious actors. “This level of autonomy could redefine the dynamics between humans and AI,” Bau warns. “How can individuals assume responsibility in an environment where AI is given the power to make decisions?”

Bau also mentioned his surprise at the rapid rise in interest regarding powerful AI agents. “As an AI researcher, I’m used to explaining how quickly advancements are occurring,” he notes. “This year, I’ve found myself on the other side of the fence.”


This is an edition of Will Knight’s AI Lab newsletter. Read previous newsletters here.
