This AI Agent Is Built to Remain Under Control

AI agents such as OpenClaw have gained significant traction recently because they can manage many aspects of your online life. Whether you want a custom morning news summary, an intermediary to deal with your cable company’s customer service, or an organizer to track your tasks and remind you of what remains, these assistant agents are designed to connect to your digital accounts and carry out your instructions. That capability is useful, but it has also caused considerable chaos: there are instances of bots mistakenly deleting emails their owners meant to keep, writing negative screeds over perceived slights, and even launching phishing schemes against their own users.

In light of the recent turmoil, veteran security expert and researcher Niels Provos opted for a new approach. He is now unveiling an open-source, secure AI assistant known as IronCurtain, which aims to introduce a vital layer of oversight. Rather than allowing the agent to directly access the user’s systems and accounts, IronCurtain operates within an isolated virtual machine. Its capacity to act is governed by a policy—or a sort of constitution—created by the user to oversee the system. Importantly, IronCurtain can interpret these overarching rules in plain English, processing them through a multistep framework that employs a large language model (LLM) to transform natural language into an enforceable security policy.

“While services like OpenClaw are experiencing peak excitement right now, my hope is that we can take a moment to reconsider and potentially rethink our approach,” Provos says. “Let’s create something that still provides substantial utility but avoids venturing into unpredictable and sometimes harmful territories.”

Provos emphasizes that IronCurtain’s ability to translate intuitive, plain-language statements into enforceable, deterministic (that is, predictable) boundaries is essential. This matters because LLMs are inherently “stochastic” and probabilistic; they do not always produce the same output or respond consistently to identical prompts. That variability poses challenges for AI guardrails: as models change over time, their interpretation of control mechanisms can shift, potentially leading to erratic behavior.

An IronCurtain policy might be as straightforward as: “The agent can read all my email. It may send emails to contacts without prior approval. For anyone else, consult me first. Never delete anything permanently.”

IronCurtain processes these directives, turns them into a binding policy, and acts as a mediator between the assistant agent inside the virtual machine and the Model Context Protocol (MCP) server that gives LLMs access to the data and digital services they need to carry out tasks. Constraining an agent this way adds a crucial layer of access control that existing web platforms, such as email providers, do not currently support, because they were never designed for scenarios in which a human user and AI agent bots share the same account.
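As an illustration only, and not IronCurtain’s actual implementation, a plain-English policy like the one above might compile into deterministic checks that sit between the agent and its tools. The tool names, contact list, and rule structure below are all hypothetical:

```python
# Hypothetical sketch of a deterministic policy gate; not IronCurtain's real code.
from dataclasses import dataclass

# Assumed, illustrative contact list; in practice this would come from the user's account.
APPROVED_CONTACTS = {"alice@example.com", "bob@example.com"}

@dataclass
class Decision:
    action: str   # "allow", "ask_user", or "deny"
    reason: str   # which rule fired

def evaluate(tool: str, args: dict) -> Decision:
    """Apply the compiled policy to a proposed tool call, deterministically."""
    if tool == "email.read":
        return Decision("allow", "policy permits reading all mail")
    if tool == "email.send":
        if args.get("to") in APPROVED_CONTACTS:
            return Decision("allow", "recipient is a known contact")
        return Decision("ask_user", "recipient not in contacts; policy requires approval")
    if tool == "email.delete" and args.get("permanent"):
        return Decision("deny", "policy forbids permanent deletion")
    # Default: defer to the human rather than guessing.
    return Decision("ask_user", "no matching rule; defer to the user")
```

The point of such a gate is the property Provos highlights: identical inputs always yield identical decisions, regardless of how the underlying LLM behaves.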

Provos notes that IronCurtain is intended to continuously refine each user’s “constitution” by learning from edge cases and requesting human guidance on how to proceed. The system is model-independent, compatible with any LLM, and is structured to maintain an audit log of all policy decisions made over time.
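An append-only audit log of policy decisions could be as simple as timestamped JSON records; this sketch is an assumption for illustration, not the project’s actual schema:

```python
# Hypothetical audit-log writer; field names are illustrative assumptions.
import json
import time

def log_decision(log: list, tool: str, action: str, reason: str) -> None:
    """Append one policy-decision record as a JSON line."""
    log.append(json.dumps({
        "ts": time.time(),    # when the decision was made
        "tool": tool,         # tool call the agent attempted
        "decision": action,   # allow / ask_user / deny
        "reason": reason,     # rule that fired
    }))

audit_log: list[str] = []
log_decision(audit_log, "email.send", "ask_user", "recipient not in contacts")
```

A log like this also supplies the raw material for the refinement loop: every "ask_user" entry is an edge case the user can resolve once and fold back into the policy.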

IronCurtain is still a research prototype rather than a consumer product, and Provos invites contributions to help the project grow and evolve. Well-known cybersecurity expert Dino Dai Zovi, who has tested early versions of IronCurtain, says the project’s approach matches his own thinking about how agentic AI should be constrained.
