AI’s Hacking Capabilities Are Reaching a Critical Turning Point

Vlad Ionescu and Ariel Herbert-Voss, the founders of the cybersecurity startup RunSybil, were briefly puzzled when their AI tool, Sybil, flagged a flaw in a client’s systems last November.

Sybil combines several AI models with proprietary techniques to probe computer systems for vulnerabilities that hackers could exploit, such as an unpatched server or a misconfigured database.

In this instance, Sybil detected an issue with the client’s implementation of federated GraphQL, a query language that defines how data is accessed through application programming interfaces (APIs) over the internet. The flaw meant the client was unintentionally exposing sensitive information.
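The article doesn’t describe the exact flaw, and the sketch below is not RunSybil’s method. But one well-known GraphQL exposure class is leaving introspection enabled on a production endpoint, which lets anyone enumerate the full schema. Here is a minimal Python probe for that pattern; the endpoint URL and the keyword list are invented for illustration:

```python
# Hypothetical probe for one common GraphQL exposure class: introspection
# left enabled in production, letting anyone enumerate the schema,
# including fields that were never meant to be public. Not RunSybil's
# method; the endpoint and keyword list are invented for illustration.
import requests

INTROSPECTION_QUERY = """
{
  __schema {
    types {
      name
      fields { name }
    }
  }
}
"""

def probe(endpoint: str) -> None:
    resp = requests.post(endpoint, json={"query": INTROSPECTION_QUERY}, timeout=10)
    data = resp.json().get("data") or {}
    schema = data.get("__schema")
    if schema is None:
        print("Introspection appears disabled (good).")
        return
    for gql_type in schema["types"]:
        for field in gql_type.get("fields") or []:
            # Flag field names that hint at sensitive data.
            if any(k in field["name"].lower() for k in ("ssn", "password", "token", "secret")):
                print(f"Possible sensitive field: {gql_type['name']}.{field['name']}")

if __name__ == "__main__":
    probe("https://api.example.com/graphql")  # placeholder endpoint
```

A tool like Sybil presumably goes far beyond a single-endpoint check like this, chaining findings across services; the point is only to show what a machine-checkable exposure looks like.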

What baffled Ionescu and Herbert-Voss was that identifying the problem required a deep understanding of multiple systems and how they interact. RunSybil says it has found the same issue in other GraphQL implementations, before it was publicly known. “We scoured the internet, and it didn’t exist,” says Herbert-Voss. “Uncovering it was a reasoning leap in terms of models’ capabilities—a significant shift.”

This scenario highlights an escalating risk. As AI models advance, so does their ability to uncover zero-day vulnerabilities (flaws unknown to a system’s defenders) and other weaknesses. The same intelligence that can identify those weaknesses can also be used to exploit them.

Dawn Song, a computer scientist at UC Berkeley specializing in AI and security, notes that recent advances have produced models that excel at detecting flaws. Techniques like simulated reasoning, which breaks a problem into smaller steps, and agentic AI, which lets a model search the web or execute software tools, have sharpened models’ cybersecurity capabilities.

“The cybersecurity skills of advanced models have surged dramatically in the past few months,” she states. “This marks a pivotal moment.”
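To make the agentic pattern Song describes concrete, here is a minimal Python sketch of a tool-use loop, in which a model alternates between requesting tools and giving a final answer. Everything here is a placeholder: fake_model stands in for a real LLM API call, and the single search tool returns stub text rather than real results.

```python
# Minimal sketch of an agentic tool-use loop. Assumes a model that
# replies either with a tool request ("TOOL <name> <args>") or a final
# answer ("DONE <text>"). fake_model is a stand-in for an LLM API call;
# the tool set is illustrative, not any vendor's actual interface.
from typing import Callable

def fake_model(transcript: str) -> str:
    # Stand-in for an LLM: asks for one search, then finishes.
    if "search_result:" not in transcript:
        return "TOOL search graphql federation misconfiguration"
    return "DONE Possible exposure via federated entity resolution."

def search(query: str) -> str:
    return f"search_result: stub results for '{query}'"

TOOLS: dict[str, Callable[[str], str]] = {"search": search}

def run_agent(task: str, max_steps: int = 5) -> str:
    transcript = f"task: {task}\n"
    for _ in range(max_steps):
        action = fake_model(transcript)
        if action.startswith("DONE"):
            return action[len("DONE "):]
        _, name, args = action.split(" ", 2)
        # Execute the requested tool and feed the result back to the model.
        transcript += TOOLS[name](args) + "\n"
    return "step limit reached"

print(run_agent("audit the staging API for data exposure"))
```

Real agents add many more tools (code execution, HTTP clients) and far better models, but the loop itself, observe, act, feed results back, is this simple.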

Last year, Song co-created a benchmark called CyberGym to evaluate how effectively large language models identify vulnerabilities in extensive open-source software projects. CyberGym encompasses 1,507 known vulnerabilities across 188 projects.

In July 2025, Anthropic’s Claude Sonnet 4 uncovered roughly 20 percent of the vulnerabilities in the benchmark, or about 300 of the 1,507. By October 2025, a newer model, Claude Sonnet 4.5, could identify 30 percent, about 450. “AI agents are capable of discovering zero-days, and they do so at a very low cost,” says Song.

Song says the trend underscores the need for new defenses, including using AI to assist cybersecurity professionals. “We must consider how to make AI a more active participant in defense, exploring various strategies,” she states.

One potential measure is for leading AI firms to share their models with security researchers before launch, so the researchers can use them to find bugs and harden systems ahead of a wider release.

Another suggestion from Song is to rethink the software development process from the ground up. Her lab has demonstrated that AI can generate more secure code than much of what developers write today. “In the long term, we believe this secure-by-design approach will significantly benefit defenders,” Song asserts.
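As a concrete, if textbook, illustration of what secure-by-design code generation aims for (not actual output from Song’s lab), compare a SQL query assembled by string interpolation with a parameterized one:

```python
# Illustration of a secure-by-design pattern (not code from Song's lab):
# building SQL by string interpolation invites injection, while a
# parameterized query lets the database driver handle escaping.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

user_input = "alice' OR '1'='1"  # a classic injection payload

# Vulnerable: the payload rewrites the query's logic and matches every row.
vulnerable = conn.execute(
    f"SELECT role FROM users WHERE name = '{user_input}'"
).fetchall()

# Safe: the payload is treated as a literal string and matches nothing.
safe = conn.execute(
    "SELECT role FROM users WHERE name = ?", (user_input,)
).fetchall()

print("interpolated query returned:", vulnerable)  # [('admin',)]
print("parameterized query returned:", safe)       # []
```

A code generator trained to always emit the second form removes this whole vulnerability class by construction, rather than relying on after-the-fact auditing.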

The RunSybil team warns that, in the immediate future, the coding capabilities of AI models might give hackers a competitive edge. “AI can execute actions on a computer and generate code, which are precisely what hackers do,” Herbert-Voss explains. “If these capabilities accelerate, it means offensive security actions will also speed up.”


This is an edition of Will Knight’s AI Lab newsletter. Read previous newsletters here.
