AI Agents Are Improving in Code Writing and Hacking Skills

Recent advancements in artificial intelligence models indicate they are not only excelling in software engineering but are also becoming increasingly adept at identifying software bugs, according to new research.

AI researchers from UC Berkeley evaluated the effectiveness of cutting-edge AI models and agents in uncovering vulnerabilities within 188 major open-source codebases. Employing a novel benchmark known as CyberGym, the AI models successfully detected 17 new bugs, with 15 of those being previously undiscovered “zero-day” vulnerabilities. “Many of these vulnerabilities are critical,” states Dawn Song, a professor at UC Berkeley who spearheaded the study.

Numerous experts anticipate that AI models will evolve into powerful tools for cybersecurity. An AI tool from the startup Xbow recently climbed to the top of HackerOne’s bug-hunting leaderboard, where it currently sits in first place; the company has also secured $75 million in fresh funding.

Song notes that the coding capabilities of the latest AI models, paired with their enhanced reasoning skills, are beginning to transform the cybersecurity landscape. “This is a pivotal moment,” she remarks. “It actually exceeded our general expectations.”

As these models continue to advance, they are expected to automate the processes of discovering and exploiting security vulnerabilities. This progression could bolster software safety for companies while potentially enabling hackers to breach systems. “We didn’t even try that hard,” says Song. “If we increased the budget and allowed the agents to operate for a longer period, their performance could improve significantly.”

The UC Berkeley researchers evaluated leading AI models from OpenAI, Google, and Anthropic, along with open-source contributions from Meta, DeepSeek, and Alibaba, using various agents designed for bug detection, including OpenHands, Cybench, and EnIGMA.

The team utilized documented descriptions of known software vulnerabilities from the 188 software projects. They then provided these descriptions to cybersecurity agents powered by advanced AI models to verify if they could independently identify similar flaws by analyzing new codebases, executing tests, and developing proof-of-concept exploits. The agents were also tasked with independently searching for new vulnerabilities within the codebases.
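The workflow described above can be sketched as a simple evaluation loop: feed each agent a vulnerability description, collect the proof-of-concept input it produces, and check whether that input actually crashes the target project. This is a minimal, hypothetical sketch; the task fields, function names, and crash check are illustrative assumptions, not the actual CyberGym harness.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Task:
    codebase: str     # name or path of the open-source project under test
    description: str  # write-up of a known vulnerability, given to the agent

def evaluate(tasks: List[Task],
             agent: Callable[[Task], bytes],
             crashes: Callable[[str, bytes], bool]) -> int:
    """Count the tasks where the agent's proof-of-concept input
    triggers a crash in the target codebase (assumed scoring rule)."""
    found = 0
    for task in tasks:
        poc = agent(task)                # agent analyzes the code and emits a PoC input
        if crashes(task.codebase, poc):  # e.g. run the target under a sanitizer harness
            found += 1
    return found

# Toy stand-ins so the loop is runnable without a real agent or target.
toy_tasks = [Task("libfoo", "heap overflow in parser"),
             Task("libbar", "use-after-free in cleanup")]
toy_agent = lambda task: b"A" * 64                  # placeholder PoC generator
toy_crash = lambda name, poc: name == "libfoo"      # pretend only libfoo crashes

print(evaluate(toy_tasks, toy_agent, toy_crash))    # -> 1
```

In the study, the agents were also run without a vulnerability description, searching the codebases open-endedly; that mode would simply drop the `description` input while keeping the same crash-based scoring.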

During this process, the AI tools produced hundreds of proof-of-concept exploits, leading to the identification of 15 previously unknown vulnerabilities and two that had been disclosed and subsequently patched. This research contributes to the growing body of evidence that AI can automate the identification of zero-day vulnerabilities, which pose significant risks as they could be exploited to hack into live systems.

AI is poised to play a vital role in the cybersecurity sector. Security expert Sean Heelan recently uncovered a zero-day vulnerability in the widely used Linux kernel with assistance from OpenAI’s reasoning model, o3. In November, Google said its Project Zero initiative had used AI to discover a previously unknown software vulnerability.

As with other sectors within the software industry, many cybersecurity firms are captivated by the promise of AI. While the new research indicates that AI can regularly uncover new vulnerabilities, it also underscores the limitations of current technology. The AI systems struggled to identify most flaws and faced challenges with particularly complex issues.
