AI Models Begin Self-Directed Learning Through Inquiry

Even the most advanced artificial intelligence systems are fundamentally imitators. They learn either by ingesting examples created by humans or by solving problems posed by human teachers.
AI could, however, learn in a more human-like way: by exploring intriguing questions on its own and working out the correct answers. A collaboration among Tsinghua University, the Beijing Institute for General Artificial Intelligence (BIGAI), and Pennsylvania State University demonstrates that AI can learn to reason in this manner by experimenting with computer code.
The researchers created a system called Absolute Zero Reasoner (AZR), which first uses a large language model to generate complex yet solvable Python coding challenges. The same model then attempts to solve those challenges, checking its solutions by executing the code. Finally, AZR uses its successes and failures as feedback to update the original model, sharpening its ability to both pose and solve problems.
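A minimal sketch of that propose-solve-verify loop may help. All the names below (`propose_task`, `solve_task`, `verify`) are hypothetical stubs, not the paper's code: in the real AZR a single LLM plays both the proposer and solver roles and the reward drives a reinforcement-learning update, whereas this toy substitutes randomized stand-ins to show where code execution supplies the ground truth.

```python
import random

def propose_task():
    """Proposer role: emit a Python program plus an input, forming a task.

    AZR checks proposals by running them, so a crashing or unsolvable
    proposal yields no usable learning signal.
    """
    a, b = random.randint(1, 9), random.randint(1, 9)
    program = "def f(x, y):\n    return x * y + x"
    return program, (a, b)

def solve_task(program, args):
    """Solver role: predict the program's output for the given input.

    A real solver reasons in text; this stub guesses wrong 30% of the
    time so that verification produces a mixed reward signal.
    """
    a, b = args
    return a * b + a if random.random() < 0.7 else a + b

def verify(program, args, predicted):
    """Ground truth: actually execute the proposed code and compare."""
    scope = {}
    exec(program, scope)          # defines f in scope
    return scope["f"](*args) == predicted

# Self-play loop: each verified outcome becomes a reward.
reward_history = []
for step in range(100):
    program, args = propose_task()
    predicted = solve_task(program, args)
    reward = 1 if verify(program, args, predicted) else 0
    reward_history.append(reward)
    # In AZR, this reward would update the model's weights via RL.

print(f"solver accuracy: {sum(reward_history) / len(reward_history):.2f}")
```

The key design point is that the environment, not a human, grades every attempt: running the code gives an exact, free verdict on each answer.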
The team found that this approach significantly boosted the coding and reasoning abilities of both the 7-billion and 14-billion-parameter variants of the open-source language model Qwen. Notably, the resulting models even surpassed some trained on human-curated datasets.
Over Zoom, I spoke with Andrew Zhao, a PhD student at Tsinghua University who originated the idea for Absolute Zero, and Zilong Zheng, a researcher at BIGAI who collaborated on the project.
Zhao explained that this approach mirrors the way human learning extends beyond simple memorization or mimicry. “Initially, you copy your parents and emulate your teachers, but then you must begin to pose your own questions,” he stated. “Eventually, you can outshine those who guided you in school.”
Zhao and Zheng pointed out that the concept of AI learning through this method, often referred to as “self-play,” has been explored for years and was previously investigated by notable figures like Jürgen Schmidhuber, an influential AI pioneer, and Pierre-Yves Oudeyer, a computer scientist at Inria in France.
One of the most thrilling aspects of the project, as noted by Zheng, is the scalability of the model’s problem-posing and problem-solving abilities. “As the model becomes more powerful, the difficulty level increases,” he remarked.
A significant limitation is that, for now, the approach only works on problems whose answers can be checked automatically, such as those in mathematics or coding. As the project advances, it may become feasible to extend it to agentic tasks like web browsing or handling office work, but that would require some way of judging whether an agent's actions were correct.
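A hypothetical illustration of what "easily verified" means in practice: for a coding task, executing the code gives an exact oracle, whereas an open-ended agentic task admits no comparable one-line check. The function name `verify_code_answer` and the example task below are illustrative, not from the paper.

```python
def verify_code_answer(program: str, test_input: tuple, claimed_output):
    """Coding tasks have an exact oracle: run the program and compare."""
    scope: dict = {}
    exec(program, scope)              # defines solve in scope
    return scope["solve"](*test_input) == claimed_output

# A coding task is checkable in one call:
ok = verify_code_answer("def solve(x):\n    return x ** 2", (7,), 49)
print(ok)  # True

# By contrast, a task like "find the cheapest flight" has no such
# oracle; grading it requires a judge of what a correct action
# sequence even looks like, which is the open problem noted above.
```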
A compelling potential outcome of a framework like Absolute Zero is that it could, in theory, enable models to transcend human instruction. “Once we achieve that, it’s a pathway to attaining superintelligence,” Zheng shared with me.
There are initial indications that the Absolute Zero methodology is gaining traction at several prominent AI laboratories.
One initiative called Agent0, involving Salesforce, Stanford, and the University of North Carolina at Chapel Hill, features a software-tool-using agent that enhances itself through self-play. Similar to Absolute Zero, this model improves its general reasoning capabilities through experimental problem-solving. A recent research paper authored by scholars from Meta, the University of Illinois, and Carnegie Mellon University introduces a system utilizing a comparable kind of self-play for software engineering. The authors suggest this represents “a first step toward training paradigms for superintelligent software agents.”
Exploring innovative methods for AI learning will likely be a prominent theme in the tech sector this year. With traditional data sources becoming increasingly scarce and costly, and as labs seek new strategies to enhance model capabilities, a project like Absolute Zero could pave the way for AI systems that resemble humans more than mere imitators.
