The Mathematics Behind AI Agents Doesn’t Compute

The major AI corporations have assured us that 2025 would herald “the year of AI agents.” However, it has instead become the year of discussing AI agents, postponing that transformative moment to 2026 or perhaps beyond. But what if the answer to the question “When will our lives be completely automated by generative AI robots that undertake our tasks and essentially govern the world?” is, much like that New Yorker cartoon, “How about never?”

This was essentially the takeaway from a paper published quietly some months ago, right in the midst of the overhyped year of “agentic AI.” Titled “Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models,” it claims to mathematically demonstrate that “LLMs are incapable of executing computational and agentic tasks beyond a certain level of complexity.” While the science is complex, the authors—a former SAP CTO who studied AI under one of its founding figures, John McCarthy, along with his teenage prodigy son—dispelled the vision of an agentic utopia with clear mathematical reasoning. They assert that even reasoning models that surpass the basic word-prediction capabilities of LLMs will not resolve the issue.

“There is no way they can be dependable,” Vishal Sikka, the father, tells me. Sikka, whose career includes stints as SAP’s CEO and later CEO of Infosys, now runs an AI services startup called Vianai. “So should we abandon the idea of AI agents managing nuclear power plants?” I ask. “Precisely,” he responds. You might get an agent to file some documents and save yourself time, but be prepared for errors.

The AI sector disagrees. One notable success for AI agents has been coding, which gained momentum last year. Just this week at Davos, Google’s Nobel-winning head of AI, Demis Hassabis, announced progress in reducing hallucinations, while startups and tech giants alike continue to push the agent narrative. And the optimists now have fresh support: a startup named Harmonic claims a breakthrough in AI coding that also rests on mathematics, and it is excelling in reliability benchmarks.

Harmonic, co-founded by Robinhood CEO Vlad Tenev and Stanford-educated mathematician Tudor Achim, asserts that the latest enhancement to its product, Aristotle (no shortage of ambition in that name), points toward ways of guaranteeing the reliability of AI systems. “Are we condemned to a reality where AI merely produces errors and humans can’t effectively verify it? That would be an absurd existence,” remarks Achim. Harmonic’s answer is to apply formal methods of mathematical reasoning to validate an LLM’s output, encoding results in the Lean programming language, which is built so that every claim must be machine-checked. It’s important to note, though, that Harmonic’s focus has been narrow so far: its primary goal is the quest for “mathematical superintelligence,” with coding as a natural extension. Areas like history essays, which cannot be mathematically validated, lie outside its current scope. For now.
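To make the formal-verification idea concrete, here is a toy Lean 4 sketch (a hypothetical illustration of the principle, not Harmonic’s actual code): a mathematical claim only survives compilation if it comes with a proof that Lean’s kernel checks, so a fabricated claim is rejected automatically.

```lean
-- Toy illustration of Lean-style verification (hypothetical example).
-- The theorem states that addition of natural numbers is commutative;
-- Lean's kernel checks the proof term before accepting the file.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A false claim, by contrast, cannot be proved: a statement such as
-- `theorem bogus (a : Nat) : a + 1 = a := ...` would fail to compile,
-- which is exactly how a verifier catches a "hallucinated" result.
```

This is the sense in which Lean-encoded output can be trusted: correctness is enforced by the proof checker rather than by human review of the model’s prose.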

Nevertheless, Achim appears to believe that achieving reliable agentic behavior isn’t as problematic as some skeptics think. “I would argue that most models at this stage possess the necessary level of intelligence to effectively reason through planning a travel itinerary,” he asserts.

Both perspectives hold merit—or perhaps they even converge. On one hand, there’s consensus that hallucinations will persist as a troublesome reality. In a paper released last September, OpenAI researchers stated, “Despite significant advancements, hallucinations remain a persistent challenge in the field and continue to affect even the latest models.” They validated this claim by prompting three models, including ChatGPT, to identify the lead author’s dissertation title. All three fabricated titles and incorrectly reported the publication year. In a blog post discussing the findings, OpenAI solemnly noted that in AI models, “accuracy will never achieve 100 percent.”
