Why Yann LeCun Says LLMs Won’t Get Us to AGI – And What Will

Introduction
While most of the world is still fascinated by ChatGPT and large language models (LLMs), one of AI’s most respected pioneers, Yann LeCun, just shook the tech world. At the NVIDIA GTC 2025 conference, he said something that stopped the crowd: he’s essentially over LLMs.
Coming from a godfather of AI, that statement isn’t just hot air – it reflects a deep shift in how we should think about artificial general intelligence (AGI). In this blog, we’ll break down LeCun’s argument, explore his proposed solution, and explain why the future of AI might not be built on next-token prediction.
Who Is Yann LeCun and Why It Matters
Yann LeCun is Meta’s Chief AI Scientist, a Turing Award winner, and one of the three “godfathers of deep learning.” When he says LLMs aren’t the future, the AI world listens. His focus has shifted from language models to building AI systems that can reason, plan, and interact with the physical world.
Why He’s “Over” LLMs
LeCun argues that while LLMs like GPT-4 are impressive, they’re reaching a plateau. Their improvements mostly come from scaling – more data, more compute, more fine-tuning.
“They’re in the hands of product people improving at the margins,” he says.
The hype around LLMs misses a crucial point: understanding the world goes beyond text prediction.
The Four Pillars of Future AI
According to LeCun, the real frontier lies in four key areas:
- Understanding the physical world
- Building persistent memory
- True reasoning
- Planning capabilities
LLMs only scratch the surface of these. What we need are world models.
What Are World Models?
A world model is how intelligent beings (including us) mentally represent the world. You know a bottle will fall if you push it from the top and slide if you push it from the side. That’s a model built from experience.
LeCun explains that babies form these models in months. But AI today? Still stuck predicting the next word in a sentence.
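To make the idea concrete, here is a toy sketch in Python. Everything in it is invented for illustration (the BottleState fields, the thresholds, the fake physics); a real world model would be learned from sensory experience, not hand-coded. What matters is the shape of the function: given a state and an action, predict the next state.

```python
# Toy "world model" interface (purely illustrative; the bottle physics and
# thresholds below are invented, not from LeCun's work).
from dataclasses import dataclass

@dataclass
class BottleState:
    upright: bool
    x: float  # position on the table

def predict(state: BottleState, push_height: float, force: float) -> BottleState:
    """Predict the outcome of pushing the bottle at a given height (0=base, 1=cap)."""
    if push_height > 0.7 and force > 0.5:
        # Pushed near the top with enough force: it tips over.
        return BottleState(upright=False, x=state.x)
    # Pushed near the base: it slides instead of falling.
    return BottleState(upright=True, x=state.x + force * 0.1)

print(predict(BottleState(upright=True, x=0.0), push_height=0.9, force=1.0))
```

A baby learns this mapping from experience; the research question is how to get a machine to learn it from raw video instead of hand-written rules.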
Why LLMs Can’t Reason About the Real World
- Tokens are discrete. The real world is continuous and high-dimensional.
- Text is limited. Reading all the text on the internet would take a human roughly 400,000 years, yet by age four a child has taken in more raw data through vision than any LLM has ingested as text (a back-of-the-envelope version of this comparison follows below).
- Pixel prediction fails. Predicting every detail of a video wastes compute and never yields the abstractions needed to reason about the real world.
LeCun believes trying to reason about the world using text tokens is like trying to understand gravity using emojis.
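The numbers behind that comparison are easy to check. The constants below are order-of-magnitude assumptions along the lines LeCun has used in talks (training-corpus size, human reading speed, optic-nerve bandwidth), so treat the outputs as rough estimates, not measurements.

```python
# Back-of-the-envelope text-vs-vision comparison. All constants are rough,
# order-of-magnitude assumptions, not precise measurements.

llm_tokens = 2e13                    # approx. tokens in a frontier LLM's training set
words = llm_tokens * 0.75            # ~0.75 English words per token
reading_minutes = words / 250        # ~250 words per minute
reading_years = reading_minutes / (60 * 8 * 365)  # reading 8 hours a day
print(f"Reading the training set: ~{reading_years:,.0f} years")  # ~340,000 years

llm_bytes = llm_tokens * 3                # ~3 bytes of text per token
awake_seconds = 4 * 365 * 12 * 3600       # a 4-year-old, ~12 waking hours/day
vision_bytes = awake_seconds * 2e6        # ~2 MB/s through the optic nerve
print(f"LLM text: ~{llm_bytes:.0e} bytes vs. child's vision: ~{vision_bytes:.0e} bytes")
```

Even with generous rounding, a four-year-old’s visual stream exceeds the byte count of the largest text corpora, which is the core of LeCun’s argument.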
Introducing V-JEPA – The Future of Reasoning
The answer, according to LeCun, lies in V-JEPA (Video Joint Embedding Predictive Architecture).
What it does:
- Learns from video, not just text
- Predicts outcomes in abstract representation space, not pixel by pixel
- Learns efficiently from few examples, like a human child
It’s not trying to reconstruct everything. It focuses on what matters. If an object in a video disappears or behaves in a physically impossible way, V-JEPA registers the violation as surprising.
In simple terms: It learns what’s possible in the real world.
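To ground the idea, here is a minimal JEPA-style training step in PyTorch. This is a simplified sketch, not Meta’s actual V-JEPA code: the real model uses vision transformers over masked spatio-temporal video patches, while the tiny MLPs, dimensions, and EMA rate below are placeholder assumptions. The essential pattern holds, though: encode the visible part of the input, encode the hidden part with a slowly updated target encoder, predict one representation from the other, and compute the loss in embedding space rather than pixel space.

```python
# Minimal JEPA-style training step (illustrative sketch; architectures and
# hyperparameters are arbitrary placeholders, not V-JEPA's).
import torch
import torch.nn as nn

EMBED_DIM = 128

def mlp(in_dim: int, out_dim: int) -> nn.Module:
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, out_dim))

context_encoder = mlp(784, EMBED_DIM)   # encodes the visible patches
target_encoder = mlp(784, EMBED_DIM)    # encodes the masked-out patches
predictor = mlp(EMBED_DIM, EMBED_DIM)   # predicts target embeddings from context

optimizer = torch.optim.AdamW(
    list(context_encoder.parameters()) + list(predictor.parameters()), lr=1e-4
)

def train_step(context_patches: torch.Tensor, target_patches: torch.Tensor) -> float:
    z_context = context_encoder(context_patches)
    with torch.no_grad():                           # target encoder gets no gradients,
        z_target = target_encoder(target_patches)  # which helps avoid collapse
    z_pred = predictor(z_context)
    # The loss lives in abstract embedding space -- no pixels are reconstructed.
    loss = nn.functional.mse_loss(z_pred, z_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Slowly drag the target encoder toward the context encoder (EMA update).
    with torch.no_grad():
        for p_t, p_c in zip(target_encoder.parameters(), context_encoder.parameters()):
            p_t.mul_(0.996).add_(p_c, alpha=0.004)
    return loss.item()

# Toy usage: random vectors standing in for flattened video patches.
print(train_step(torch.randn(32, 784), torch.randn(32, 784)))
```

Because nothing forces the model to reproduce irrelevant detail (leaf textures, sensor noise), its capacity goes toward what is predictable and meaningful – exactly the property LeCun highlights.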
System 1 vs System 2: The Missing Piece
LeCun references the famous System 1 / System 2 framework from psychology, popularized by Daniel Kahneman:
- System 1: fast, intuitive, reactive (e.g., driving on autopilot)
- System 2: slow, deliberate, reasoning (e.g., planning a chess move)
Current AI excels at System 1. But AGI needs System 2. And that, he argues, requires new architectures that go beyond transformers and next-token prediction.
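What would System 2 look like computationally? One concrete framing, consistent with LeCun’s long-standing advocacy of planning with world models (the toy functions below are invented for illustration), is model-predictive control: roll candidate action sequences through the world model “in imagination,” score the predicted outcomes, and act on the best plan.

```python
# Toy "System 2" deliberation: exhaustive search over short action sequences
# using a stand-in world model. Everything here is illustrative.
import itertools

def world_model(state: float, action: float) -> float:
    # Stand-in for a learned predictor: next_state = f(state, action).
    return state + action

def cost(state: float, goal: float) -> float:
    return abs(state - goal)

def plan(state: float, goal: float, horizon: int = 3) -> list[float]:
    """Evaluate every short action sequence in imagination and return the one
    whose predicted end state lands closest to the goal."""
    actions = [-1.0, 0.0, 1.0]
    best_seq, best_cost = [], float("inf")
    for seq in itertools.product(actions, repeat=horizon):
        s = state
        for a in seq:
            s = world_model(s, a)  # simulate, don't act
        c = cost(s, goal)
        if c < best_cost:
            best_seq, best_cost = list(seq), c
    return best_seq

print(plan(state=0.0, goal=2.0))  # -> [0.0, 1.0, 1.0]
```

Real systems replace the exhaustive loop with gradient-based or sampled trajectory optimization, but the division of labor is the point: the world model supplies fast System 1 predictions, while the search on top of it is the slow, deliberate System 2.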
What This Means for the Future of AGI
- LLMs are powerful but limited.
- Text-only training won’t get us to AGI.
- Future AI will be multi-modal, learning from vision, memory, physics, and action.
- Models like V-JEPA may form the foundation of truly intelligent systems.
As companies like Meta, Google, and OpenAI race toward AGI, the direction may soon shift from “chatbots” to world-aware, reasoning agents that operate far beyond text boxes.
Call to Action
Interested in AI beyond LLMs? Start diving into world models, cognitive architectures, and research like V-JEPA. This is where the real breakthroughs will happen.
📩 Subscribe to our newsletter to stay updated on the future of AGI and emerging architectures.
FAQs
1. Who is Yann LeCun?
He’s Meta’s Chief AI Scientist and one of the pioneers of deep learning.
2. Why does he say LLMs aren’t enough?
Because LLMs rely on text prediction and don’t understand the physical world or how to plan and reason in it.
3. What is V-JEPA?
A non-generative AI model that learns by predicting video representations in abstract space – a promising approach for building world-aware systems.
4. What’s the difference between System 1 and System 2 AI?
System 1 is reactive and fast; System 2 involves deep reasoning and planning. LLMs are closer to System 1.
5. Is AGI possible without LLMs?
Yes. LeCun believes AGI will require hybrid systems with world models, planning abilities, and abstract reasoning.