Why Bigger LLMs Aren’t Always Better

By Satish Gupta

It is often assumed that as language models grow larger, better models naturally follow. In practice, however, LLMs are prone to hallucinations. Hallucination rates have dropped significantly over time, but hallucinations will never disappear entirely because of how transformer architectures are designed: a transformer works on the fundamental principle of predicting the next token given a context. This is why LLMs are still unreliable for safety-critical tasks like robotics, self-driving cars, or flight automation. Nobody would trust an LLM in a plane’s autopilot; it could fail drastically.
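To make that failure mode concrete, here is a minimal sketch of next-token prediction, the single operation a transformer repeats to generate text. The “model” below is just a hand-written score table with invented numbers, not a real network; it only illustrates the interface: context in, probability distribution over the next token out.

```python
import math

# Hypothetical scores for what might follow the context below; in a real
# transformer these logits come from the network, not a lookup table.
logits = {"falling": 2.1, "red": 1.3, "breakfast": 0.8, "shiny": 0.5}

def softmax(scores):
    """Turn raw scores into a probability distribution over tokens."""
    exps = {tok: math.exp(s) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

context = "the apple at the corner of the table is"
probs = softmax(logits)
next_token = max(probs, key=probs.get)  # greedy decoding: pick the likeliest token
print(f"{context} {next_token}  (p = {probs[next_token]:.2f})")

# The model commits to whichever continuation scored highest, whether or
# not it matches the physical situation -- the root of hallucination.
```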

Meta’s Approach: Introduction of JEPA

Meta, a strong advocate of open-source large language models, has openly opposed relying solely on plain LLMs. Given how transformers work, Meta champions the Joint Embedding Predictive Architecture (JEPA) instead. JEPA predicts outcomes from partial information, a concept fundamentally different from predicting the next token, and one that aligns more closely with how humans learn. For example, if a human sees an apple at the corner of a table, they predict from its position that it may fall and act to stop it.
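A minimal sketch of that difference, assuming nothing about Meta’s actual implementation: in JEPA, both the partial observation and the target are mapped into an embedding space, and the model predicts the target’s embedding rather than raw pixels or tokens. Random linear maps stand in for the learned encoders here, purely for illustration (in Meta’s I-JEPA the target encoder is a moving average of the context encoder; here both are just random matrices).

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for learned networks (random weights, illustration only):
context_encoder = rng.normal(size=(16, 32))  # embeds the partial view x
target_encoder = rng.normal(size=(16, 32))   # embeds the full target y
predictor = rng.normal(size=(16, 16))        # predicts s_y from s_x

x = rng.normal(size=32)  # partial observation, e.g. the visible patches
y = rng.normal(size=32)  # the target, e.g. the masked-out patches

s_x = context_encoder @ x   # embedding of what the model can see
s_y = target_encoder @ y    # embedding of what it must predict
s_y_hat = predictor @ s_x   # prediction made in embedding space

# Training minimizes this distance in representation space, so the model
# learns what the scene means, not every surface detail of it.
loss = np.mean((s_y_hat - s_y) ** 2)
print(f"embedding-space prediction loss: {loss:.3f}")
```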

LLMs vs JEPA

Now take the same example with a large language model: if the LLM has seen a similar pattern before, it may predict something irrelevant, like “the apple on the table is ready for breakfast,” which is pure hallucination. JEPA, on the other hand, captures the meaning of the current state and predicts the situation accurately: the apple is at the corner, may fall, and should be placed in the apple tray.

JEPA + LLM: The Future of Autonomous Systems

Over time, JEPA is likely to work in conjunction with LLMs in real-world autonomous agentic actions. JEPA comes in multiple flavors:

  • A-JEPA: predicts outcomes from audio input

  • V-JEPA: predicts outcomes from video input

  • T-JEPA: predicts outcomes from text input

Like an LLM, JEPA is a mathematical model that predicts outcomes; combined with an LLM, it allows actions to be taken while minimizing the risk of hallucination.
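One plausible shape for that combination, sketched below with invented names (llm_propose and jepa_predict_risk are hypothetical stand-ins, not any real API): the LLM proposes candidate actions in language, the JEPA-style world model predicts the consequence of each, and only the lowest-risk action is executed.

```python
def llm_propose(observation: str) -> list[str]:
    """Stand-in for an LLM generating candidate actions as text."""
    return ["do nothing", "move the apple to the tray", "describe the apple"]

def jepa_predict_risk(observation: str, action: str) -> float:
    """Stand-in for a JEPA-style world model: score how likely each
    action is to lead to a bad predicted state (0 = safe, 1 = failure)."""
    predicted_risk = {
        "do nothing": 0.9,                  # predicted state: apple falls
        "move the apple to the tray": 0.1,  # predicted state: stable
        "describe the apple": 0.9,          # talking does not stop the fall
    }
    return predicted_risk[action]

observation = "an apple is at the corner of the table"
candidates = llm_propose(observation)
safest = min(candidates, key=lambda a: jepa_predict_risk(observation, a))
print(f"chosen action: {safest}")  # -> move the apple to the tray
```

The LLM supplies fluent proposals; the world model vetoes the ones whose predicted outcomes are bad, which is where the hallucination risk gets contained.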

Real-World Examples

  • Robotics: A robot serving at a table sees an apple at the corner. A plain LLM might take no action, assuming everything is fine; with JEPA, the robot predicts that the apple might fall and can place it safely back on the table.

  • Autonomous Cars: A plain LLM might be clueless when it encounters a previously unseen pattern. JEPA, however, can model physics such as gravity and friction and anticipate how nearby humans will act next, allowing the driving system to respond correctly instead of hallucinating.
