COIN #213. Beyond World Models: From Spatial Intelligence to Co-Intelligent Worlds

Fei-Fei Li’s taxonomy shows how AI may render, simulate, and plan. COIN asks how humans, AI, institutions, and ecosystems govern what becomes real.

Jun 10, 2026

Venkat Ramaswamy

In her recent essay on world models, Fei-Fei Li offers a remarkably clear taxonomy for understanding where AI may be headed next. Rather than focusing on language alone, she describes a future in which AI systems learn to perceive, simulate, and act within dynamic environments.

Her framework centers on three capabilities:

Renderers generate representations of worlds.

Simulators model how those worlds evolve.

Planners determine what actions an agent should take within those worlds.

Together, they form what she describes as an agent–world loop: An agent observes. The world changes. The agent updates its understanding. A new action follows. The cycle repeats.
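
A minimal Python sketch can make the loop concrete. It is an illustration only, not drawn from Li's essay: ToyWorld, ToyAgent, and run_loop are hypothetical names, and the one-dimensional world with a step-toward-the-goal planner is an assumption chosen for brevity.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToyWorld:
    """Hypothetical world whose entire state is one integer position."""
    position: int = 0

    def observe(self) -> int:
        return self.position

    def step(self, action: int) -> None:
        # The world changes in response to the agent's action.
        self.position += action


@dataclass
class ToyAgent:
    """Hypothetical agent holding a belief about the world and a goal."""
    goal: int
    belief: int = 0

    def update(self, observation: int) -> None:
        # The agent updates its understanding from the latest observation.
        self.belief = observation

    def plan(self) -> int:
        # Move one unit toward the goal; return 0 once the goal is reached.
        if self.belief < self.goal:
            return 1
        if self.belief > self.goal:
            return -1
        return 0


def run_loop(world: ToyWorld, agent: ToyAgent, max_steps: int = 10) -> List[Tuple[int, int]]:
    """Observe, update, plan, act, repeat. Returns (observation, action) pairs."""
    trace = []
    for _ in range(max_steps):
        observation = world.observe()
        agent.update(observation)
        action = agent.plan()
        trace.append((observation, action))
        if action == 0:
            break
        world.step(action)
    return trace


world = ToyWorld()
trace = run_loop(world, ToyAgent(goal=3))
```

Even in this toy form, the agent never acts on the world directly from its plan alone. Each action is preceded by a fresh observation, which is the point of the loop.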

At one level, this is simply a technical architecture.

At another, it represents a profound shift in how we think about intelligence itself.

The first generation of AI systems learned to operate in language. The next generation may learn to operate in worlds.

Simulation sits at the center of this transition. Rendering can generate appearances. Planning can generate actions. Simulation connects the two by allowing AI systems to reason about consequences before they occur.

That capability may ultimately become one of the defining technologies of the twenty-first century.

Yet what fascinated me most was not the taxonomy itself. It was the discussion that followed. The comments revealed something deeper. They exposed the governance gap hiding inside the world-model revolution.

The Questions That World Models Cannot Answer

Several readers immediately recognized that rendering, simulation, and planning are necessary. But not sufficient.

One commenter raised the issue of referential integrity. How does a world model know that its internal representation still corresponds to reality?

The question sounds technical. In fact, it is philosophical. A model that ceases to correspond to reality eventually becomes detached from the very world it seeks to understand.
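
At its simplest, the technical side of the question can be sketched as a comparison between what a model predicted and what was later observed. The sketch below is an assumption-laden illustration, not a method proposed by the commenter: the function names, the choice of mean absolute error, and the tolerance threshold are all hypothetical.

```python
from typing import Sequence


def mean_abs_error(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Average absolute gap between the model's predictions and observations."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


def still_corresponds(predicted: Sequence[float],
                      observed: Sequence[float],
                      tolerance: float) -> bool:
    # Hypothetical rule: the representation "corresponds" while the
    # average gap stays at or below a tolerance someone has chosen.
    return mean_abs_error(predicted, observed) <= tolerance


predicted = [1.0, 2.0, 3.0]
observed = [1.0, 2.5, 2.5]
within_loose_tolerance = still_corresponds(predicted, observed, tolerance=0.5)
within_tight_tolerance = still_corresponds(predicted, observed, tolerance=0.1)
```

The code can measure the gap. It cannot say what tolerance is acceptable or who gets to set it, which is where the question stops being purely technical.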

Another commenter focused on the feedback loop itself. Planning is not enough. Actions must return consequences. Consequences must return observations. Observations must return learning.

Without this loop, the model remains trapped inside its own imagination.

Others raised an even more important issue.

A planner may know what is possible. But who decides what is permissible?

A simulator may predict consequences. But who owns responsibility for those consequences?

A world model may recommend action. But who authorizes it?

Suddenly, the discussion shifts. The challenge is no longer computational. It becomes institutional.

Social. Ethical. Civilizational.

A world model can tell you that a door can be opened. It cannot tell you whether you should open it. Nor whether you have permission to. Nor who bears responsibility if something goes wrong after you do.
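
The door example can be sketched in Python as three separate checks. This is a hypothetical illustration: Decision, review, and the lookup tables for permissions and owners are assumed names and structures, not part of any world-model system or of the COIN framework as published.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class Decision:
    action: str
    feasible: bool                    # what the world model can answer
    permitted: bool                   # what an institution must answer
    accountable_party: Optional[str]  # who bears responsibility afterwards

    @property
    def may_proceed(self) -> bool:
        # All three conditions must hold; feasibility alone is not enough.
        return self.feasible and self.permitted and self.accountable_party is not None


def review(action: str,
           actor: str,
           feasible_actions: Set[str],
           permissions: Dict[str, Set[str]],
           owners: Dict[str, str]) -> Decision:
    """Combine a feasibility check with permission and responsibility lookups."""
    return Decision(
        action=action,
        feasible=action in feasible_actions,
        permitted=action in permissions.get(actor, set()),
        accountable_party=owners.get(action),
    )


feasible_actions = {"open_door"}                 # assumed output of a planner
permissions = {"alice": {"open_door"}}           # assumed institutional policy
owners = {"open_door": "facilities_team"}        # assumed responsibility map

alice = review("open_door", "alice", feasible_actions, permissions, owners)
bob = review("open_door", "bob", feasible_actions, permissions, owners)
```

For both actors the door can be opened. Only one of them may open it, and that difference comes entirely from data the world model does not hold.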

Those questions lie outside the world-model stack itself. And yet they increasingly determine whether intelligence creates value or creates harm.

At that moment, we move beyond world models. We enter the domain of co-intelligence.

Figure: From World Models to Co-Intelligent Worlds

The Missing Layer: Governance

What the discussion around world models reveals is that intelligence does not stop at prediction. Prediction becomes action. Action generates consequences. Consequences create value, risk, learning, trust, and legitimacy.

Those outcomes require governance. This is where the COIN perspective becomes useful.

Fei-Fei Li’s framework helps explain how AI systems may learn to understand worlds.
COIN asks a different question:
Within what architecture should those understandings become legitimate, accountable, and valuable actions?

This distinction matters enormously.

A system can possess extraordinary capabilities and still create poor outcomes.

History is filled with examples of powerful technologies deployed without sufficient governance.

The issue was rarely capability. The issue was how capability became embedded within institutions, incentives, and social systems.

The same challenge now confronts AI.

Where World Models Sit Inside COIN Systems Architecture

From a COIN perspective, world models belong primarily within the lower layers of the intelligence stack. They enhance machinic cognition. They improve reasoning. They enable simulation. They support planning.

In other words, they strengthen the capability side of intelligence.

But capability alone does not create value.

Value emerges only when capability enters relationships.

Inside COIN Systems Architecture, intelligence becomes meaningful through six interconnected layers:

Shared digitalized infrastructures provide the foundation.
Machinic cognition supplies reasoning and simulation.
Co-agency environments coordinate humans and AI systems.
Interactive engagements generate action and learning.
Life-experiences create meaning.
Ecosystems evolve toward broader impacts.

This progression is important.

World models help explain how machines learn about worlds.

COIN helps explain how societies learn with them.

The distinction is subtle. But profound.

From Outputs to TDIs

One of the recurring themes throughout the COIN series has been that intelligence is not merely produced. It is enacted.

The most important artifact emerging from a co-intelligent system is therefore not a model output.

It is a TDI. A Tokenized Dynamic Intelligence.

TDIs capture the relational traces generated when intelligence moves through a system.

They connect observations, simulations, actions, permissions, risks, feedback, learning, and outcomes.

In effect, TDIs become the memory of co-intelligent systems.

A world model may produce a recommendation. A TDI records what happened when that recommendation entered reality.

Who approved it. Who participated. What changed. What was learned. What risks emerged. What value was realized.
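
As a rough illustration, those questions can be read as the fields of a record. The Python sketch below is an assumption, not a specification of TDIs: TDIRecord, TDILedger, and every field name are hypothetical, chosen only to mirror the list above, and the sample values are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class TDIRecord:
    """Hypothetical trace of one recommendation entering reality."""
    recommendation: str
    approved_by: str
    participants: Tuple[str, ...]
    what_changed: str
    what_was_learned: str
    risks_emerged: Tuple[str, ...] = ()
    value_realized: str = ""


@dataclass
class TDILedger:
    """Hypothetical append-only memory of a co-intelligent system."""
    _records: List[TDIRecord] = field(default_factory=list)

    def append(self, record: TDIRecord) -> None:
        # Records are frozen and only ever added, never edited in place.
        self._records.append(record)

    def approved_by(self, approver: str) -> List[TDIRecord]:
        return [r for r in self._records if r.approved_by == approver]

    def __len__(self) -> int:
        return len(self._records)


ledger = TDILedger()
ledger.append(TDIRecord(
    recommendation="reroute deliveries around the flooded road",  # invented example
    approved_by="operations_lead",
    participants=("planner_model", "dispatch_team"),
    what_changed="three routes were rescheduled",
    what_was_learned="the flood forecast lagged actual conditions",
    risks_emerged=("late deliveries",),
    value_realized="no vehicles were stranded",
))
```

The record is deliberately frozen. A trace that could be rewritten after the fact would no longer serve as memory.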

This is why TDIs are so important. They transform intelligence from prediction into accountable learning. They connect models to reality. And reality remains the ultimate judge.

Reality Must Remain Sovereign

Perhaps the deepest lesson from the world-model discussion is that reality itself must remain sovereign.

Models compress reality. Simulations approximate reality. Planners imagine futures.

But reality always exceeds representation.

This is not a weakness of AI. It is simply the nature of knowledge.

Every scientific theory. Every economic forecast. Every organizational strategy. Every AI model.

All remain partial views of a larger world. That is why societies need more than increasingly capable models. They need learning-and-discovery architectures capable of continuously testing representations against reality.

This has been a recurring theme across recent COIN essays. The future belongs not merely to those who build powerful intelligence. It belongs to those who build powerful relational learning systems that treat intelligence as a recursive resource.

Beyond Human-Centered AI

For years, the dominant aspiration has been human-centered AI. The phrase remains important. But increasingly, it may be too narrow.

The future is not simply about placing humans at the center of AI systems.
It is about designing systems in which humans, AI systems, institutions, organizations, ecosystems, and environments learn together.

Consider healthcare. Or education. Or financial systems. Or scientific discovery.

No single actor sits at the center. Instead, intelligence emerges through relationships.
Humans learn. AI learns. Organizations adapt. Institutions evolve. Ecosystems reorganize.

The challenge is not only to orchestrate learning across the entire system, but to govern it.

That is the essence of co-intelligence.

The Next Frontier

Fei-Fei Li’s taxonomy may ultimately be remembered as one of the clearest explanations of how AI moves beyond language and into worlds. Renderers help machines visualize worlds. Simulators help machines reason about worlds. Planners help machines act within worlds.

COIN asks what comes next:

How do those actions become legitimate?
How do they become accountable?
How do they generate learning?
How do they create value?
How do they contribute to human flourishing rather than merely machine capability?

These are not technical questions alone. They are questions of governance, reaching from education and participation to institutions and civilization design.

World models help AI understand, imagine, simulate, and act in worlds.

COIN asks the next question.

Within what architecture should those actions become legitimate, valuable, and wise?

That may prove to be one of the defining questions of the Co-Intelligence Revolution.

The Co-Intelligence Revolution’s Substack
