Great piece from Dr. Fei-Fei Li (@drfeifei)
Rohan Paul Twitter · Rohan Paul (@rohanpaul_ai) · 2026-06-04
Dr. Fei-Fei Li argues that because the physical world is not made of words, AI needs models that master simulation, not just language, so they can project their understanding into pixels for human consumption and into action predictions for embodied agents.
Extraction
Topics: embodied-ai, world-models, llm-limitations, simulation
Claims
- LLMs learn patterns in text, which constrains their understanding to linguistic representations of the world.
- A model that masters simulation can project understanding into pixels for human consumption and into action predictions for embodied agents.
- The physical world is not made of words, pointing to a fundamental ceiling for language-only AI.
Key quotes
The world is not made of words.... A model that masters simulation can project its understanding into pixels for human consumption, and into action predictions for embodied agents.
LLMs learn patterns in text, so they can explain a [world they haven't directly perceived].