Nvidia's Cosmos 3: 1 model that can understand, simulate, and act across many physical AI tasks.
Twitter · Rohan Paul (@rohanpaul_ai) · 2026-06-13
Nvidia's Cosmos 3 is a single unified model designed to understand, simulate, and act across physical AI tasks by treating action as a first-class modality rather than deriving it from passive perception of images or video.
Extraction
Topics: world-models, physical-ai, nvidia, embodied-ai
Claims
- Nvidia's Cosmos 3 is a single model capable of understanding, simulating, and acting across physical AI tasks.
- The model treats action as a 'first-class language of the world,' not a secondary output derived from perception.
- Most existing AI models engage with reality passively, converting images and videos into descriptions rather than actions.
- The post frames Cosmos 3 as a paradigm shift in how AI systems relate to physical reality: from passively describing it to acting within it.
Key quotes
1 model that can understand, simulate, and act across many physical AI tasks.
It treats action as a first-class language of the world.
Most AI models look at reality from the outside: images become captions, videos become descriptions, and motion becomes...