The hardest problem in physical AI has never been the model, it has been the data (Save this).
Milk Road AI Twitter · Milk Road AI (@MilkRoadAI) · 2026-06-02
Milk Road AI argues that the central bottleneck for physical AI is data collection rather than model architecture, contrasting it with language models that benefited from centuries of pre-existing human-written text.
Extraction
Topics: physical-ai, training-data, language-models, robotics
Claims
- The hardest problem in physical AI has never been the model architecture but the data required to train it.
- Language models enjoyed an extraordinary and underappreciated training advantage because vast pre-existing human text was available before training began.
- The entire written output of human civilization provided an unprecedented dataset that gave language models a unique head start over physical AI systems.
Key quotes
The hardest problem in physical AI has never been the model, it has been the data (Save this).
Language models had an extraordinary training advantage that almost no one appreciated at the time.