The Information Machine

The hardest problem in physical AI has never been the model, it has been the data (Save this).

Milk Road AI Twitter · Milk Road AI (@MilkRoadAI) · 2026-06-02

Milk Road AI argues that the central bottleneck for physical AI is data collection rather than model architecture, contrasting it with language models that benefited from centuries of pre-existing human-written text.

Extraction

Topics: physical-ai, training-data, language-models, robotics

Claims

  • The hardest problem in physical AI has never been the model architecture but the data required to train it.
  • Language models enjoyed an extraordinary and underappreciated training advantage because vast pre-existing human text was available before training began.
  • The entire written output of human civilization provided an unprecedented dataset that gave language models a unique head start over physical AI systems.

Key quotes

The hardest problem in physical AI has never been the model, it has been the data (Save this).
Language models had an extraordinary training advantage that almost no one appreciated at the time.