Did Google’s AI agents really build an operating system for $916?
AI Snake Oil · Sayash Kapoor · 2026-05-22
AI Snake Oil researchers critique Google's claim that AI agents autonomously built an operating system for $916, identifying misleading framing around a 'single prompt' that was thousands of lines long, undefined human-intervention standards, and a lack of released artifacts for independent verification.
Appears in
Extraction
Topics: ai-agentsai-evaluationagent-capabilitiesopen-world-evaluationgoogle-gemini
Claims
- Google's 'single prompt' claim is misleading because the prompt used was thousands of lines long, and the effort required to construct it was not disclosed.
- Google's writeup does not define what constitutes human intervention, leaving key questions about restarts, approvals, and retries unanswered.
- No similarity analysis was performed to determine whether the agents copied existing open-source operating system code rather than generating novel software.
- Google has not released the prompt, the agent-generated code, or execution logs, making independent evaluation of the claims impossible.
- Open-world evaluations of long-horizon tasks represent a valuable emerging paradigm but require new methodological norms that AI vendors' own writeups currently lack.
Key quotes
The blog post says the operating system was built from a single prompt. But halfway through the post, Google discloses that the prompt 'ended up being many thousands of lines' long.
Google has not released the lengthy prompt, the code the agents wrote, or the logs from the run, which makes it impossible to independently evaluate the claims.
We argue that open-world evaluations require a new set of methodological norms. Done right, they can provide a valuable perspective that benchmark-based evaluation cannot.