The Information Machine

AI Autonomy Without Human Oversight Concerns · history

Version 3

2026-05-23 05:36 UTC · 59 items

What

Simon Willison's two May 2026 essays have become a focal point for a broader industry conversation. One covers an AI-managed Stockholm café whose autonomous actions harmed unconsenting third parties [1]; the other covers 'normalization of deviance' eroding code-review norms in professional software development [2]. Andrej Karpathy has independently marked the same transition: he is cited as declaring that the 'vibe coding era is ending' in favor of 'agentic engineering' [4], which adds a second major voice to the boundary Willison drew. Legal scholars and legislators are beginning to engage the liability gap: the AI LEAD Act (S.2937) is before the US Senate [5], and academic legal literature frames AI agents as 'risky agents without intentions' that current tort frameworks cannot adequately address [6][7]. Willison's concern about normalization of deviance dates at least to December 2025 [3], suggesting his May critiques were the culmination of a developing argument rather than an ad hoc reaction.

Why it matters

The convergence of a practitioner critique (Willison), an influential researcher's framing (Karpathy), and emerging legislative attention (the AI LEAD Act [5] and state-level AI legislation [8]) suggests that the question of who is responsible when an AI agent harms a third party is moving from blog-post debate toward formal governance. The profession and the law are both racing to develop norms before a high-profile failure forces the issue.

Open questions

  • The AI LEAD Act [5] is in the Senate, but does it address the specific scenario Willison identifies — an AI agent taking outbound actions that impose costs on unconsenting third parties? The bill's text has not been summarized in any tracked item.

  • Karpathy's framing of 'agentic engineering' as the post-vibe-coding paradigm [4] implicitly sets a higher standard for AI-assisted work. Does it include any mandatory oversight requirements, or does it remain a productivity-focused distinction?

  • Willison's normalization-of-deviance concern [2][3] predates his May 2026 essays by nearly five months (his earlier piece is dated 2025-12-10). Has the software profession produced any empirical data on whether code-review intensity has actually declined since AI coding tools became widespread? [9][10]

  • Will the liability frameworks being developed for AI agents [6][7][11] distinguish between AI errors that harm the deploying organization and those that harm third parties who never agreed to interact with the AI — the asymmetry at the heart of Willison's café critique? [1]

Narrative

In early May 2026, Simon Willison published two essays that together diagnosed a shared failure mode in AI autonomy: as AI systems become more reliable, the humans nominally responsible for them quietly reduce oversight, until failures either harm third parties or accumulate unnoticed in codebases.

The first essay analyzed an experiment by Andon Labs, which deployed an AI manager named Mona at a café in Stockholm (following an earlier AI-run retail store in San Francisco). Mona ordered 120 eggs although the café had no stove, purchased 22.5 kg of canned tomatoes for a fresh sandwich menu, sent unsolicited 'EMERGENCY' emails to suppliers to correct its own earlier mistakes without human review, and submitted an outdoor seating permit application to police that included a diagram generated without the AI ever observing the actual street [1]. Willison's critique centered on a specific ethical claim: the costs of the experiment's errors were borne not by Andon Labs but by suppliers, public officials, and permit reviewers who had never consented to participate in an AI trial. In his view, any AI agent capable of outbound actions that affect other people must have a human in the loop who approves those actions before they are executed [1].
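The human-in-the-loop requirement described above can be illustrated with a minimal sketch. This is not Andon Labs' system or anything Willison published; the names `OutboundAction`, `ApprovalGate`, and `demo_reviewer` are hypothetical, and the reviewer function only stands in for a real person's decision. The point it shows is structural: the agent can propose an outbound action, but nothing reaches a third party until a reviewer approves it.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class OutboundAction:
    """A proposed action that would reach someone outside the organization."""
    kind: str        # hypothetical labels, e.g. "email" or "purchase_order"
    recipient: str
    summary: str


@dataclass
class ApprovalGate:
    """Holds each outbound action until the reviewer callback approves it."""
    reviewer: Callable[[OutboundAction], bool]
    executed: List[OutboundAction] = field(default_factory=list)
    rejected: List[OutboundAction] = field(default_factory=list)

    def submit(self, action: OutboundAction,
               execute: Callable[[OutboundAction], None]) -> bool:
        # The side effect runs only after an explicit approval.
        if self.reviewer(action):
            execute(action)
            self.executed.append(action)
            return True
        self.rejected.append(action)
        return False


def demo_reviewer(action: OutboundAction) -> bool:
    # Assumption: stands in for a human decision. A real deployment would
    # block here on a person's answer rather than apply a fixed rule.
    return "EMERGENCY" not in action.summary


delivered: List[OutboundAction] = []   # stands in for the outside world
gate = ApprovalGate(reviewer=demo_reviewer)

gate.submit(
    OutboundAction("purchase_order", "bakery@example.com", "Weekly bread order"),
    delivered.append,
)
gate.submit(
    OutboundAction("email", "supplier@example.com", "EMERGENCY: cancel order"),
    delivered.append,
)
```

In this sketch the rejected email never reaches `delivered`, so the cost of the agent's mistake stays inside the deploying organization instead of landing on the supplier.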

The second essay turned inward, to Willison's own experience as a professional engineer using AI coding tools. He argued that a distinction that once seemed clear is blurring uncomfortably: 'vibe coding' (casual, low-scrutiny AI-assisted generation) versus 'agentic engineering' (disciplined, reviewed AI-assisted development) [2]. As AI coding agents become more reliably correct, professional engineers reduce the intensity of their review. Willison named this 'normalization of deviance': each success without close monitoring raises the engineer's trust, so a later edge-case failure is more likely to pass unreviewed and do serious damage [2]. He also identified a structural break: the entire software development lifecycle (review practices, testing cadences, quality norms) was implicitly calibrated to humans producing a few hundred lines of code per day. That assumption is now broken, and the profession has not yet developed replacement norms [2]. Willison had been building toward this argument since at least December 2025, when he published an earlier piece on normalization of deviance in AI systems more broadly [3].

Other voices have since drawn the same boundary. Andrej Karpathy, a foundational figure in AI research and a key reference point for the software engineering community, has been cited as declaring that the 'vibe coding era is ending' and that 'agentic engineering' is next [4], lending a second major voice to the conceptual boundary Willison drew. Separately, legal scholars and legislators are beginning to engage the liability gap that Willison's café essay made visceral: the AI LEAD Act (S.2937) is in the US Senate [5], academic legal literature frames AI agents as 'risky agents without intentions' that current tort frameworks cannot cleanly handle [6][7], and state-level AI legislation has been proliferating [8]. The question these frameworks face is who bears accountability when an AI agent harms a third party who never consented to interact with it, and it maps directly onto Willison's critique of the Stockholm café experiment.

Timeline

  • 2025-12-10: Willison publishes 'The Normalization of Deviance in AI,' an early articulation of the concern he would later apply specifically to AI coding tools. [3]
  • 2026-05-05: Willison publishes critique of Andon Labs' AI-managed café experiment in Stockholm, arguing it imposed unacceptable costs on unconsenting third parties. [1]
  • 2026-05-06: Willison publishes essay on the uncomfortable convergence of vibe coding and professional agentic engineering, naming 'normalization of deviance' as a key risk. [2]
  • 2026-05-16: Andrej Karpathy is cited as declaring the vibe coding era is ending and that 'agentic engineering' — orchestrating agents — is the next paradigm. [4]

Perspectives

Simon Willison

Letting AI agents take autonomous outbound actions that affect unconsenting third parties is currently unethical without mandatory human-in-the-loop controls. In software development, the profession's oversight norms are eroding faster than new ones are forming, a pattern he calls 'normalization of deviance.'

Evolution: Consistent across both May 2026 pieces; his December 2025 essay shows the normalization-of-deviance concern predates the coding-specific application by nearly five months [3].

Andrej Karpathy

The vibe coding era is ending; the next paradigm is 'agentic engineering' — orchestrating AI agents rather than passively accepting their output.

Evolution: First appearance in this thread; position is a framing/categorization claim rather than a direct response to Willison, but the conceptual boundary he draws aligns with Willison's argument.

Andon Labs

Running autonomous AI agents in real-world commercial settings (retail, café) is a legitimate experimental approach; stance on third-party impacts is not directly stated in tracked items.

Evolution: No direct statement of position; represented only through Willison's description of their experiments.

Legal scholars / regulators

Current liability frameworks are inadequate for AI agents acting as 'risky agents without intentions'; legislative responses including the AI LEAD Act and state-level legislation are emerging.

Evolution: New voice in this thread; no prior synthesis included this perspective.

Tensions

  • Andon Labs treats autonomous AI management of real-world businesses as an acceptable experimental model; Willison argues such experiments are unethical when their error costs are externalized to unconsenting third parties like suppliers and public officials. [1]
  • AI coding tool vendors and early adopters point to productivity gains as justification for reduced oversight; Willison argues this creates 'normalization of deviance' that will eventually produce a costly failure the profession is not yet equipped to prevent. [2][3]
  • Karpathy's framing of 'agentic engineering' as the successor to vibe coding positions the transition as a maturation toward discipline; Willison's concern is that this distinction is already blurring in practice, with professional engineers quietly reducing review intensity as models become more reliable. [4][2]

Sources

  1. [1] Our AI started a cafe in Stockholm — Simon Willison (2026-05-05)
  2. [2] Vibe coding and agentic engineering are getting closer than I'd like — Simon Willison (2026-05-06)
  3. [3] The Normalization of Deviance in AI — reactive:ai-agent-autonomy-risks
  4. [4] karpathy said last week the vibe coding era is ending. the next thing is "agentic engineering" - orchestrating agents ag... — reactive:ai-agent-autonomy-risks (2026-05-16)
  5. [5] Text - S.2937 - 119th Congress (2025-2026): AI LEAD Act — reactive:ai-agent-autonomy-risks
  6. [6] The Law of AI is the Law of Risky Agents Without Intentions — reactive:agentic-coding-debate
  7. [7] liability for ai agents — reactive:ai-agent-autonomy-risks
  8. [8] Summary of Artificial Intelligence 2025 Legislation — reactive:ai-agent-autonomy-risks
  9. [9] AI | 2025 Stack Overflow Developer Survey — reactive:ai-agent-autonomy-risks
  10. [10] The AI wave continues to grow on software development teams — reactive:ai-agent-autonomy-risks
  11. [11] What Is AI Liability in the Agentic Economy? Why Someone Must Be on the Hook | MindStudio — reactive:ai-agent-autonomy-risks