atomic[.]chat, a desktop app that runs LLMs locally, ran a very revealing comparison for Claude Sonnet 5, Claude Opus 4.…

Twitter · Rohan Paul (@rohanpaul_ai) · 2026-07-01

A benchmark by atomic.chat compared Claude Sonnet 5, Claude Opus 4.8, Claude Sonnet 4.6, and GPT 5.5 on three physics HTML5 coding demos. Claude Sonnet 5 matched GPT 5.5 on quality while using the fewest tokens of the four models and costing just $0.15 per run, roughly 6x cheaper than GPT 5.5's $0.94.

Appears in

Anthropic Launches Claude Sonnet 5: Agentic Performance, New Tokenizer, and Per-Task Cost Surprises

Extraction

Topics: llm-benchmarks, claude, cost-efficiency, coding-evals, model-comparison

Claims

Claude Sonnet 5 completed three physics coding demos using 15,047 tokens at $0.15, the lowest token count and cost of the four models tested.
GPT 5.5 used 31,152 tokens at $0.94, making Claude Sonnet 5 approximately 6x cheaper for equivalent output quality.
Claude Sonnet 5 matched GPT 5.5's output quality on all three physics crash demos.
In the wrecking ball demo, Sonnet 5 outperformed Opus 4.8; in the catapult demo, it outperformed GPT 5.5.
Reviewers note Sonnet 5 still lags on visual detail and graphics quality despite its efficiency advantage.
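
The "approximately 6x" figure can be checked directly from the reported totals. The sketch below is only arithmetic on the numbers in the claims above. The "effective rate" it derives is a blended dollars-per-million-tokens value implied by each run's totals, not a published price, and the helper name `effective_rate_per_million` is hypothetical. The source does not split input from output tokens, so the breakdown is indicative only.

```python
# Totals as reported in the claims above (tokens used, cost in USD).
runs = {
    "Claude Sonnet 5": {"tokens": 15_047, "cost_usd": 0.15},
    "GPT 5.5": {"tokens": 31_152, "cost_usd": 0.94},
}


def effective_rate_per_million(tokens: int, cost_usd: float) -> float:
    """Blended USD per million tokens implied by one run's totals (derived, not a list price)."""
    return cost_usd / tokens * 1_000_000


sonnet = runs["Claude Sonnet 5"]
gpt = runs["GPT 5.5"]

# How many times more GPT 5.5 cost, and how many times more tokens it used.
cost_ratio = gpt["cost_usd"] / sonnet["cost_usd"]
token_ratio = gpt["tokens"] / sonnet["tokens"]

# The remaining gap comes from the implied price per token.
rate_ratio = effective_rate_per_million(**gpt) / effective_rate_per_million(**sonnet)

print(f"cost ratio:  {cost_ratio:.2f}x")
print(f"token ratio: {token_ratio:.2f}x")
print(f"rate ratio:  {rate_ratio:.2f}x")
```

On these figures the cost ratio is about 6.27, which splits into roughly 2.07x fewer tokens and roughly 3x lower implied cost per token. Token efficiency therefore accounts for only part of the reported gap.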

Key quotes

New Claude Sonnet 5 performs at GPT 5.5 level 6x cheaper!

Sonnet 5 still needs better detail and graphics. But it used fewer tokens than every other model.

Claude Sonnet 5 just matched GPT 5.5 on 3 physics coding demos at 6x lower cost.