atomic[.]chat, a desktop app that runs LLMs locally, ran a very revealing comparison for Claude Sonnet 5, Claude Opus 4.…
Rohan Paul Twitter · Rohan Paul (@rohanpaul_ai) · 2026-07-01
A benchmark by atomic.chat compared Claude Sonnet 5, Claude Opus 4.8, Claude Sonnet 4.6, and GPT 5.5 on three physics HTML5 coding demos. Claude Sonnet 5 matched GPT 5.5 on quality while using the fewest tokens of the four models, at $0.15 per run versus $0.94 for GPT 5.5, roughly 6x cheaper.
Extraction
Topics: llm-benchmarks, claude, cost-efficiency, coding-evals, model-comparison
Claims
- Claude Sonnet 5 completed three physics coding demos using 15,047 tokens at $0.15, the lowest token count and cost of the four models tested.
- GPT 5.5 used 31,152 tokens at $0.94, making Claude Sonnet 5 approximately 6x cheaper for equivalent output quality.
- Claude Sonnet 5 matched GPT 5.5 performance quality on all three physics crash demo tests.
- In the wrecking ball demo, Sonnet 5 outperformed Opus 4.8; in the catapult demo, it outperformed GPT 5.5.
- Reviewers note Sonnet 5 still lags on visual detail and graphics quality despite its efficiency advantage.
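The "6x cheaper" figure can be checked from the numbers in the claims above. The following is a minimal sketch of that arithmetic, not part of the original benchmark. The `runs` dictionary and the `effective_rate_per_million` helper are illustrative names, and the per-million-token rate is only a blended figure derived from the reported totals. It assumes the reported token counts cover everything that was billed, which the source does not state.

```python
# Figures as reported in the claims; only Sonnet 5 and GPT 5.5 have
# both a token count and a cost listed in the source.
runs = {
    "Claude Sonnet 5": {"tokens": 15_047, "cost_usd": 0.15},
    "GPT 5.5": {"tokens": 31_152, "cost_usd": 0.94},
}


def effective_rate_per_million(run):
    """Blended USD per million tokens implied by the reported totals.

    Hypothetical helper: assumes the token count includes all billed tokens.
    """
    return run["cost_usd"] / run["tokens"] * 1_000_000


sonnet = runs["Claude Sonnet 5"]
gpt = runs["GPT 5.5"]

# How many times more GPT 5.5 cost and how many times more tokens it used.
cost_ratio = gpt["cost_usd"] / sonnet["cost_usd"]
token_ratio = gpt["tokens"] / sonnet["tokens"]

print(f"cost ratio:  {cost_ratio:.2f}x")
print(f"token ratio: {token_ratio:.2f}x")
for name, run in runs.items():
    print(f"{name}: ${effective_rate_per_million(run):.2f} per million tokens (blended)")
```

The cost ratio comes out near 6.3x while the token ratio is only about 2.1x, so under these assumptions roughly a third of the cost gap comes from using fewer tokens and the rest from a lower implied price per token.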
Key quotes
New Claude Sonnet 5 performs at GPT 5.5 level 6x cheaper!
Sonnet 5 still needs better detail and graphics. But it used fewer tokens than every other model.
Claude Sonnet 5 just matched GPT 5.5 on 3 physics coding demos at 6x lower cost.