The Information Machine

Gemini 3.5 Flash: Release, Pricing, and Tooling

open · v1 · 2026-05-20 · 3 items

What

Google has released Gemini 3.5 Flash at $1.50 per million input tokens and $9 per million output tokens, 3x the price of Gemini 3 Flash Preview and 6x the price of Gemini 3.1 Flash-Lite [1]. Despite the cost increase, Google is deploying the model across its flagship consumer and enterprise products, including the Gemini app and AI Mode in Google Search [1]. Running the Artificial Analysis benchmark on Gemini 3.5 Flash (high) cost $1,551.60, compared with $892.28 for Gemini 3.1 Pro Preview [1]. Tooling support followed quickly: llm-gemini, the Gemini plugin for the llm CLI, added the gemini-3.5-flash model identifier in version 0.32, and an alpha release (0.32a0) added support for streaming reasoning tokens [2][3].

Why it matters

Gemini 3.5 Flash is priced at 75% of Gemini 3.1 Pro ($2/$12 per million input/output tokens) despite carrying the 'Flash' (efficiency-tier) branding, blurring the traditional cost/capability segmentation [1]. Analyst commentary frames this as part of a broader industry shift: Google, OpenAI, and Anthropic all appear to be testing how much API customers will pay at the same time, signaling a possible end to the race-to-zero pricing dynamic that defined early LLM competition [1].

Open questions

  • Does Gemini 3.5 Flash's capability improvement justify its steep price premium over predecessors, or are customers paying more for roughly equivalent performance? [1]

  • How can Google justify deploying a significantly more expensive model across free consumer products like the Gemini app and Search AI Mode? Is higher API revenue cross-subsidizing free consumer usage, or is the consumer-side cost being absorbed elsewhere? [1]

  • Will API customers absorb the price increases across all three major labs, or will pushback slow the apparent industry-wide repricing trend? [1]

  • When will reasoning token streaming in llm-gemini (currently alpha) stabilize into a production release? [3]

Narrative

Google announced Gemini 3.5 Flash as its latest efficiency-tier model, but the pricing tells a different story than the 'Flash' branding implies. At $1.50 per million input tokens and $9 per million output tokens, Gemini 3.5 Flash costs 3x as much as Gemini 3 Flash Preview and 6x as much as Gemini 3.1 Flash-Lite [1]. Those prices are also 75% of Gemini 3.1 Pro's ($2/$12), sharply narrowing the gap between Google's Flash and Pro tiers. An independent benchmark run illustrated the real-world impact: running the Artificial Analysis benchmark cost $1,551.60 on Gemini 3.5 Flash (high), compared with $892.28 for Gemini 3.1 Pro Preview [1].
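The arithmetic behind these figures can be sketched in a few lines of Python. This is a rough illustration, not a billing calculator: the Price class, the request_cost helper, and the 10,000/2,000-token example request are hypothetical names and values introduced here, and the sketch assumes flat list prices with no caching discounts or long-context tiers. Under that assumption, a model priced at 0.75x per token that still costs about 1.74x to benchmark must have consumed roughly 2.3x the billable tokens (weighted at Pro prices).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """List price in USD per million tokens (hypothetical helper type)."""
    input_per_m: float
    output_per_m: float


def request_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request, ASSUMING flat list prices (no caching, no tiers)."""
    return (input_tokens * price.input_per_m
            + output_tokens * price.output_per_m) / 1_000_000


# Per-million-token prices quoted in the text.
FLASH_3_5 = Price(input_per_m=1.50, output_per_m=9.00)
PRO_3_1 = Price(input_per_m=2.00, output_per_m=12.00)

# Benchmark totals quoted in the text.
FLASH_BENCHMARK_USD = 1551.60
PRO_BENCHMARK_USD = 892.28

# Flash is priced at the same fraction of Pro for both input and output.
input_ratio = FLASH_3_5.input_per_m / PRO_3_1.input_per_m
output_ratio = FLASH_3_5.output_per_m / PRO_3_1.output_per_m

# Yet the Flash (high) benchmark run cost more in total than the Pro run.
cost_ratio = FLASH_BENCHMARK_USD / PRO_BENCHMARK_USD

# ASSUMPTION: both runs were billed at flat list prices. Because the price
# ratio is identical for input and output, dividing it out gives the implied
# ratio of billable token volume (valued at Pro prices).
implied_token_ratio = cost_ratio / input_ratio

# Hypothetical request: 10,000 input tokens and 2,000 output tokens.
example_cost = request_cost(FLASH_3_5, 10_000, 2_000)

print(f"Flash/Pro price ratio:        {input_ratio:.2f}")
print(f"Flash/Pro benchmark cost:     {cost_ratio:.2f}")
print(f"Implied token-volume ratio:   {implied_token_ratio:.2f}")
print(f"Example request on 3.5 Flash: ${example_cost:.4f}")
```

The implied token ratio is only an estimate: if either benchmark run benefited from cached input or crossed a pricing tier, the true figure would differ.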

At the same time, Google announced it is deploying Gemini 3.5 Flash across its highest-traffic consumer and enterprise surfaces, including the Gemini app and AI Mode in Google Search [1]. This creates an immediate tension: a more expensive model is being rolled into products that are free to end users, raising questions about how Google is accounting for the cost differential and what signal this sends to API customers being asked to absorb higher prices.

Observer Simon Willison frames the pricing shift as part of a broader industry realignment rather than a Google-specific move, noting that Google, OpenAI, and Anthropic all appear to be testing the price tolerance of their API customers simultaneously [1]. This suggests the aggressive discounting that characterized earlier LLM competition may be giving way to a new phase of margin recapture across the industry.

On the tooling side, llm-gemini, Simon Willison's Gemini plugin for the llm CLI, moved quickly to support the new model. Version 0.32 added the gemini-3.5-flash model identifier [2], and an alpha release (0.32a0) introduced streaming of reasoning tokens from Gemini models; the alpha requires llm>=0.32a0 [3]. These are incremental tooling releases rather than analysis of the model's capabilities, but they mark the point at which Gemini 3.5 Flash became accessible through the standard llm CLI workflow.
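The llm>=0.32a0 requirement relies on how Python version strings are ordered: under PEP 440, a pre-release such as 0.32a0 sorts before the final 0.32, so the constraint accepts both the alpha and any later release. The sketch below illustrates that ordering with a deliberately minimal, stdlib-only parser. The parse_version and satisfies_minimum helpers are hypothetical names written for this example and handle only plain X.Y[.Z] versions with optional a/b/rc suffixes; real code would more likely use packaging.version.Version.

```python
import re

# Ordering of PEP 440 pre-release tags; a final release sorts after all of them.
_PRE_RANK = {"a": 0, "b": 1, "rc": 2}
_FINAL = (3, 0)
_PATTERN = re.compile(r"(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?")


def parse_version(text: str):
    """Turn a version like '0.32a0' into a tuple that compares in PEP 440 order.

    Hypothetical helper: supports only release numbers plus a/b/rc suffixes.
    """
    match = _PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported version string: {text!r}")
    release = [int(part) for part in match.group(1).split(".")]
    # Drop trailing zeros so that '0.32' and '0.32.0' compare as equal.
    while release and release[-1] == 0:
        release.pop()
    if match.group(2):
        pre = (_PRE_RANK[match.group(2)], int(match.group(3)))
    else:
        pre = _FINAL
    return tuple(release), pre


def satisfies_minimum(installed: str, minimum: str) -> bool:
    """Return True if `installed` meets a '>=minimum' constraint."""
    return parse_version(installed) >= parse_version(minimum)


for candidate in ("0.31", "0.32a0", "0.32"):
    print(candidate, satisfies_minimum(candidate, "0.32a0"))
```

Here 0.31 fails the check, while both 0.32a0 and the final 0.32 pass.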

Timeline

  • 2026-05-19: Gemini 3.5 Flash released with pricing at $1.50/M input, $9/M output — 3x the cost of Gemini 3 Flash Preview; Google announces deployment across Gemini app and Search AI Mode [1]
  • 2026-05-19: llm-gemini 0.32 released, adding the gemini-3.5-flash model identifier to the llm CLI plugin [2]
  • 2026-05-19: llm-gemini 0.32a0 alpha released, adding streaming reasoning token support for Gemini models [3]

Perspectives

Simon Willison

Analytically skeptical of the price increases; documents the cost jump with benchmark data and frames it as part of an industry-wide repricing trend affecting all three major labs. Notes the irony of Google deploying a pricier model in free consumer products.

Evolution: Consistent across all three posts; the two tooling announcements are minimal release notes, while the pricing post carries the analytical weight.

Tensions

  • Google positions Gemini 3.5 Flash as a Flash-tier (efficiency) model, but at $1.50/$9 its pricing approaches Gemini 3.1 Pro's $2/$12, undermining the traditional cost/capability tier distinction and raising the question of whether 'Flash' still signals value for API customers. [1]

Status: active but too new to trend

Sources

  [1] Simon Willison, "Gemini 3.5 Flash: more expensive, but Google plan to use it for everything" (2026-05-19)
  [2] Simon Willison, "llm-gemini 0.32" (2026-05-19)
  [3] Simon Willison, "llm-gemini 0.32a0" (2026-05-19)