LLM 0.32a0 is a major backwards-compatible refactor

Simon Willison · Simon Willison · 2026-04-29



Appears in

Simon Willison Releases llm 0.32 Alpha Series

Extraction

Topics: llm-python-library, open-source-ai-tooling, multimodal-output-streaming, api-design, llm-developer-experience

Claims

LLM 0.32a0 reworks the previous prompt/response model around a message-sequence API, while remaining backwards-compatible, so callers can inject a full prior conversation without relying on SQLite.
The new streaming API exposes responses as typed event parts (text, tool_call_name, tool_call_args, reasoning), enabling downstream consumers to handle mixed-type outputs from modern models.
A new serialization mechanism (to_dict/from_dict) allows Python API users to store and restore responses in any storage layer, decoupling the library from SQLite.
The CLI now renders reasoning/thinking tokens in a different color and streams them to stderr, keeping piped output clean.
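The to_dict/from_dict claim can be illustrated with a minimal sketch. The StoredResponse and Part classes below are hypothetical stand-ins, not llm's actual classes. The sketch only shows why reducing a response to plain dicts decouples it from SQLite: once serialized to JSON, it can be kept in whatever storage layer a Python caller prefers.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Part:
    # Hypothetical typed part; "type" mirrors the kinds named above
    # (text, tool_call_name, tool_call_args, reasoning).
    type: str
    content: str


@dataclass
class StoredResponse:
    # Hypothetical stand-in for a response object, not llm's real class.
    model: str
    parts: list[Part] = field(default_factory=list)

    def to_dict(self) -> dict:
        # asdict() recursively converts nested dataclasses into plain dicts.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict) -> "StoredResponse":
        # Rebuild the nested Part objects from their dict form.
        return cls(model=data["model"], parts=[Part(**p) for p in data["parts"]])


response = StoredResponse(
    model="example-model",
    parts=[Part("reasoning", "User wants a greeting."), Part("text", "Hello!")],
)

# Any storage layer that can hold a string would do here.
blob = json.dumps(response.to_dict())
restored = StoredResponse.from_dict(json.loads(blob))
print(restored == response)  # True
```

Because the round trip passes through JSON-compatible data only, the same blob could live in a file, a key-value store, or a database other than SQLite.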

Key quotes

model inputs can be represented as a sequence of messages, and model responses can be composed of a stream of differently typed parts
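A rough sketch of the first half of that idea follows. It assumes a hypothetical Message shape (a role plus a list of typed parts) rather than llm's real classes. With inputs modeled this way, continuing a conversation means passing the earlier messages back in alongside the new prompt, with no database lookup.

```python
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical message shape: a role plus (part_type, content) pairs.
    role: str  # e.g. "user", "assistant", "system"
    parts: list[tuple[str, str]]


def continue_conversation(history: list[Message], prompt: str) -> list[Message]:
    """Return a new input sequence: the full prior conversation plus a user turn.

    The caller supplies the history directly; nothing is read from storage,
    and the original history list is left unmodified.
    """
    return [*history, Message("user", [("text", prompt)])]


history = [
    Message("user", [("text", "What is 2 + 2?")]),
    Message("assistant", [("reasoning", "Simple addition."), ("text", "4")]),
]
inputs = continue_conversation(history, "And times 3?")

for message in inputs:
    # Show only the text parts of each message.
    text = "".join(content for kind, content in message.parts if kind == "text")
    print(f"{message.role}: {text}")
# Prints:
# user: What is 2 + 2?
# assistant: 4
# user: And times 3?
```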

Many of today's models return mixed types of content. A prompt run against Claude might return reasoning output, then text, then a JSON request for a tool call, then more text content.
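The sequence described in this quote can be made concrete with a hedged sketch. The (type, chunk) tuples and part names below are assumptions modeled on the part types listed in the claims above, not llm's actual event objects. The sketch shows how a consumer might separate text from reasoning and reassemble a tool call whose JSON arguments arrive in fragments.

```python
import json

# Hypothetical event stream shaped like the example in the quote:
# reasoning, then text, then a tool call (name + JSON args in chunks), then more text.
events = [
    ("reasoning", "Need the weather; call a tool."),
    ("text", "Let me check. "),
    ("tool_call_name", "get_weather"),
    ("tool_call_args", '{"city": '),
    ("tool_call_args", '"Paris"}'),
    ("text", "Looking that up now."),
]


def collect(events):
    """Group a mixed-type stream into text, reasoning and tool calls."""
    text, reasoning, tool_calls = [], [], []
    for kind, chunk in events:
        if kind == "text":
            text.append(chunk)
        elif kind == "reasoning":
            reasoning.append(chunk)
        elif kind == "tool_call_name":
            # A new name starts a new tool call; its args follow as string fragments.
            tool_calls.append({"name": chunk, "args": ""})
        elif kind == "tool_call_args":
            tool_calls[-1]["args"] += chunk
        else:
            raise ValueError(f"unknown part type: {kind}")
    # Arguments are only valid JSON once every fragment has arrived.
    for call in tool_calls:
        call["args"] = json.loads(call["args"])
    return "".join(text), "".join(reasoning), tool_calls


text, reasoning, tool_calls = collect(events)
print(text)        # Let me check. Looking that up now.
print(tool_calls)  # [{'name': 'get_weather', 'args': {'city': 'Paris'}}]
```

Parsing the arguments only after the stream ends is the conservative choice here, since partial JSON fragments such as `{"city": ` cannot be decoded on their own.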

This new mechanism for streaming different token types means the CLI tool can now display 'thinking' text in a different color from the text in the final response.
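One way such a split could work is sketched below, under assumed part names and an assumed ANSI style; it is not llm's implementation. Reasoning goes to stderr, colored only when stderr is a terminal, while answer text goes to stdout, so redirecting stdout captures just the final response.

```python
import io
import sys

# ANSI "dim" styling is an assumption for illustration; llm's actual colors may differ.
DIM = "\033[2m"
RESET = "\033[0m"


def render(events, out=None, err=None):
    """Send reasoning parts to stderr and final-answer text to stdout.

    Other part types (such as tool calls) are ignored in this sketch.
    """
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    # Only emit color codes when stderr is an interactive terminal.
    color = err.isatty()
    for kind, chunk in events:
        if kind == "reasoning":
            err.write(f"{DIM}{chunk}{RESET}" if color else chunk)
            err.flush()
        elif kind == "text":
            out.write(chunk)
            out.flush()


# Capture both streams in memory to show where each part ends up.
out, err = io.StringIO(), io.StringIO()
render([("reasoning", "Thinking it over..."), ("text", "The answer is 42.")], out=out, err=err)
print(repr(out.getvalue()))  # 'The answer is 42.'
print(repr(err.getvalue()))  # 'Thinking it over...'
```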