LLM 0.32a0 is a major backwards-compatible refactor
Simon Willison · Simon Willison · 2026-04-29
Extraction
Topics: llm-python-library, open-source-ai-tooling, multimodal-output-streaming, api-design, llm-developer-experience
Claims
- LLM 0.32a0 reworks the prompt/response model around a message-sequence API while staying backwards-compatible, so a full prior conversation can be injected without relying on SQLite.
- The new streaming API exposes responses as typed event parts (text, tool_call_name, tool_call_args, reasoning), enabling downstream consumers to handle mixed-type outputs from modern models.
- A new serialization mechanism (to_dict/from_dict) allows Python API users to store and restore responses in any storage layer, decoupling the library from SQLite.
- The CLI now renders reasoning/thinking tokens in a different color and streams them to stderr, keeping piped output clean.
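The streaming claims above can be illustrated with a small consumer of typed parts. This is a minimal sketch under stated assumptions: the `Part` class and `render` function are hypothetical and are not LLM's Python API; only the four part type names come from the claims. It routes text to stdout, routes reasoning to stderr in a faint ANSI style, and reassembles tool-call arguments that may arrive in several chunks. The demo stream follows the order described in the Claude example among the key quotes: reasoning, then text, then a JSON tool call, then more text.

```python
import io
import json
import sys
from dataclasses import dataclass
from typing import Iterable, TextIO

# Hypothetical part type for illustration; not LLM's actual API.
# The four type names are the ones listed in the claims.
@dataclass
class Part:
    type: str  # "text", "reasoning", "tool_call_name" or "tool_call_args"
    content: str

DIM, RESET = "\033[2m", "\033[0m"  # ANSI faint style for reasoning tokens


def render(parts: Iterable[Part], out: TextIO = sys.stdout,
           err: TextIO = sys.stderr, color: bool = True) -> list[dict]:
    """Write text to `out`, reasoning to `err`, and return collected tool calls."""
    tool_calls: list[dict] = []
    for part in parts:
        if part.type == "text":
            out.write(part.content)
        elif part.type == "reasoning":
            # Reasoning goes to stderr so piped stdout holds only the answer.
            err.write(f"{DIM}{part.content}{RESET}" if color else part.content)
        elif part.type == "tool_call_name":
            # A name starts a new tool call; its arguments stream in afterwards.
            tool_calls.append({"name": part.content, "arguments": ""})
        elif part.type == "tool_call_args":
            if not tool_calls:
                raise ValueError("tool_call_args arrived before tool_call_name")
            # Arguments can arrive in chunks; append to the most recent call.
            tool_calls[-1]["arguments"] += part.content
        else:
            raise ValueError(f"unknown part type: {part.type!r}")
    return tool_calls


# Demo: reasoning, then text, then a JSON tool call, then more text.
stream = [
    Part("reasoning", "The user wants the weather. "),
    Part("text", "Let me check. "),
    Part("tool_call_name", "get_weather"),
    Part("tool_call_args", '{"city": '),
    Part("tool_call_args", '"Paris"}'),
    Part("text", "Done."),
]
out, err = io.StringIO(), io.StringIO()
calls = render(stream, out=out, err=err, color=False)
print(repr(out.getvalue()))               # 'Let me check. Done.'
print(json.loads(calls[0]["arguments"]))  # {'city': 'Paris'}
```

Sending reasoning to stderr means that redirecting stdout to a file or another program captures only the final text, which is the behaviour the CLI claim describes.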
Key quotes
model inputs can be represented as a sequence of messages, and model responses can be composed of a stream of differently typed parts
Many of today's models return mixed types of content. A prompt run against Claude might return reasoning output, then text, then a JSON request for a tool call, then more text content.
This new mechanism for streaming different token types means the CLI tool can now display 'thinking' text in a different color from the text in the final response.
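The message-sequence idea in the first quote and the `to_dict`/`from_dict` claim can be combined in one sketch. Everything here is hypothetical: `Message`, `Response` and their fields are illustrative stand-ins, and only the method names `to_dict`/`from_dict` come from the claims, so LLM's real classes and signatures may differ. The point it illustrates is that once a response round-trips through a plain dict, it can be stored as JSON anywhere and later replayed as prior messages, with no SQLite involved.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical stand-ins for illustration; LLM's real classes may differ.
@dataclass
class Message:
    role: str     # e.g. "user" or "assistant"
    content: str


@dataclass
class Response:
    messages: list[Message]  # the input conversation that produced this response
    text: str                # the model's final text output

    def to_dict(self) -> dict:
        # asdict() recursively turns nested dataclasses into plain dicts/lists.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict) -> "Response":
        return cls(
            messages=[Message(**m) for m in data["messages"]],
            text=data["text"],
        )


prior = Response(messages=[Message("user", "Name a French city.")], text="Paris.")

# Any storage layer works: here it is just a JSON string.
stored = json.dumps(prior.to_dict())
restored = Response.from_dict(json.loads(stored))

# Replay the whole earlier exchange as messages ahead of the new question.
next_messages = restored.messages + [
    Message("assistant", restored.text),
    Message("user", "And one in Italy?"),
]
print([m.role for m in next_messages])  # ['user', 'assistant', 'user']
```

Returning a plain dict, rather than writing to a database directly, is what keeps the storage layer pluggable in this sketch: a file, a document store or a cache can all hold the JSON.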