When you send a message, the harness will route your request to the appropriate LLM server, then apply some chat templat…

SemiAnalysis Twitter · SemiAnalysis (@SemiAnalysis_) · 2026-07-03

SemiAnalysis explains how harnesses route requests to LLM servers, apply chat templates, and implement the tool-use loop by parsing model-generated JSON, executing the requested tool locally, and returning results to the model for further processing.

Open original ↗

Appears in

SemiAnalysis Demystifies Agentic Coding Harness Architecture: Model vs. Orchestration

Extraction

Topics: llm-harnessestool-usellm-inferenceagentic-coding

Claims

Harnesses route user messages to the appropriate LLM server and apply chat templates to reformat the request.
When a model wants to use a tool, it generates a JSON object describing the tool call, which is embedded in its response body.
The harness parses that tool-use JSON, executes the tool on the host machine, and automatically sends the output back to the model.

Key quotes

If the model chooses a tool, it will literally generate the appropriate tool use JSON and return it in the response body! This tool use is then parsed in the harness on the host machine and executed, and its output is automatically sent back to the LLM for processing.