Using Local Coding Agents

Ahead of AI · Sebastian Raschka, PhD · 2026-06-27

Sebastian Raschka publishes a detailed tutorial on running a fully local coding-agent stack: Qwen3.6 35B-A3B served via Ollama and connected to Qwen-Code, Codex, or Claude Code. The tutorial also covers security audits, speed benchmarks, and capability evaluations across the three harnesses.

Extraction

Topics: local-llms, coding-agents, open-weight-models, agent-harnesses, mlops

Claims

According to recent benchmarks, Qwen3.6 35B-A3B is currently the strongest open-weight model in its size class for local coding-agent use.
Codex outperformed Qwen's native Qwen-Code harness when driving Qwen3.6 on a small agent capability benchmark, suggesting that a model's native harness is not necessarily its best match.
Claude Code consumes significantly more input tokens than Codex or Qwen-Code because it accumulates larger prompt-side history across agent turns.
Qwen-Code sends telemetry and metadata to Alibaba/Aliyun endpoints by default, even when using a fully local Ollama model; users must explicitly opt out by disabling usage statistics and telemetry.
Any coding agent harness warrants a security audit for data egress, file-write blast radius, and prompt injection surfaces before installation on a primary machine.
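
One small piece of the data-egress audit mentioned above can be sketched in code. The example below is illustrative only and is not taken from the tutorial: it assumes you have already collected a list of destination URLs that a harness contacted (for instance from a proxy log), and it flags every destination that is not a loopback address. The function names and the sample URLs are hypothetical; the only real detail is that Ollama listens on port 11434 by default.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_endpoint(url: str) -> bool:
    """Return True if the URL's host is localhost or a loopback IP address."""
    host = urlparse(url).hostname
    if host is None:
        # No parsable host (e.g. a missing scheme): treat as non-local to be safe.
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, so it is a DNS name other than localhost.
        return False


def flag_egress(urls):
    """Return the URLs that leave the machine, preserving input order."""
    return [u for u in urls if not is_local_endpoint(u)]


# Hypothetical destinations observed while running a harness.
observed = [
    "http://localhost:11434/api/chat",         # local Ollama server
    "http://127.0.0.1:11434/api/chat",         # same server, IPv4 loopback
    "https://telemetry.example.com/v1/events", # placeholder external host
]
print(flag_egress(observed))
```

A check like this covers only network destinations. File-write blast radius and prompt-injection surfaces need separate review.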

Key quotes

The most important part is not getting one specific tool installed, but understanding the model-serving layer, the agent harness, the permission model, and how to evaluate whether the setup actually solves coding tasks reliably.

Even with local Ollama, Qwen Code can send usage telemetry and metadata to Alibaba/Aliyun endpoints unless usage statistics and telemetry are disabled.

Claude Code uses by far the most tokens on average, Codex the least... the difference mainly comes from input tokens rather than output tokens — Claude is repeatedly feeding more context back into the model across turns.
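
To see why prompt-side history dominates token usage, consider a minimal sketch in which a harness re-sends the entire conversation on every turn. This is a simplified model, not a description of how any particular harness manages context, and all of the numbers are hypothetical rather than measurements from the article.

```python
def cumulative_input_tokens(system_tokens: int, tokens_added_per_turn: int, turns: int) -> int:
    """Total input tokens across all turns when the full history is re-sent each turn."""
    total = 0
    history = system_tokens  # the first turn sends only the initial prompt
    for _ in range(turns):
        total += history                   # the whole history is this turn's input
        history += tokens_added_per_turn   # tool output and model reply are appended
    return total


# Hypothetical comparison: same initial prompt, same number of turns,
# but one harness keeps four times as much context per turn.
lean = cumulative_input_tokens(system_tokens=1000, tokens_added_per_turn=500, turns=10)
heavy = cumulative_input_tokens(system_tokens=1000, tokens_added_per_turn=2000, turns=10)
print(lean, heavy)  # 32500 100000
```

Because each turn's input includes everything appended before it, total input tokens grow roughly quadratically with the number of turns, so a harness that retains more context per turn pays for it again on every later turn.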