Congrats!! I was impressed to learn about some of the engineering wizardry (e.g. very low voltage domains, cluster sca…

Andrej Karpathy Twitter · Andrej Karpathy (@karpathy) · 2026-06-30

Andrej Karpathy praises Etched's hardware engineering for LLM inference efficiency, highlighting cluster-scale memory and very low voltage domains as key techniques. He contrasts this regime (very low voltage, high current, tiny distances) with that of power transmission lines, which sit at the opposite electrical extreme.

Appears in

NVIDIA vs. Custom ASICs: GPU Dominance Persists Despite Startup Performance Claims

Extraction

Topics: ai-hardware, inference-efficiency, tokens-per-watt, etched

Claims

Etched's LLM inference hardware uses very low voltage domains and cluster-scale memory to maximize tokens per watt at interactive speeds.
Achieving interactive token throughput per user requires significant hardware engineering at the extremes of low-voltage, high-current operation.
The electrical regime of efficient LLM inference — very low voltage, high current at tiny distances — is the inverse of power transmission, which uses very high voltage and low current over great distances.
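
The contrast in the last claim can be made concrete with the textbook relations P = V·I and P_loss = I²·R. The following is a minimal sketch: every number in it (delivered power, voltages, path resistances) is an illustrative assumption and not a figure from Etched or from the original post, and the function names are hypothetical. It only shows why delivering power at low voltage forces high current, and why high current in turn calls for a very short, very low-resistance delivery path.

```python
def current_amps(power_w: float, voltage_v: float) -> float:
    """Current needed to deliver a given power at a given voltage (I = P / V)."""
    return power_w / voltage_v


def loss_fraction(power_w: float, voltage_v: float, resistance_ohm: float) -> float:
    """Fraction of delivered power dissipated in the conductor.

    P_loss = I^2 * R with I = P / V, so P_loss / P = P * R / V^2.
    """
    current = current_amps(power_w, voltage_v)
    return (current ** 2) * resistance_ohm / power_w


# Hypothetical transmission line: high voltage, long (hence resistive) path.
LINE_POWER_W = 100e6        # assumed 100 MW delivered
LINE_VOLTAGE_V = 400e3      # assumed 400 kV
LINE_RESISTANCE_OHM = 10.0  # assumed total line resistance

# Hypothetical accelerator power rail: very low voltage, tiny distance.
CHIP_POWER_W = 1000.0       # assumed 1 kW delivered
CHIP_VOLTAGE_V = 0.5        # assumed core voltage
CHIP_RESISTANCE_OHM = 50e-6 # assumed 50 micro-ohm delivery path

for name, p, v, r in [
    ("transmission line", LINE_POWER_W, LINE_VOLTAGE_V, LINE_RESISTANCE_OHM),
    ("chip power rail", CHIP_POWER_W, CHIP_VOLTAGE_V, CHIP_RESISTANCE_OHM),
]:
    print(f"{name}: I = {current_amps(p, v):.1f} A, "
          f"loss = {100 * loss_fraction(p, v, r):.2f}% of delivered power")
```

Under these assumed numbers the chip rail carries several times the current of the transmission line while delivering a tiny fraction of the power. Because the loss fraction scales as R / V², even a micro-ohm-scale path wastes a noticeable share at sub-volt levels, which is consistent with the post's emphasis on tiny distances.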

Key quotes

I was impressed to learn about some of the engineering wizardry (e.g. *very* low voltage domains, cluster scale memory, ...) that goes into tokens/watt maxxing of state of the art LLMs at interactive tokens/sec/user.

Esp fun and memorable is the idea that this is engineering at the 'opposite' regime to that of power transmission lines: very low voltage high current (at tiny distances) vs. very high voltage & low current (at great distances).