GPU at ~30% utilization. One CPU core pinned at 100%. Per-request latency dominated by a large fixed cost.
reactive:nvidia-nemotron-ultra · Daniel Svonava (@svonava) · 2026-06-05