The Information Machine

Mira Murati's Thinking Machines made Bridgewater’s private expert judgment trainable, beating frontier models with 29.8%…

Twitter · Rohan Paul (@rohanpaul_ai) · 2026-07-03

Rohan Paul describes how Mira Murati's Thinking Machines fine-tuned a model for Bridgewater Associates using high-quality expert investor labels, achieving 29.8% fewer errors than the best frontier model at 13.8x lower inference cost on a financial document triage task.

Appears in: Extraction

Topics: llm-fine-tuning, expert-knowledge-distillation, financial-ai, enterprise-ai

Claims

  • Naive prompting of frontier models yields only 46–50% accuracy on financial document triage; expert prompts raise this to 74–78%.
  • Fine-tuning on expert investor labels produced a model that beat the best frontier model, with 29.8% fewer errors and 13.8x lower inference cost.
  • Non-expert labels failed because the triage task depends on investor taste and judgment, not surface-level financial language comprehension.
  • Bridgewater improved label quality by routing model-disputed cases back to expert investors for review.
  • Training combined interleaved batches, CISPO loss, and on-policy distillation from stronger teacher checkpoints to keep training stable and avoid brittle shortcuts.
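
"29.8% fewer errors" reads most naturally as a relative cut in the error rate, not a 29.8-point gain in accuracy. The post does not give the best frontier model's accuracy or its cost, so the sketch below plugs in a hypothetical 80% baseline and a hypothetical $13.80 per 1,000 documents purely to show the arithmetic.

```python
def accuracy_after_error_reduction(baseline_accuracy: float, relative_error_reduction: float) -> float:
    """Accuracy after cutting the baseline error rate by a relative fraction."""
    baseline_error = 1.0 - baseline_accuracy
    return 1.0 - baseline_error * (1.0 - relative_error_reduction)


# Hypothetical baseline: the post does not report the frontier model's accuracy.
baseline = 0.80
fine_tuned = accuracy_after_error_reduction(baseline, 0.298)  # 20% error becomes ~14.04% error

# "13.8x lower inference cost" divides the baseline cost by 13.8.
hypothetical_frontier_cost = 13.80  # illustrative dollars per 1,000 documents
fine_tuned_cost = hypothetical_frontier_cost / 13.8
```

Under these assumed numbers, the fine-tuned model would land near 86% accuracy at about $1.00 per 1,000 documents; the real figures depend on the baseline, which the post does not state.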
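
Routing "model-disputed cases" back to experts can be pictured as a filter over the labeled set: keep the examples where the model confidently disagrees with the current label and send those for review. This is a minimal sketch; the `TriageExample` record, the `flag`/`ignore` labels, and the 0.8 confidence threshold are all hypothetical, since the post does not describe Bridgewater's actual criteria.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TriageExample:
    """Hypothetical record; field names are illustrative, not Bridgewater's schema."""
    doc_id: str
    label: str          # current expert label
    predicted: str      # model's predicted label
    confidence: float   # model's probability for its own prediction


def disputed_cases(examples: List[TriageExample], min_confidence: float = 0.8) -> List[str]:
    """Return IDs where the model confidently disagrees with the current label."""
    return [
        ex.doc_id
        for ex in examples
        if ex.predicted != ex.label and ex.confidence >= min_confidence
    ]


batch = [
    TriageExample("doc-1", label="flag", predicted="flag", confidence=0.95),
    TriageExample("doc-2", label="ignore", predicted="flag", confidence=0.91),
    TriageExample("doc-3", label="flag", predicted="ignore", confidence=0.55),
]
review_queue = disputed_cases(batch)  # -> ['doc-2']
```

Low-confidence disagreements such as `doc-3` stay out of the queue here, which keeps expert time focused on cases where the model has a strong competing opinion; after review, corrected labels would replace the old ones before the next training round.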
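
CISPO, as publicly described by MiniMax, clips the importance-sampling weight rather than the policy update and treats that clipped weight as a constant (stop-gradient), so every token keeps a gradient. The sketch below computes the token-level loss and its gradient with respect to each new log-probability in plain Python. The clipping bounds are illustrative assumptions; the post does not say which bounds, normalization, or advantage estimator Thinking Machines used.

```python
import math
from typing import List, Sequence, Tuple


def cispo_loss_and_grads(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    eps_low: float = 0.2,   # illustrative, not from the post
    eps_high: float = 0.28,  # illustrative, not from the post
) -> Tuple[float, List[float]]:
    """Token-level CISPO loss and d(loss)/d(logp_new) for each token.

    The clipped weight is treated as a constant (stop-gradient), so each
    token's gradient is -w_t * A_t / n and is never zeroed out.
    """
    n = len(logp_new)
    loss = 0.0
    grads: List[float] = []
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # importance-sampling ratio pi_new / pi_old
        w = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)  # clip the weight, not the update
        loss -= w * adv * lp_new / n  # minimize the negative of the CISPO objective
        grads.append(-w * adv / n)  # w is detached, so the gradient is -w * A / n
    return loss, grads


# First token: ratio = e (about 2.72) is clipped to 1.28, yet its gradient stays nonzero.
loss, grads = cispo_loss_and_grads([-0.5, -1.0], [-1.5, -1.0], [1.0, -1.0])
```

The design contrast is with a PPO-style clipped objective, where a token whose ratio has moved past the clip range in the direction of its advantage contributes zero gradient; here that token still contributes, just with a bounded weight.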
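
On-policy distillation, in its common form, has the student sample its own outputs and a stronger teacher score those same tokens; the per-token reverse KL, log p_student minus log p_teacher, then serves as a penalty. The sketch below turns hypothetical log-probabilities into per-token advantages under that assumption; the post does not give the exact objective or how it was combined with the expert-label signal.

```python
from typing import List, Sequence


def reverse_kl_per_token(student_logps: Sequence[float], teacher_logps: Sequence[float]) -> List[float]:
    """Single-sample estimate of KL(student || teacher) at each sampled token.

    Tokens are assumed to be sampled from the student (that is what makes it
    on-policy); the teacher only scores them.
    """
    return [s - t for s, t in zip(student_logps, teacher_logps)]


def distillation_advantages(student_logps: Sequence[float], teacher_logps: Sequence[float]) -> List[float]:
    """Advantage = negative reverse KL: reward tokens the teacher rates as more likely."""
    return [-kl for kl in reverse_kl_per_token(student_logps, teacher_logps)]


# Hypothetical log-probs for a 3-token student rollout, scored by both models.
student = [-0.1, -2.0, -0.7]
teacher = [-0.5, -0.2, -0.7]
advantages = distillation_advantages(student, teacher)
```

Because the teacher grades every token the student actually produced, the student gets dense feedback on its own mistakes instead of imitating teacher text it would never generate; these advantages could then feed a policy-gradient loss such as CISPO.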

Key quotes

This is a serious signal for enterprise AI, that bringing private judgment in the loop beats general intelligence.
The breakthrough came from replacing written rules with high-quality labels from expert investors.
The model then learned patterns that experts could recognize, but could not fully verbalize.