Efficient tradeoffs and the safety-usefulness tradeoff model

Alignment Forum · Buck · 2026-06-08

Alignment researcher Buck Shlegeris examines when the safety-usefulness tradeoff model accurately describes AI developer behavior and argues that political dynamics—regulators, public pressure, employees with differing priors—often make it a poor guide for AI safety strategy.

Appears in

Frontier AI Safety Evaluation: Scheming Research and Evaluation Standards

Extraction

Topics: ai-safety-strategy, ai-governance, alignment-research, ai-risk, safety-usefulness-tradeoff

Claims

The safety-usefulness tradeoff model accurately describes developer behavior when developers are reasonable actors constrained by competitive pressure, but breaks down when developers respond to third-party political pressure rather than their own cost-benefit analysis.
When AI companies implement safety measures to satisfy regulators, employees, or public opinion, they optimize for appeasing those parties rather than for safety as the safety researcher understands it.
Political feasibility of a safety intervention—including its legibility, verifiability, and constituency support—may matter more in practice than its position on a safety-usefulness Pareto frontier.
AI control techniques are preferable partly because outside parties can evaluate them more robustly than alignment approaches, which makes them more resilient when implemented by companies that are not sincerely motivated to make them work.
Capability research can indirectly increase an AI company's safety budget through diminishing marginal returns, though this effect is weaker than naively expected due to capability diffusion across companies.

Key quotes

if the developer is acting under pressure from third parties with different beliefs or priorities—regulators, governments, poorly-informed staff, the public—the developer is optimizing for their satisfaction, not for safety-according-to-you.

I think it's valuable to think about 'what are politically feasible asks that are good for AI risk' from a perspective that focuses on aspects of political feasibility other than 'how costly is this to the AI company'.

I stand by the basic point that when you're developing safety techniques, you should pay attention to whether they're going to be incredibly inconvenient and expensive.