Claude Knew It Was Being Tested in 26% of Benchmark Runs — Anthropic's NLA Data Explained


