A Nature Medicine study found general-purpose LLMs are now outperforming dedicated medical AI products on physician-revi…
Rohan Paul (@rohanpaul_ai) · Twitter · 2026-06-12
A Nature Medicine study finds that general-purpose LLMs including GPT-5.2, Gemini 3.1 Pro, and Claude Opus 4.6 outperform dedicated medical AI products OpenEvidence and UpToDate Expert AI on physician-reviewed clinical tasks.
Extraction
Topics: medical-ai, llm-benchmarks, clinical-ai, healthcare-ai
Claims
- General-purpose LLMs now outperform dedicated medical AI products on physician-reviewed clinical tasks.
- GPT-5.2, Gemini 3.1 Pro, and Claude Opus 4.6 were benchmarked against OpenEvidence and UpToDate Expert AI on medical exam questions.
- The findings were published in Nature Medicine, a peer-reviewed journal, which lends the results credibility.
Key quotes
A Nature Medicine study found general-purpose LLMs are now outperforming dedicated medical AI products on physician-reviewed clinical tasks.
The authors compared OpenEvidence and UpToDate Expert AI with GPT-5.2, Gemini 3.1 Pro, and Claude Opus 4.6 on medical exam questions.