A Nature Medicine study found general-purpose LLMs are now outperforming dedicated medical AI products on physician-revi…

Rohan Paul Twitter · Rohan Paul (@rohanpaul_ai) · 2026-06-12

A Nature Medicine study finds that general-purpose LLMs including GPT-5.2, Gemini 3.1 Pro, and Claude Opus 4.6 outperform dedicated medical AI products OpenEvidence and UpToDate Expert AI on physician-reviewed clinical tasks.

Appears in

General vs. Specialized AI in Clinical Settings: Competing Benchmark Findings

Extraction

Topics: medical-ai, llm-benchmarks, clinical-ai, healthcare-ai

Claims

General-purpose LLMs now outperform dedicated medical AI products on physician-reviewed clinical tasks.
GPT-5.2, Gemini 3.1 Pro, and Claude Opus 4.6 were benchmarked against OpenEvidence and UpToDate Expert AI on medical exam questions.
The findings were published in Nature Medicine, a peer-reviewed journal, lending the results significant credibility.

Key quotes

A Nature Medicine study found general-purpose LLMs are now outperforming dedicated medical AI products on physician-reviewed clinical tasks.

The authors compared OpenEvidence and UpToDate Expert AI with GPT-5.2, Gemini 3.1 Pro, and Claude Opus 4.6 on medical exam questions.