A near-autonomous AI chemist improves a challenging reaction in medicinal chemistry

OpenAI Blog · 2026-06-17

OpenAI and Molecule.one demonstrate that GPT-5.4, operating near-autonomously through an automated high-throughput chemistry lab, improved Chan-Lam coupling yields for more than 80% of tested substrates after independently identifying TEMPO as a surprising additive. Human chemists subsequently validated the result at bench scale.

Extraction

Topics: ai-science, drug-discovery, autonomous-ai-agents, medicinal-chemistry, ai-research-automation

Claims

GPT-5.4 independently identified primary sulfonamides as a high-value, challenging substrate class and proposed mild oxidants, including TEMPO, as promising additives to improve Chan-Lam coupling, a hypothesis human chemists found surprising.
The AI-proposed optimization improved yields for 88% of boronic acids and 83% of sulfonamides tested, raising mean yield from 16.6% to 25.2% and tripling the share of reactions exceeding 30% yield.
The result was validated at bench scale by human chemists, with yield improvements in 11 of 14 substrate pairs, most by more than twofold.
The system ran 10,080 reactions in Maria Lab across two experimental cycles, more than a chemist running three reactions a day would complete in a decade.
Human judgment remained essential throughout: chemists selected proposals, corrected experimental plans, managed physical lab infrastructure, and conducted bench-scale validation.
The workflow is deliberately scoped to a legitimate medicinal-chemistry problem and the paper explicitly states the results should not be read as evidence the system can assist with harmful chemical applications.
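
The headline figures in the claims above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the original post: the function names are hypothetical, and the figure of 250 working days per year is an assumption the source does not state.

```python
# Figures quoted in the claims above.
MEAN_YIELD_BEFORE = 16.6           # percent
MEAN_YIELD_AFTER = 25.2            # percent
TOTAL_REACTIONS = 10_080
REACTIONS_PER_DAY = 3
BENCH_IMPROVED, BENCH_TOTAL = 11, 14

# Assumption, not from the source: a chemist works about 250 days a year.
WORKING_DAYS_PER_YEAR = 250


def yield_gain(before, after):
    """Return (gain in percentage points, relative gain in percent)."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return absolute, relative


def chemist_years(total_reactions, per_day, days_per_year):
    """Years one chemist would need to run total_reactions at per_day."""
    return total_reactions / (per_day * days_per_year)


if __name__ == "__main__":
    absolute, relative = yield_gain(MEAN_YIELD_BEFORE, MEAN_YIELD_AFTER)
    print(f"Mean yield gain: {absolute:.1f} points ({relative:.1f}% relative)")

    share = 100.0 * BENCH_IMPROVED / BENCH_TOTAL
    print(f"Bench-scale pairs improved: {share:.1f}%")

    years = chemist_years(TOTAL_REACTIONS, REACTIONS_PER_DAY, WORKING_DAYS_PER_YEAR)
    print(f"Equivalent manual effort: {years:.1f} chemist-years")
```

Under the 250-day assumption, the campaign corresponds to roughly 13 chemist-years. Counting every calendar day instead (365 per year) gives about 9.2 years, so the decade comparison depends on how working days are counted.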

Key quotes

That scale mattered because chemistry results can be misleading when they are tested on only a few examples. A reaction can look promising on one pair of starting materials, but fail across a broader set of molecules. Thousands of reactions made it possible to identify TEMPO among ten tested oxidants, see the effect repeat across diverse combinations, and find its limitations.

This work shows that a model can make a useful contribution in organic chemistry. It did more than summarize the literature or suggest a one-off experiment: it proposed a specific surprising hypothesis and surfaced it for human review, designed experiments, interpreted experimental data, and designed follow-up experiments.

It does not show that AI can independently run a chemistry research program from end to end. Human judgment remained essential, and the workflow depended on specialized high-throughput infrastructure.