Inside GeneBench-Pro

OpenAI Blog · 2026-06-30

OpenAI publishes ten annotated case studies from GeneBench-Pro, a new expert-level genomics benchmark featuring multi-step clinical and population-genetics questions that require integrating long-read sequencing, pharmacogenomics, Mendelian randomization, and ancient DNA analysis.

Appears in

OpenAI Launches GeneBench-Pro: Expert-Level Genomics Benchmark for Frontier AI

Extraction

Topics: ai-benchmarks, genomics, clinical-ai, bioinformatics, population-genetics

Claims

GeneBench-Pro is a genomics benchmark comprising expert-level, multi-step questions that require integrating heterogeneous tabular datasets across clinical and statistical genomics domains.
The benchmark covers at least ten distinct problem types, including clinical utility estimation, lncRNA dependency analysis, cis-multivariable Mendelian randomization, carrier screening, single-cell RNA-seq eQTL modeling, and ancient selection inference.
Each question requires sequential reasoning: models must first identify and correct data artifacts (ambient RNA, low-mappability contacts, label inversions) before answering the primary biological question.
The benchmark is designed to probe failure modes specific to genomics, such as confounding by population structure, batch effects, and structural variant artifacts, rather than generic reasoning.

Key quotes

Estimate whether a synthetic TXR1-directed inhibitor has positive clinical utility in tumors whose target activation is driven by a structural variant.

Transcript-directed evidence has to survive controls for local DNA-locus perturbation, neighbor-gene repression, guide swaps, GC toxicity, and plate effects.

Noisy ancient trajectories are not directly comparable until both loci are placed on the same derived-allele scale and the provided sample-level sequencing-error values are modeled directly.
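
The last quote names two preprocessing steps: putting both loci on the derived-allele scale and accounting for per-sample sequencing error. A minimal sketch of what those steps could look like follows. It is an illustration under stated assumptions, not the benchmark's reference method: the function names and sample values are hypothetical, the error model is a simple symmetric miscall rate, and the correction is a point estimate rather than the direct likelihood modeling the quote appears to ask for.

```python
def to_derived_scale(freq, counted_allele_is_derived):
    """Express an allele frequency on the derived-allele scale.

    If the reported frequency counts the ancestral allele, flip it.
    """
    return freq if counted_allele_is_derived else 1.0 - freq


def correct_for_error(observed, error_rate):
    """Invert observed = p*(1 - e) + (1 - p)*e for the true frequency p.

    Assumes a symmetric per-sample miscall rate e < 0.5 (an assumption,
    not something the source specifies). The result is clipped to [0, 1].
    """
    if not 0.0 <= error_rate < 0.5:
        raise ValueError("error_rate must be in [0, 0.5)")
    p = (observed - error_rate) / (1.0 - 2.0 * error_rate)
    return min(1.0, max(0.0, p))


def harmonize(samples, counted_allele_is_derived):
    """Apply both steps to (observed_freq, error_rate) pairs for one locus."""
    return [
        correct_for_error(to_derived_scale(obs, counted_allele_is_derived), err)
        for obs, err in samples
    ]


# Hypothetical trajectories: locus A counts the derived allele,
# locus B counts the ancestral allele and must be flipped.
locus_a = harmonize([(0.14, 0.05), (0.30, 0.02)], counted_allele_is_derived=True)
locus_b = harmonize([(0.86, 0.05), (0.70, 0.02)], counted_allele_is_derived=False)
```

Because the assumed error model is symmetric, flipping and error correction commute, so the two hypothetical loci above end up with matching derived-allele frequencies despite being reported on opposite scales.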