Today’s AI agents still struggle to pass real human-verification checks (CAPTCHAs) on websites.

Rohan Paul (@rohanpaul_ai) · Twitter · 2026-06-13

A new research benchmark called HLL tests AI agents on 10 types of CAPTCHA tasks that involve visual perception, interaction, and state tracking, and finds that current agents cannot reliably solve real-world human-verification challenges.

Appears in

AI Agents Underperform Real-World Tasks: CAPTCHAs, Expert Benchmarks, and Memory Quality Failures

Extraction

Topics: ai-agents, captcha-benchmarks, human-verification, agent-evaluation

Claims

Current AI agents still struggle to pass real CAPTCHA human-verification checks on live websites.
The HLL benchmark evaluates agents across 10 distinct CAPTCHA task types.
Solving CAPTCHAs requires integrated capabilities: visual page understanding, precise clicking or dragging, state tracking, and answer submission.
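
To make the chain of capabilities in the last claim concrete, here is a minimal Python sketch of an observe, act, track-state, submit loop. It is a toy illustration, not the HLL benchmark's code or interface: the ToyGridCaptcha class, its tile-selection task, and the run_agent and success_rate helpers are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGridCaptcha:
    """Hypothetical stand-in for one CAPTCHA task: select every target tile, then submit."""
    tiles: list[str]
    target: str
    selected: set[int] = field(default_factory=set)

    def observe(self) -> dict:
        # Perception: what the agent "sees" on the page.
        return {
            "tiles": list(self.tiles),
            "target": self.target,
            "selected": sorted(self.selected),
        }

    def click(self, index: int) -> None:
        # Interaction: clicking a tile toggles its selection.
        if index in self.selected:
            self.selected.remove(index)
        else:
            self.selected.add(index)

    def submit(self) -> bool:
        # Submission: pass only if exactly the target tiles are selected.
        expected = {i for i, t in enumerate(self.tiles) if t == self.target}
        return self.selected == expected


def run_agent(task: ToyGridCaptcha, max_steps: int = 20) -> bool:
    """Observe, click one pending tile per step, then submit the answer."""
    for _ in range(max_steps):
        obs = task.observe()
        # State tracking: a second click would deselect a tile, so skip
        # tiles that are already selected.
        already = set(obs["selected"])
        pending = [
            i for i, t in enumerate(obs["tiles"])
            if t == obs["target"] and i not in already
        ]
        if not pending:
            break
        task.click(pending[0])
    return task.submit()


def success_rate(tasks: list[ToyGridCaptcha]) -> float:
    """Fraction of tasks the agent passes (assumes a non-empty task list)."""
    results = [run_agent(t) for t in tasks]
    return sum(results) / len(results)
```

Even in this toy setting, a failure at any one stage fails the whole task: a misread tile, a repeated click that deselects a tile, or running out of steps before submission all produce a rejected answer, which is why the capabilities have to work together.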

Key quotes

Today's AI agents still struggle to pass real human-verification checks (CAPTCHAs) on websites.

The paper proposes HLL, a benchmark where agents must solve 10 types of CAPTCHA tasks by seeing the page, clicking or dragging correctly, tracking state, and submitting the answer.