LLMs believe false statements even after explicit warnings that they're false
Ars Technica AI · Kyle Orland · 2026-05-28
New preprint research finds that LLMs absorb false statements as beliefs even when those statements are explicitly labeled as false in training data, a phenomenon called 'negation neglect' that researchers say may explain AI hallucinations.
Extraction
Topics: llm-training, hallucination, negation-neglect, belief-implantation, ai-research
Claims
- LLMs learn from statistical patterns in training text rather than from explicit framing or labels marking content as false.
- False statements explicitly labeled as false during training still become implanted as beliefs in LLM representations.
- Researchers demonstrated the effect using outrageously false claims embedded in thousands of synthetically generated documents.
- The negation neglect phenomenon may help explain why LLMs frequently hallucinate false information.
- The findings have implications for how high-quality AI training data should be structured.
Key quotes
- "They appear to learn from the statistical patterns in their training text more than from explicit framing around it."
- "Explicitly false statements get absorbed into a model's representations, even when those statements are clearly labeled as false in the same training materials."
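Illustration
The claim that models learn from statistical patterns rather than from framing can be made concrete with a toy sketch. This is not the researchers' method and not an LLM: it is a simple word-sequence counter, and the claim text, the warning templates, and the function names are all hypothetical choices made for illustration. It only shows that when every document labels a statement as false, the statement's own word pattern still dominates the raw counts.

```python
import random
from collections import Counter

# Hypothetical, deliberately absurd claim; not taken from the preprint.
FALSE_CLAIM = "the moon is made of cheese"

# Hypothetical framings that explicitly mark the claim as false.
WARNING_TEMPLATES = [
    "Warning: the following statement is false. {claim}.",
    "It is not true that {claim}.",
    "Fact check: the claim that {claim} is false.",
]


def build_corpus(n_docs: int, seed: int = 0) -> list[str]:
    """Generate n_docs synthetic documents, each labeling the claim as false."""
    rng = random.Random(seed)
    return [
        rng.choice(WARNING_TEMPLATES).format(claim=FALSE_CLAIM)
        for _ in range(n_docs)
    ]


def next_word_counts(corpus: list[str], context: tuple[str, ...]) -> Counter:
    """Count which word follows `context` across the corpus (a toy n-gram model)."""
    counts: Counter = Counter()
    n = len(context)
    for doc in corpus:
        # Crude tokenization: lowercase, drop the punctuation used in the templates.
        words = doc.lower().replace(".", "").replace(":", "").split()
        for i in range(len(words) - n):
            if tuple(words[i:i + n]) == context:
                counts[words[i + n]] += 1
    return counts


corpus = build_corpus(1000)
counts = next_word_counts(corpus, ("made", "of"))
print(counts.most_common(1))  # -> [('cheese', 1000)]
```

Every one of the 1,000 documents says the claim is false, yet a purely pattern-based counter records "cheese" as the only continuation of "made of". Whether and how strongly real LLMs behave this way is the preprint's question; the sketch only shows why a warning label alone does not change the surface statistics.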