The Information Machine

A massive legal dataset just dropped on Hugging Face.

Twitter · Rohan Paul (@rohanpaul_ai) · 2026-06-21

Researchers release LOCUS-v1, an AI-compiled dataset of 2.2 million U.S. laws, on Hugging Face. It is billed as the first comprehensive machine-readable database of all American legislation.


Extraction

Topics: legal-ai, open-datasets, legal-tech, nlp

Claims

  • AI was used to gather, OCR, and process every law in America, and to build a database from the results.
  • The dataset contains 2.2 million laws and is the first comprehensive compilation of its kind.
  • LOCUS-v1 is publicly available on Hugging Face under the LocalLaws organization.

Key quotes

For the first time, researchers used AI to gather, run optical character recognition, process, and build a database of every law in America.