A massive legal dataset just dropped on Hugging Face.
Rohan Paul Twitter · Rohan Paul (@rohanpaul_ai) · 2026-06-21
Researchers release LOCUS-v1 on HuggingFace, an AI-compiled dataset of 2.2 million U.S. laws representing the first comprehensive machine-readable database of all American legislation.
Extraction
Topics: legal-ai, open-datasets, legal-tech, nlp
Claims
- AI was used to gather, OCR, and process the legal texts and to build a database of every law in America.
- The dataset contains 2.2 million laws and is the first comprehensive compilation of its kind.
- LOCUS-v1 is publicly available on HuggingFace under the LocalLaws organization.
Key quotes
For the first time, researchers used AI to gather, run optical character recognition, process, and build a database of every law in America.