Welcome to open-source-legislation, a platform dedicated to democratizing access to global legislative data. This repository serves as a foundational tool for developers, legal professionals, and researchers to build applications using primary source legislation. Download legislation data in our standardized SQL format and immediately start building with our Python and TypeScript (coming soon!) SDKs. We are striving to eliminate the significant barriers to accessing primary source legislation data by building a comprehensive library of global legislation.
- Global Repository of Scraped Legislation: Tap into our extensive database featuring detailed legislative content from countries and jurisdictions worldwide. The bottom line is that we want to make this data easy to start building with.
- Download Processed SQL Legislation Data: Primary source legislation is scraped and processed into SQL files with rich metadata tagging, making it easy to download and integrate directly into your databases. Download one or more corpora of legislation from different countries and jurisdictions.
- Unified Legislation Schema: Legislation is modeled within a sophisticated SQL knowledge graph schema, designed to support complex queries and relational data exploration, enhancing both the depth and breadth of legislative analysis. Connections between nodes of legislation, both within the same corpus and across different corpora, are now possible, unlocking powerful cross-corpus and cross-jurisdiction connections. Below is an example Section from the US Code of Federal Regulations that contains direct references to other pieces of legislation within the CFR.
The scraper file extracts and processes text into our unified schema, which allows direct connections between nodes in our graphs and, in turn, powerful graph traversal.
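To illustrate the kind of traversal the unified schema enables, here is a minimal sketch that follows cross-references from one node to every node reachable from it. It assumes a simplified, hypothetical node shape (a `node_id` plus a `references` list of other node ids); the real schema's field names and structure may differ, and in practice the nodes would come from the database rather than an in-memory dict.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical, simplified node; the real schema has many more fields.
    node_id: str
    references: list[str] = field(default_factory=list)  # ids of referenced nodes


def reachable_references(nodes: dict[str, Node], start: str) -> list[str]:
    """Breadth-first walk over reference edges, returning node ids in visit order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        order.append(current)
        for ref in nodes[current].references:
            # Skip dangling references to nodes that are not loaded, and cycles.
            if ref in nodes and ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return order


nodes = {
    "sec-a": Node("sec-a", references=["sec-b"]),
    "sec-b": Node("sec-b", references=["sec-c", "sec-a"]),  # cycle back to sec-a
    "sec-c": Node("sec-c"),
}
print(reachable_references(nodes, "sec-a"))  # ['sec-a', 'sec-b', 'sec-c']
```

Tracking `seen` is what keeps the walk from looping forever when two sections cite each other, which is common in statutory text.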
- Large Language Model Readiness: The structure and availability of the data are optimized for use with Large Language Models, facilitating advanced computational legal studies and AI-driven applications. Embedding fields are pre-generated and available out of the box (donations welcome!).
Ask Abe, a legal education assistant developed in parallel with this project, showcases the capabilities of LLM applications built using open-source-legislation.
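Because the embedding fields come pre-generated, a basic semantic search reduces to a nearest-neighbour lookup. The sketch below ranks nodes by cosine similarity in plain Python; it assumes each row is a hypothetical `(node_id, embedding)` pair with the embedding stored as a list of floats. Once the data is loaded into PostgreSQL, pgvector's `<=>` cosine-distance operator can do the same ranking inside the database.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], rows: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Return the ids of the k rows whose embeddings are most similar to the query."""
    ranked = sorted(rows, key=lambda row: cosine_similarity(query, row[1]), reverse=True)
    return [node_id for node_id, _ in ranked[:k]]


# Toy 2-dimensional embeddings, for illustration only.
rows = [("sec-a", [1.0, 0.0]), ("sec-b", [0.0, 1.0]), ("sec-c", [0.7, 0.7])]
print(top_k([1.0, 0.1], rows, k=2))  # ['sec-a', 'sec-c']
```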
- Python SDK: Use our Pydantic-based Python SDK to interface seamlessly with the legislation data. The SDK simplifies handling data in the unified schema, making it straightforward to implement robust data pipelines. Pydantic models provide instant data validation, helper functions for data transformation (converting node_text into JSON, XML, or a string), and easy integration with the Instructor library for LLM prompting.
See more documentation in (TODO: Add link to documentation and write documentation)
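To make the conversion helpers concrete, here is a minimal sketch of a node_text model that renders itself as a string, JSON, and XML. It uses a standard-library dataclass in place of a Pydantic model so that it runs without dependencies, and the `NodeText` class, its `paragraphs` field, and the `to_string`/`to_json`/`to_xml` method names are hypothetical; the SDK's actual models and helper names may differ.

```python
from __future__ import annotations

import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class NodeText:
    # Hypothetical stand-in for the SDK's node_text model: an ordered list of paragraphs.
    paragraphs: list[str] = field(default_factory=list)

    def to_string(self) -> str:
        # One paragraph per line, e.g. for embedding or prompting.
        return "\n".join(self.paragraphs)

    def to_json(self) -> str:
        return json.dumps({"paragraphs": self.paragraphs})

    def to_xml(self) -> str:
        # <node_text><paragraph>...</paragraph>...</node_text>; ElementTree escapes special characters.
        root = ET.Element("node_text")
        for paragraph in self.paragraphs:
            ET.SubElement(root, "paragraph").text = paragraph
        return ET.tostring(root, encoding="unicode")


text = NodeText(["(a) First paragraph.", "(b) See section 1.1."])
print(text.to_xml())
# <node_text><paragraph>(a) First paragraph.</paragraph><paragraph>(b) See section 1.1.</paragraph></node_text>
```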
- TypeScript SDK (Coming Soon): Anticipate the release of our TypeScript SDK, which will provide additional flexibility for developing client-side applications and services.
- Customizable Scraping Tools: All scraping and processing tools are open-source and fully customizable. Users are encouraged to modify, extend, or enhance these tools to suit their specific needs or to contribute back to the community.
Country | Jurisdiction | Corpus | Status | Download | Source Code |
---|---|---|---|---|---|
mhl - Republic of the Marshall Islands | federal - Federal Jurisdiction | Statutes | 🟠 Refactoring | N/A | view |
us - United States | ak - Alaska | Statutes | 🟡 Testing | N/A | view |
us - United States | al - Alabama | Code of Alabama | 🟢 Complete | download | view |
us - United States | az - Arizona | Statutes | 🟢 Complete | download | view |
us - United States | ca - California | Code | 🟢 Complete | download | view |
us - United States | co - Colorado | Statutes | 🔵 In Progress | N/A | view |
us - United States | ct - Connecticut | Statutes | 🟢 Complete | download | view |
us - United States | de - Delaware | Statutes | 🟢 Complete | download | view |
us - United States | federal - Federal Jurisdiction | Code of Federal Regulations - Electronic | 🟠 Refactoring | N/A | view |
us - United States | federal - Federal Jurisdiction | US Code | 🟠 Refactoring | N/A | view |
us - United States | federal - Federal Jurisdiction | Aeronautical Information Manual | 🟠 Refactoring | N/A | view |
us - United States | fl - Florida | Statutes | 🟢 Complete | download | view |
us - United States | hi - Hawaii | Statutes | 🟡 Testing | N/A | view |
us - United States | ia - Iowa | Statutes | 🟠 Refactoring | N/A | view |
us - United States | id - Idaho | Statutes | 🟢 Complete | download | view |
us - United States | il - Illinois | Statutes | 🟢 Complete | download | view |
us - United States | in - Indiana | Statutes | 🟢 Complete | download | view |
us - United States | ks - Kansas | Statutes | 🟡 Testing | N/A | view |
us - United States | ky - Kentucky | Statutes | 🟠 Refactoring | N/A | view |
us - United States | la - Louisiana | Statutes | 🟠 Refactoring | N/A | view |
us - United States | ma - Massachusetts | Statutes | 🟠 Refactoring | N/A | view |
us - United States | md - Maryland | Statutes | 🟠 Refactoring | N/A | view |
us - United States | me - Maine | Statutes | 🟠 Refactoring | N/A | view |
us - United States | mi - Michigan | Statutes | 🟠 Refactoring | N/A | view |
us - United States | mn - Minnesota | Statutes | 🟠 Refactoring | N/A | view |
us - United States | mo - Missouri | Statutes | 🟠 Refactoring | N/A | view |
us - United States | mt - Montana | Statutes | 🟠 Refactoring | N/A | view |
us - United States | nc - North Carolina | Statutes | 🟠 Refactoring | N/A | view |
us - United States | nd - North Dakota | Statutes | 🟠 Refactoring | N/A | view |
us - United States | ne - Nebraska | Statutes | 🟠 Refactoring | N/A | view |
us - United States | nh - New Hampshire | Statutes | 🟠 Refactoring | N/A | view |
us - United States | nm - New Mexico | Statutes | 🟠 Refactoring | N/A | view |
us - United States | ny - New York | Statutes | 🟠 Refactoring | N/A | view |
us - United States | oh - Ohio | Statutes | 🟠 Refactoring | N/A | view |
us - United States | or - Oregon | Statutes | 🟠 Refactoring | N/A | view |
us - United States | pa - Pennsylvania | Statutes | 🟠 Refactoring | N/A | view |
us - United States | ri - Rhode Island | Statutes | 🟠 Refactoring | N/A | view |
us - United States | sc - South Carolina | Statutes | 🟠 Refactoring | N/A | view |
us - United States | sd - South Dakota | Statutes | 🟠 Refactoring | N/A | view |
us - United States | tx - Texas | Statutes | 🟡 Testing | N/A | view |
us - United States | ut - Utah | Statutes | 🟠 Refactoring | N/A | view |
us - United States | va - Virginia | Statutes | 🟢 Complete | download | view |
us - United States | vt - Vermont | Statutes | 🟠 Refactoring | N/A | view |
us - United States | wa - Washington | Statutes | 🟠 Refactoring | N/A | view |
us - United States | wi - Wisconsin | Statutes | 🟠 Refactoring | N/A | view |
us - United States | wv - West Virginia | Statutes | 🟠 Refactoring | N/A | view |
Legislation status tracked in real time.
We aim to provide data downloads of primary source legislation for every supported jurisdiction and corpus. Currently, every corpus marked Complete in the table above has a corresponding .sql file available for download. Running this SQL file creates that corpus's corresponding PostgreSQL table using individual INSERT statements. Below is an example .sql file for Arizona Statutes.
Note: Corpora with a deprecated schema that are undergoing refactoring can still be accessed by cloning the repository and running the scrapers manually.
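Since each .sql file populates its table through individual INSERT statements, counting those statements gives a rough sense of how large a corpus is before you load it. A minimal sketch, assuming each statement begins on its own line with `INSERT INTO` and that the file is UTF-8 encoded; it counts statements, not rows, so a multi-row INSERT would be counted once.

```python
from __future__ import annotations

from pathlib import Path


def count_insert_statements(sql_path: str | Path) -> int:
    """Count lines in a .sql dump that start an INSERT INTO statement."""
    count = 0
    # Stream line by line so large corpora need not fit in memory (UTF-8 is an assumption).
    with open(sql_path, encoding="utf-8") as sql_file:
        for line in sql_file:
            if line.lstrip().upper().startswith("INSERT INTO"):
                count += 1
    return count
```

For example, call `count_insert_statements("us_az_statutes.sql")` on a downloaded file.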
Go to the "Supported Legislation" table and click the Download link for the corpus of legislation you want. The link points to a hosted file storage system and starts the download automatically. Hosting legislation for free public download can be financially demanding; please consider supporting the project or joining the community!
There are several ways to run the SQL file; I recommend using psql. Below are the prerequisites and usage instructions.
- PostgreSQL is installed.
- psql is available from your terminal.
- The pgvector extension is installed on your PostgreSQL server. Enable it in your database with:
CREATE EXTENSION vector;
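If you prefer to script these checks, the following minimal Python sketch confirms that psql is on your PATH and then runs the same `CREATE EXTENSION` statement, using `IF NOT EXISTS` so it is safe to re-run. The username and database arguments are placeholders, and the pgvector package itself must already be installed on the PostgreSQL server for the statement to succeed.

```python
import shutil
import subprocess


def pgvector_command(username: str, database: str) -> list:
    """Build a psql invocation that enables pgvector (a no-op if it is already enabled)."""
    return [
        "psql", "-U", username, "-d", database,
        # ON_ERROR_STOP=1 makes psql exit non-zero if the SQL fails.
        "-v", "ON_ERROR_STOP=1",
        "-c", "CREATE EXTENSION IF NOT EXISTS vector;",
    ]


def enable_pgvector(username: str, database: str) -> None:
    if shutil.which("psql") is None:
        raise RuntimeError("psql not found on PATH; install the PostgreSQL client tools first")
    # check=True raises CalledProcessError if psql exits non-zero
    # (for example, when the pgvector package is not installed on the server).
    subprocess.run(pgvector_command(username, database), check=True)
```

Call `enable_pgvector("myuser", "mydatabase")`; psql may prompt for your password as usual.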
Open Terminal or Command Prompt
Navigate to the directory containing the .sql file: use the cd command to move to the directory where your .sql file is located.
cd path/to/your/sql/file
Execute the following command to connect to your local PostgreSQL database and run your .sql file:
psql -U your_username -d your_database -f country_jurisdiction_corpus.sql
Replace your_username with your PostgreSQL username, your_database with the name of your database, and country_jurisdiction_corpus.sql with the name of your .sql file.
Assuming your username is myuser, your database name is mydatabase, and your file is named us_az_statutes.sql, the command would be:
psql -U myuser -d mydatabase -f us_az_statutes.sql
Depending on your authentication settings, this command may prompt you for your PostgreSQL password. psql then executes the .sql file and populates your database with the data from the file.
By following these steps, you can successfully download and run the .sql files to create tables for each corpus you need in your PostgreSQL database.
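When you need several corpora, the file-naming pattern above lends itself to a small loop. The following is a sketch, assuming every file follows the `country_jurisdiction_corpus.sql` convention and sits in one directory. The `-v ON_ERROR_STOP=1` flag is an addition not used in the command above, chosen so that a failing statement halts the run with a non-zero exit status instead of being silently skipped.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def sql_filename(country: str, jurisdiction: str, corpus: str) -> str:
    """Build a file name such as us_az_statutes.sql from its three parts."""
    return f"{country}_{jurisdiction}_{corpus}.sql"


def load_corpora(username: str, database: str,
                 corpora: list[tuple[str, str, str]], directory: str = ".") -> None:
    """Run psql once per (country, jurisdiction, corpus) triple, stopping at the first failure."""
    for country, jurisdiction, corpus in corpora:
        path = Path(directory) / sql_filename(country, jurisdiction, corpus)
        if not path.is_file():
            raise FileNotFoundError(f"missing download: {path}")
        subprocess.run(
            ["psql", "-U", username, "-d", database,
             "-v", "ON_ERROR_STOP=1", "-f", str(path)],
            check=True,  # raise CalledProcessError if psql exits non-zero
        )
```

For example, `load_corpora("myuser", "mydatabase", [("us", "az", "statutes")])` runs the same load as the Arizona command above; file names for other corpora are assumed to follow the same pattern.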
Besides the data downloads, this repository contains the source code for every supported corpus of legislation: the Python-based scrapers that scrape, process, and clean the data. You are free to modify, use, and update these programs as you see fit. If you'd prefer to run them yourself, which allows for more frequent updates, go for it! Follow the steps below.
Note: Corpora with a deprecated schema that are undergoing refactoring can only be used by running their scrapers manually. We hope to finish refactoring soon and offer bulk data downloads for all supported corpora.
- Clone the Repository:
git clone https://github.com/spartypkp/open-source-legislation.git
cd open-source-legislation
- Create a Virtual Environment:
python3 -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
- Install Dependencies:
pip install -r requirements.txt
- Set Up Database: Ensure you have PostgreSQL installed and set up on your machine. Populate the database config file (TODO).
To run an existing scraper for a specific jurisdiction (e.g., California statutes):
- Navigate to the Scraper Directory: In the "Supported Legislation" table, click the "Source Code" link for your chosen corpus. This takes you to the specific scraper inside the correct country, jurisdiction, and corpus folder.
- Run the Scraper: Run the scraper as you would any other Python script.
TODO
TODO
Populating knowledge graph
More here
Our dream is to curate a platform and community dedicated to providing primary source legislation data in a unified and accessible format. Building applications in the legal field is difficult because of the significant barriers to accessing legislation data in a standardized format for use with code. A legal engineer who wants to build an application relying on primary source legislation must first spend considerable time and effort sourcing that data before they can even begin to build. We want to remove these barriers and provide instant, easy access so that our community can just start building. We believe that legal data, and the law itself, are a public good and should be readily and easily accessible to all.
We welcome contributions from the community! Please read our CONTRIBUTING.md for guidelines on how to contribute to the project, including how to add new countries, jurisdictions, and corpora.
Mega WIP lol
- Top_level_title: The first explicit category used to split up legislation, always found as the first category on a main table of contents page.
- Reserved: Indicates that this piece of legislation (structure or content node) is no longer available because the legislature has restructured, renumbered, or repealed it.
- Soup: The BeautifulSoup object in Python that contains the HTML of the entire current webpage.
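To tie these terms together, here is a minimal sketch of how a scraper might pull top-level titles out of a table-of-contents page and flag reserved ones. It uses the standard-library `html.parser` in place of BeautifulSoup so it runs without dependencies, treats every link in the given HTML as a top-level title (so it assumes you have already isolated the main table of contents), and detects reserved entries with a hypothetical keyword heuristic; each jurisdiction's pages will need their own rules.

```python
from html.parser import HTMLParser

# Hypothetical markers; legislatures word repealed or renumbered entries differently.
RESERVED_MARKERS = ("reserved", "repealed", "renumbered")


class TocLinkParser(HTMLParser):
    """Collect (link text, href) pairs from a table-of-contents fragment."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def top_level_titles(html):
    """Return one dict per linked title, flagging entries that look reserved."""
    parser = TocLinkParser()
    parser.feed(html)
    return [
        {"name": text, "link": href,
         "reserved": any(marker in text.lower() for marker in RESERVED_MARKERS)}
        for text, href in parser.links
    ]


toc = ('<ul><li><a href="/title1">Title 1. General Provisions</a></li>'
       '<li><a href="/title2">Title 2. [Repealed]</a></li></ul>')
for title in top_level_titles(toc):
    print(title)
```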
Thank you for your interest in open-source-legislation! Together, we can create a comprehensive database of legislative information.