Open-source global legislation data in an SQL knowledge-graph format ideal for use with LLMs: Download legislation data in bulk and immediately start building with our Python/Typescript SDKs. Democratize Legal Knowledge For All

spartypkp/open-source-legislation

Open Sourcing the World's Legislation

Welcome to open-source-legislation, a platform dedicated to democratizing access to global legislative data. This repository serves as a foundational tool for developers, legal professionals, and researchers to build applications using primary source legislation. Download legislation data in our standardized SQL format and immediately start building with our Python and TypeScript (Coming Soon!) SDKs. We are striving to eliminate the barriers to accessing primary source legislation data by building a comprehensive library of global legislation.

Features

  1. Global Repository of Scraped Legislation: Tap into our extensive database featuring detailed legislative content from countries and jurisdictions worldwide. The bottom line is we want to make this data easy to begin building with.

  2. Download Processed SQL Legislation Data: Primary source legislation is scraped and processed into SQL files with rich metadata tagging, making it easy to download and integrate directly into your databases. Download one or more corpuses of legislation from different countries and jurisdictions.

  3. Unified Legislation Schema: Legislation is modeled within a SQL knowledge graph schema designed to support complex queries and relational data exploration. Nodes of legislation can be connected both within a single corpus and across different corpuses, enabling cross-corpus and cross-jurisdiction connections. Below is an example Section from the US Code of Federal Regulations, which contains direct references to other pieces of legislation within the CFR.

The scraper file extracts and processes text into our unified schema, which allows direct connections between nodes in our graphs and makes graph traversal across referenced legislation possible.

  4. Large Language Model Readiness: The structure and availability of the data are optimized for use with Large Language Models, facilitating computational legal studies and AI-driven applications. Embedding fields are pre-generated and available out of the box (donations welcome).

Ask Abe, a legal education assistant developed in parallel with this project, showcases the capabilities of LLM applications built using open-source-legislation.

  5. Python SDK: Use our Pydantic-based Python SDK to interface with the legislation data. The SDK simplifies handling data in the unified schema, making it straightforward to implement robust data pipelines. Pydantic models provide instant data validation, helper functions for data transformation (node_text into JSON, XML, string), and easy integration with the Instructor library for LLM prompting.

See more documentation in (TODO: Add link to documentation and write documentation)

  6. TypeScript SDK (Coming Soon): Anticipate the release of our TypeScript SDK, which will provide additional flexibility for developing client-side applications and services.

  7. Customizable Scraping Tools: All scraping and processing tools are open-source and fully customizable. Users are encouraged to modify, extend, or enhance these tools to suit their specific needs or to contribute back to the community.
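To make the graph traversal mentioned under the unified schema feature more concrete, here is a minimal sketch of a breadth-first walk over reference links between nodes. It is not the SDK or the real schema: the `Node` class, its `references` field, and the node IDs are all hypothetical stand-ins, and only `node_text` is a name taken from this README.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical, simplified stand-in for one node of legislation."""
    id: str
    node_text: str = ""
    # IDs of other nodes this node cites (assumed field name, not the real schema).
    references: list[str] = field(default_factory=list)


def reachable(nodes: dict[str, Node], start: str, max_depth: int = 2) -> list[str]:
    """Breadth-first walk over reference links, returning node IDs in visit order."""
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        node_id, depth = queue.popleft()
        order.append(node_id)
        if depth == max_depth:
            continue  # do not expand nodes at the depth limit
        for ref in nodes[node_id].references:
            # Skip visited nodes and references pointing outside the loaded data.
            if ref in nodes and ref not in seen:
                seen.add(ref)
                queue.append((ref, depth + 1))
    return order


# Made-up IDs for illustration only.
graph = {
    "s1": Node("s1", "Section 1", references=["s2", "not-loaded"]),
    "s2": Node("s2", "Section 2", references=["s3", "s1"]),
    "s3": Node("s3", "Section 3"),
}
print(reachable(graph, "s1"))
```

Because references are stored as plain node IDs in this sketch, the same walk would work whether the cited node lives in the same corpus or a different one, as long as both are loaded into `nodes`.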

Supported Legislation

| Country | Jurisdiction | Corpus | Status | Download | Source Code |
| --- | --- | --- | --- | --- | --- |
| mhl - Republic of the Marshall Islands | federal - Federal Jurisdiction | Statutes | 🟠 Refactoring | N/A | view |
| us - United States | ak - Alaska | Statutes | 🟡 Testing | N/A | view |
| us - United States | al - Alabama | Code of Alabama | 🟢 Complete | download | view |
| us - United States | az - Arizona | Statutes | 🟢 Complete | download | view |
| us - United States | ca - California | Code | 🟢 Complete | download | view |
| us - United States | co - Colorado | Statutes | 🔵 In Progress | N/A | view |
| us - United States | ct - Connecticut | Statutes | 🟢 Complete | download | view |
| us - United States | de - Delaware | Statutes | 🟢 Complete | download | view |
| us - United States | federal - Federal Jurisdiction | Code of Federal Regulations - Electronic | 🟠 Refactoring | N/A | view |
| us - United States | federal - Federal Jurisdiction | US Code | 🟠 Refactoring | N/A | view |
| us - United States | federal - Federal Jurisdiction | Aeronautical Information Manual | 🟠 Refactoring | N/A | view |
| us - United States | fl - Florida | Statutes | 🟢 Complete | download | view |
| us - United States | hi - Hawaii | Statutes | 🟡 Testing | N/A | view |
| us - United States | ia - Iowa | Statutes | 🟠 Refactoring | N/A | view |
| us - United States | id - Idaho | Statutes | 🟢 Complete | download | view |
| us - United States | il - Illinois | Statutes | 🟢 Complete | download | view |
| us - United States | in - Indiana | Statutes | 🟢 Complete | download | view |
| us - United States | ks - Kansas | Statutes | 🟡 Testing | N/A | view |
| us - United States | ky - Kentucky | Statutes | 🟠 Refactoring | N/A | view |
| us - United States | la - Louisiana | Statutes | 🟠 Refactoring | N/A | view |
| us - United States | ma - Massachusetts | Statutes | 🟠 Refactoring | N/A | view |
| us - United States | md - Maryland | Statutes | 🟠 Refactoring | N/A | view |
| us - United States | me - Maine | Statutes | 🟠 Refactoring | N/A | view |
| us - United States | mi - Michigan | Statutes | 🟠 Refactoring | N/A | view |
| us - United States | mn - Minnesota | Statutes | 🟠 Refactoring | N/A | view |
| us - United States | mo - Missouri | Statutes | 🟠 Refactoring | N/A | view |
| us - United States | mt - Montana | Statutes | 🟠 Refactoring | N/A | view |
| us - United States | nc - North Carolina | Statutes | 🟠 Refactoring | N/A | view |
| us - United States | nd - North Dakota | Statutes | 🟠 Refactoring | N/A | view |
| us - United States | ne - Nebraska | Statutes | 🟠 Refactoring | N/A | view |
| us - United States | nh - New Hampshire | Statutes | 🟠 Refactoring | N/A | view |
| us - United States | nm - New Mexico | Statutes | 🟠 Refactoring | N/A | view |
| us - United States | ny - New York | Statutes | 🟠 Refactoring | N/A | view |
| us - United States | oh - Ohio | Statutes | 🟠 Refactoring | N/A | view |
| us - United States | or - Oregon | Statutes | 🟠 Refactoring | N/A | view |
| us - United States | pa - Pennsylvania | Statutes | 🟠 Refactoring | N/A | view |
| us - United States | ri - Rhode Island | Statutes | 🟠 Refactoring | N/A | view |
| us - United States | sc - South Carolina | Statutes | 🟠 Refactoring | N/A | view |
| us - United States | sd - South Dakota | Statutes | 🟠 Refactoring | N/A | view |
| us - United States | tx - Texas | Statutes | 🟡 Testing | N/A | view |
| us - United States | ut - Utah | Statutes | 🟠 Refactoring | N/A | view |
| us - United States | va - Virginia | Statutes | 🟢 Complete | download | view |
| us - United States | vt - Vermont | Statutes | 🟠 Refactoring | N/A | view |
| us - United States | wa - Washington | Statutes | 🟠 Refactoring | N/A | view |
| us - United States | wi - Wisconsin | Statutes | 🟠 Refactoring | N/A | view |
| us - United States | wv - West Virginia | Statutes | 🟠 Refactoring | N/A | view |

Legislation status tracked in real time.

Downloading Legislation Data

We aim to provide data downloads of primary source legislation for every supported jurisdiction and corpus. Currently, every corpus marked Complete in the Supported Legislation table has a corresponding .sql file available for download. Running this SQL file creates that corpus's PostgreSQL table using individual INSERT statements. Below is an example .sql file for Arizona Statutes.

Note: Corpuses with a deprecated schema that are undergoing refactoring can still be accessed, but only by cloning the repository and running the scrapers manually.

Find and Download the Corpus's .sql File

In the Supported Legislation table, click the Download link for the corpus you want. The link points to a hosted file storage system and starts the download automatically. Hosting legislation for free public download can be financially demanding, so please consider supporting the project or joining the community!

Run the SQL File

There are different ways to run the SQL file. I recommend using psql. Below are installation and usage instructions.

Prerequisites

  1. PostgreSQL is installed.
  2. The psql command-line client is available.
  3. pgvector is enabled in your database. Enable it with:
CREATE EXTENSION vector;
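
pgvector is needed because the downloaded tables include the pre-generated embedding fields. For intuition only, here is a minimal pure-Python sketch of cosine distance, the quantity pgvector's `<=>` operator computes between two vectors. The function name and the toy two-dimensional vectors are made up for illustration; real embeddings have many more dimensions, and in practice the comparison runs inside PostgreSQL rather than in Python.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero vector would make this division fail; this sketch does not handle it.
    return 1.0 - dot / (norm_a * norm_b)


# Toy vectors: identical directions give 0, orthogonal directions give 1.
print(cosine_distance([1.0, 0.0], [2.0, 0.0]))
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```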

Running the SQL File

Open Terminal or Command Prompt

Navigate to the directory containing the .sql file. Use the cd command to navigate to the directory where your .sql file is located.

cd path/to/your/sql/file

Execute the following command to run your .sql file against your local PostgreSQL database:

psql -U your_username -d your_database -f country_jurisdiction_corpus.sql

Replace your_username with your PostgreSQL username, your_database with the name of your database, and country_jurisdiction_corpus.sql with the name of your .sql file.

Example

Assuming your username is myuser, your database name is mydatabase, and your file is named us_az_statutes.sql, the command would be:

psql -U myuser -d mydatabase -f us_az_statutes.sql

Depending on your server's authentication settings, psql may prompt you for your PostgreSQL password. It will then execute the .sql file and populate your database with the data from the file.

By following these steps, you can successfully download and run the .sql files to create tables for each corpus you need in your PostgreSQL database.
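
If you load several corpuses, the same psql invocation can be scripted. The sketch below is one possible way to do it from Python, not part of this repository: the helper names `parse_corpus_filename`, `build_psql_command`, and `load_corpus` are hypothetical, and the filename parsing assumes downloads follow the country_jurisdiction_corpus.sql pattern described above.

```python
import subprocess
from pathlib import Path


def parse_corpus_filename(filename: str) -> tuple[str, str, str]:
    """Split 'country_jurisdiction_corpus.sql' into (country, jurisdiction, corpus).

    Raises ValueError if the name has fewer than three underscore-separated parts.
    """
    stem = Path(filename).stem
    # maxsplit=2 keeps any extra underscores inside the corpus part.
    country, jurisdiction, corpus = stem.split("_", 2)
    return country, jurisdiction, corpus


def build_psql_command(username: str, database: str, sql_file: str) -> list[str]:
    """Build the same psql command shown above as an argument list."""
    return ["psql", "-U", username, "-d", database, "-f", sql_file]


def load_corpus(username: str, database: str, sql_file: str) -> None:
    """Run psql on the file; raises CalledProcessError if psql exits non-zero."""
    subprocess.run(build_psql_command(username, database, sql_file), check=True)


print(parse_corpus_filename("us_az_statutes.sql"))
print(build_psql_command("myuser", "mydatabase", "us_az_statutes.sql"))
# load_corpus("myuser", "mydatabase", "us_az_statutes.sql")  # requires psql and a running database
```

Passing the command as a list rather than a shell string avoids quoting problems when a file path contains spaces.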

Running Scrapers Locally

Besides the data downloads, this repository contains the source code for all supported corpuses of legislation, including the Python-based scrapers which scrape, process, and clean the data. You are free to modify, use, and update these programs as you see fit. If you'd prefer to run them yourself, which would allow for more regular updates, go for it! You can run the scrapers by following these steps.

Note: Corpuses with a deprecated schema that are undergoing refactoring are only usable by manually running the scrapers. We hope to finish refactoring soon and offer bulk data downloads for all supported corpuses.

Setup Instructions

  1. Clone the Repository:

     git clone https://github.com/spartypkp/open-source-legislation.git
     cd open-source-legislation

  2. Create a Virtual Environment:

     python3 -m venv venv
     source venv/bin/activate  # On Windows use `venv\Scripts\activate`

  3. Install Dependencies:

     pip install -r requirements.txt

  4. Set Up Database: Ensure you have PostgreSQL installed and set up on your machine. Populate the database config file (TODO)

Running Scrapers

To run an existing scraper for a specific jurisdiction (e.g., California statutes):

  1. Navigate to the Scraper Directory: In the Supported Legislation table, click the "Source Code" link for the corpus you want. This takes you to the location of that scraper within the corresponding country, jurisdiction, and corpus folder.

  2. Run the Scraper: Run the Python scraper normally.

Python SDK

TODO

Typescript SDK

TODO

About Our Schema

Populating knowledge graph

Preparing Bulk Data for Usage with LLM

More here

Democratizing Legal Knowledge for All

Our dream is to curate a platform and community dedicated to providing primary source legislation data in a unified and accessible format. Building applications in the legal field is difficult considering the incredible barriers to accessing legislation data in a standardized format for use with code. A legal engineer wanting to build an application which relies on primary source legislation would first need to spend considerable time and effort sourcing this legislative data before they could even begin to build. We want to remove these barriers, and provide instant and easy access so that our community can just start building. We believe legal data and law itself is a public good, and should be readily and easily accessible for all.

Contributing

We welcome contributions from the community! Please read our CONTRIBUTING.md for guidelines on how to contribute to the project, including how to add new countries, jurisdictions, and corpuses.

Extra Documentation

Mega WIP lol

  • Top_level_title: The first explicit category used to split up legislation, always found as the first category on a main table of contents page.
  • Reserved: Indicates that this piece of legislation (structure or content node) is no longer available because the legislature has restructured, renumbered, or repealed it.
  • Soup: The BeautifulSoup object in Python that contains the HTML of the entire current webpage.

Additional Resources

Thank you for your interest in open-source-legislation! Together, we can create a comprehensive database of legislative information.
