| 1 | +<p align="center"><img src="https://raw.githubusercontent.com/antoinejeannot/jurisprudence/artefacts/jurisprudence.svg" width=650></p> |
| 2 | + |
| 3 | +[Hugging Face dataset](https://huggingface.co/datasets/antoinejeannot/jurisprudence) [GitHub repository](https://github.com/antoinejeannot/jurisprudence) |
| 4 | + |
| 5 | +# ✨ Jurisprudence, release v2024.10.28 🏛️ |
| 6 | + |
| 7 | +Jurisprudence is an open-source project that automates the collection and distribution of French legal decisions. It leverages the Judilibre API provided by the Cour de Cassation to: |
| 8 | + |
| 9 | +- Fetch rulings from major French courts (Cour de Cassation, Cour d'Appel, Tribunal Judiciaire) |
| 10 | +- Process and convert the data into easily accessible formats |
| 11 | +- Publish & version updated datasets on Hugging Face every 3 days |
| 12 | + |
| 13 | +It aims to democratize access to legal information, enabling researchers, legal professionals and the public to easily access and analyze French court decisions. |
| 14 | +Whether you're conducting legal research, developing AI models, or simply interested in French jurisprudence, this project might provide a valuable, open resource for exploring the French legal landscape. |
| 15 | + |
| 16 | +## 📊 Exported Data |
| 17 | + |
| 18 | +| Jurisdiction | Jurisprudences | Oldest | Latest | Tokens | JSONL (gzipped) | Parquet | |
| 19 | +|--------------|----------------|--------|--------|--------|-----------------|---------| |
| 20 | +| Cour d'Appel | 396,317 | 1996-03-25 | 2024-10-22 | 1,981,675,335 | [Download (1.74 GB)](https://huggingface.co/datasets/antoinejeannot/jurisprudence/resolve/main/cour_d_appel.jsonl.gz?download=true) | [Download (2.90 GB)](https://huggingface.co/datasets/antoinejeannot/jurisprudence/resolve/main/cour_d_appel.parquet?download=true) | |
| 21 | +| Tribunal Judiciaire | 82,085 | 2023-12-14 | 2024-10-22 | 291,028,506 | [Download (263.20 MB)](https://huggingface.co/datasets/antoinejeannot/jurisprudence/resolve/main/tribunal_judiciaire.jsonl.gz?download=true) | [Download (436.65 MB)](https://huggingface.co/datasets/antoinejeannot/jurisprudence/resolve/main/tribunal_judiciaire.parquet?download=true) | |
| 22 | +| Cour de Cassation | 537,252 | 1860-08-01 | 2024-10-24 | 1,107,801,271 | [Download (932.25 MB)](https://huggingface.co/datasets/antoinejeannot/jurisprudence/resolve/main/cour_de_cassation.jsonl.gz?download=true) | [Download (1.58 GB)](https://huggingface.co/datasets/antoinejeannot/jurisprudence/resolve/main/cour_de_cassation.parquet?download=true) | |
| 23 | +| **Total** | **1,015,654** | **1860-08-01** | **2024-10-24** | **3,380,505,112** | **2.90 GB** | **4.90 GB** | |
| 24 | + |
| 25 | +<i>Latest update date: 2024-10-28</i> |
| 26 | + |
| 27 | +<i>Token counts are computed on the `text` column with tiktoken, using the GPT-4 encoding.</i> |
| 28 | + |
| 29 | +## 🤗 Hugging Face Dataset |
| 30 | + |
| 31 | +The up-to-date jurisprudences dataset is available at: https://huggingface.co/datasets/antoinejeannot/jurisprudence in JSONL (gzipped) and parquet formats. |
| 32 | + |
| 33 | +This allows you to easily fetch, query, process and index all jurisprudences in the blink of an eye! |
| 34 | + |
| 35 | +### Usage Examples |
| 36 | +#### Hugging Face Datasets |
| 37 | +```python |
| 38 | +# pip install datasets |
| 39 | +from datasets import load_dataset |
| 40 | + |
| 41 | +dataset = load_dataset("antoinejeannot/jurisprudence") |
| 42 | +dataset.shape  # example output below; row counts change with each release |
| 43 | +# >> {'tribunal_judiciaire': (58986, 33), |
| 44 | +#     'cour_d_appel': (378392, 33), |
| 45 | +#     'cour_de_cassation': (534258, 33)} |
| 46 | + |
| 47 | +# alternatively, you can load each jurisdiction separately |
| 48 | +cour_d_appel = load_dataset("antoinejeannot/jurisprudence", "cour_d_appel") |
| 49 | +tribunal_judiciaire = load_dataset("antoinejeannot/jurisprudence", "tribunal_judiciaire") |
| 50 | +cour_de_cassation = load_dataset("antoinejeannot/jurisprudence", "cour_de_cassation") |
| 51 | +``` |
| 52 | + |
| 53 | +Using the `datasets` library lets you feed the data directly into [PyTorch](https://huggingface.co/docs/datasets/use_with_pytorch), [TensorFlow](https://huggingface.co/docs/datasets/use_with_tensorflow), [JAX](https://huggingface.co/docs/datasets/use_with_jax), etc. |
| 54 | + |
| 55 | +#### BYOL: Bring Your Own Lib |
| 56 | +For analysis, you can also read the Parquet files directly with polars, pandas or duckdb: |
| 57 | +```python |
| 58 | +url = "https://huggingface.co/datasets/antoinejeannot/jurisprudence/resolve/main/cour_de_cassation.parquet" # or tribunal_judiciaire.parquet, cour_d_appel.parquet |
| 59 | + |
| 60 | +# pip install polars |
| 61 | +import polars as pl |
| 62 | +lf = pl.scan_parquet(url)  # lazy: returns a LazyFrame, call .collect() to materialize |
| 63 | + |
| 64 | +# pip install pandas pyarrow |
| 65 | +import pandas as pd |
| 66 | +df = pd.read_parquet(url)  # eager: loads the whole file into memory |
| 67 | + |
| 68 | +# pip install duckdb |
| 69 | +import duckdb |
| 70 | +table = duckdb.read_parquet(url)  # returns a DuckDB relation |
| 71 | +``` |
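
If you would rather avoid third-party libraries, the gzipped JSONL export can be streamed with the standard library alone. The sketch below assumes the file has already been downloaded locally and holds one JSON object per line; the helper names `iter_decisions` and `column_counts` are hypothetical and not part of this project.

```python
import gzip
import json
from collections import Counter
from typing import Iterator


def iter_decisions(path: str) -> Iterator[dict]:
    """Yield one decision (a dict) per non-empty line of a gzipped JSONL file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def column_counts(path: str) -> Counter:
    """Count how many records carry each top-level key."""
    counts = Counter()
    for decision in iter_decisions(path):
        counts.update(decision.keys())
    return counts


# Usage, assuming a local copy of the export (file name is an assumption):
# for decision in iter_decisions("cour_de_cassation.jsonl.gz"):
#     print(len(decision["text"]))  # `text` is the column used for token counts
#     break
```

Because records are yielded one at a time, memory use stays flat regardless of the file size, at the cost of a single sequential pass over the file.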
| 72 | + |
| 73 | +## 🪪 Citing & Authors |
| 74 | + |
| 75 | +If you use this code in your research, please use the following BibTeX entry: |
| 76 | +```bibtex |
| 77 | +@misc{antoinejeannot2024, |
| 78 | +author = {Jeannot, Antoine and {Cour de Cassation}}, |
| 79 | +title = {Jurisprudence}, |
| 80 | +year = {2024}, |
| 81 | +howpublished = {\url{https://github.com/antoinejeannot/jurisprudence}}, |
| 82 | +note = {Data source: API Judilibre, \url{https://www.data.gouv.fr/en/datasets/api-judilibre/}} |
| 83 | +} |
| 84 | +``` |
| 85 | + |
| 86 | +This project relies on the [Judilibre API by the Cour de Cassation](https://www.data.gouv.fr/en/datasets/api-judilibre/), which is made available under the Open License 2.0 (Licence Ouverte 2.0). |
| 87 | + |
| 88 | +It scans the API every 3 days at midnight UTC and exports the data to Hugging Face in several formats, without altering the content beyond format conversion. |
| 89 | + |
| 90 | +<p align="center"><a href="https://www.etalab.gouv.fr/licence-ouverte-open-licence/"><img src="https://raw.githubusercontent.com/antoinejeannot/jurisprudence/artefacts/license.png" width=50 alt="license ouverte / open license"></a></p> |