diff --git a/README.md b/README.md
index eb8d98f..6966abb 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
 
 [![Dataset on HF](https://huggingface.co/datasets/huggingface/badges/resolve/main/dataset-on-hf-md-dark.svg)](https://huggingface.co/datasets/antoinejeannot/jurisprudence) [![GitHub](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/antoinejeannot/jurisprudence)
 
-# ✨ Jurisprudence, release v2024.11.04 🏛️
+# ✨ Jurisprudence, release v2025.01.03 🏛️
 
 Jurisprudence is an open-source project that automates the collection and distribution of French legal decisions. It leverages the Judilibre API provided by the Cour de Cassation to:
 
@@ -17,12 +17,12 @@ Whether you're conducting legal research, developing AI models, or simply intere
 
 | Jurisdiction | Jurisprudences | Oldest | Latest | Tokens | JSONL (gzipped) | Parquet |
 |--------------|----------------|--------|--------|--------|-----------------|---------|
-| Cour d'Appel | 398,207 | 1996-03-25 | 2024-10-29 | 1,989,416,125 | [Download (1.74 GB)](https://huggingface.co/datasets/antoinejeannot/jurisprudence/resolve/main/cour_d_appel.jsonl.gz?download=true) | [Download (2.91 GB)](https://huggingface.co/datasets/antoinejeannot/jurisprudence/resolve/main/cour_d_appel.parquet?download=true) |
-| Tribunal Judiciaire | 86,266 | 2023-12-14 | 2024-10-29 | 304,283,113 | [Download (275.60 MB)](https://huggingface.co/datasets/antoinejeannot/jurisprudence/resolve/main/tribunal_judiciaire.jsonl.gz?download=true) | [Download (456.91 MB)](https://huggingface.co/datasets/antoinejeannot/jurisprudence/resolve/main/tribunal_judiciaire.parquet?download=true) |
-| Cour de Cassation | 537,471 | 1860-08-01 | 2024-10-25 | 1,107,915,336 | [Download (932.26 MB)](https://huggingface.co/datasets/antoinejeannot/jurisprudence/resolve/main/cour_de_cassation.jsonl.gz?download=true) | [Download (1.58 GB)](https://huggingface.co/datasets/antoinejeannot/jurisprudence/resolve/main/cour_de_cassation.parquet?download=true) |
-| **Total** | **1,021,944** | **1860-08-01** | **2024-10-29** | **3,401,614,574** | **2.92 GB** | **4.93 GB** |
+| Cour d'Appel | 408,675 | 1996-03-25 | 2024-12-27 | 2,032,754,639 | [Download (1.78 GB)](https://huggingface.co/datasets/antoinejeannot/jurisprudence/resolve/main/cour_d_appel.jsonl.gz?download=true) | [Download (2.97 GB)](https://huggingface.co/datasets/antoinejeannot/jurisprudence/resolve/main/cour_d_appel.parquet?download=true) |
+| Tribunal Judiciaire | 109,551 | 2023-12-14 | 2024-12-26 | 383,992,085 | [Download (351.19 MB)](https://huggingface.co/datasets/antoinejeannot/jurisprudence/resolve/main/tribunal_judiciaire.jsonl.gz?download=true) | [Download (579.59 MB)](https://huggingface.co/datasets/antoinejeannot/jurisprudence/resolve/main/tribunal_judiciaire.parquet?download=true) |
+| Cour de Cassation | 540,307 | 1860-08-01 | 2024-12-20 | 1,112,016,951 | [Download (936.24 MB)](https://huggingface.co/datasets/antoinejeannot/jurisprudence/resolve/main/cour_de_cassation.jsonl.gz?download=true) | [Download (1.58 GB)](https://huggingface.co/datasets/antoinejeannot/jurisprudence/resolve/main/cour_de_cassation.parquet?download=true) |
+| **Total** | **1,058,533** | **1860-08-01** | **2024-12-27** | **3,528,763,675** | **3.04 GB** | **5.12 GB** |
 
-Latest update date: 2024-11-04
+Latest update date: 2025-01-03
 
 # Tokens are computed using GPT-4 tiktoken and the `text` column.
 
diff --git a/jurisprudence/settings.py b/jurisprudence/settings.py
index f27805e..480b431 100644
--- a/jurisprudence/settings.py
+++ b/jurisprudence/settings.py
@@ -1 +1 @@
-JURISPRUDENCE_LAST_EXPORT_DATETIME = "2024-11-04 01:09:21"
+JURISPRUDENCE_LAST_EXPORT_DATETIME = "2025-01-03 13:49:48"
diff --git a/release_notes/v2025.01.03.md b/release_notes/v2025.01.03.md
new file mode 100644
index 0000000..6966abb
--- /dev/null
+++ b/release_notes/v2025.01.03.md
@@ -0,0 +1,90 @@
+
+
+[![Dataset on HF](https://huggingface.co/datasets/huggingface/badges/resolve/main/dataset-on-hf-md-dark.svg)](https://huggingface.co/datasets/antoinejeannot/jurisprudence) [![GitHub](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/antoinejeannot/jurisprudence)
+
+# ✨ Jurisprudence, release v2025.01.03 🏛️
+
+Jurisprudence is an open-source project that automates the collection and distribution of French legal decisions. It leverages the Judilibre API provided by the Cour de Cassation to:
+
+- Fetch rulings from major French courts (Cour de Cassation, Cour d'Appel, Tribunal Judiciaire)
+- Process and convert the data into easily accessible formats
+- Publish & version updated datasets on Hugging Face every few days.
+
+It aims to democratize access to legal information, enabling researchers, legal professionals and the public to easily access and analyze French court decisions.
+Whether you're conducting legal research, developing AI models, or simply interested in French jurisprudence, this project might provide a valuable, open resource for exploring the French legal landscape.
+
+## 📊 Exported Data
+
+| Jurisdiction | Jurisprudences | Oldest | Latest | Tokens | JSONL (gzipped) | Parquet |
+|--------------|----------------|--------|--------|--------|-----------------|---------|
+| Cour d'Appel | 408,675 | 1996-03-25 | 2024-12-27 | 2,032,754,639 | [Download (1.78 GB)](https://huggingface.co/datasets/antoinejeannot/jurisprudence/resolve/main/cour_d_appel.jsonl.gz?download=true) | [Download (2.97 GB)](https://huggingface.co/datasets/antoinejeannot/jurisprudence/resolve/main/cour_d_appel.parquet?download=true) |
+| Tribunal Judiciaire | 109,551 | 2023-12-14 | 2024-12-26 | 383,992,085 | [Download (351.19 MB)](https://huggingface.co/datasets/antoinejeannot/jurisprudence/resolve/main/tribunal_judiciaire.jsonl.gz?download=true) | [Download (579.59 MB)](https://huggingface.co/datasets/antoinejeannot/jurisprudence/resolve/main/tribunal_judiciaire.parquet?download=true) |
+| Cour de Cassation | 540,307 | 1860-08-01 | 2024-12-20 | 1,112,016,951 | [Download (936.24 MB)](https://huggingface.co/datasets/antoinejeannot/jurisprudence/resolve/main/cour_de_cassation.jsonl.gz?download=true) | [Download (1.58 GB)](https://huggingface.co/datasets/antoinejeannot/jurisprudence/resolve/main/cour_de_cassation.parquet?download=true) |
+| **Total** | **1,058,533** | **1860-08-01** | **2024-12-27** | **3,528,763,675** | **3.04 GB** | **5.12 GB** |
+
+Latest update date: 2025-01-03
+
+# Tokens are computed using GPT-4 tiktoken and the `text` column.
+
+## 🤗 Hugging Face Dataset
+
+The up-to-date jurisprudences dataset is available at: https://huggingface.co/datasets/antoinejeannot/jurisprudence in JSONL (gzipped) and parquet formats.
+
+This allows you to easily fetch, query, process and index all jurisprudences in the blink of an eye!
+
+### Usage Examples
+#### HuggingFace Datasets
+```python
+# pip install datasets
+from datasets import load_dataset
+
+dataset = load_dataset("antoinejeannot/jurisprudence")
+dataset.shape
+# >> {'tribunal_judiciaire': (58986, 33),
+#     'cour_d_appel': (378392, 33),
+#     'cour_de_cassation': (534258, 33)}
+
+# alternatively, you can load each jurisdiction separately
+cour_d_appel = load_dataset("antoinejeannot/jurisprudence", "cour_d_appel")
+tribunal_judiciaire = load_dataset("antoinejeannot/jurisprudence", "tribunal_judiciaire")
+cour_de_cassation = load_dataset("antoinejeannot/jurisprudence", "cour_de_cassation")
+```
+
+Leveraging datasets allows you to easily ingest data to [PyTorch](https://huggingface.co/docs/datasets/use_with_pytorch), [Tensorflow](https://huggingface.co/docs/datasets/use_with_tensorflow), [Jax](https://huggingface.co/docs/datasets/use_with_jax) etc.
+
+#### BYOL: Bring Your Own Lib
+For analysis, using polars, pandas or duckdb is quite common and also possible:
+```python
+url = "https://huggingface.co/datasets/antoinejeannot/jurisprudence/resolve/main/cour_de_cassation.parquet" # or tribunal_judiciaire.parquet, cour_d_appel.parquet
+
+# pip install polars
+import polars as pl
+df = pl.scan_parquet(url)
+
+# pip install pandas
+import pandas as pd
+df = pd.read_parquet(url)
+
+# pip install duckdb
+import duckdb
+table = duckdb.read_parquet(url)
+```
+
+## 🪪 Citing & Authors
+
+If you use this code in your research, please use the following BibTeX entry:
+```bibtex
+@misc{antoinejeannot2024,
+author = {Jeannot Antoine and {Cour de Cassation}},
+title = {Jurisprudence},
+year = {2024},
+howpublished = {\url{https://github.com/antoinejeannot/jurisprudence}},
+note = {Data source: API Judilibre, \url{https://www.data.gouv.fr/en/datasets/api-judilibre/}}
+}
+```
+
+This project relies on the [Judilibre API par la Cour de Cassation](https://www.data.gouv.fr/en/datasets/api-judilibre/), which is made available under the Open License 2.0 (Licence Ouverte 2.0)
+
+It scans the API every 3 days at midnight UTC and exports its data in various formats to Hugging Face, without any fundamental transformation but conversions.
+
+
\ No newline at end of file
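
Review note: since this release only bumps figures, a quick arithmetic check of the new **Total** row can be useful. The sketch below is not part of the project; it copies the per-jurisdiction counts and token figures from the updated table above, and the names `rows`, `reported_total` and `column_sums` are hypothetical helpers introduced here for illustration.

```python
# Figures copied by hand from the v2025.01.03 table in this diff.
# All names below are illustrative, not taken from the repository.
rows = {
    "Cour d'Appel": {"decisions": 408_675, "tokens": 2_032_754_639},
    "Tribunal Judiciaire": {"decisions": 109_551, "tokens": 383_992_085},
    "Cour de Cassation": {"decisions": 540_307, "tokens": 1_112_016_951},
}
reported_total = {"decisions": 1_058_533, "tokens": 3_528_763_675}


def column_sums(table):
    """Sum each numeric column across all jurisdictions."""
    sums = {}
    for values in table.values():
        for column, value in values.items():
            sums[column] = sums.get(column, 0) + value
    return sums


computed_total = column_sums(rows)
for column, expected in reported_total.items():
    status = "OK" if computed_total[column] == expected else "MISMATCH"
    print(f"{column}: computed={computed_total[column]:,} reported={expected:,} {status}")
```

Both columns should agree with the reported totals. The file sizes are left out because they are rounded and mix MB and GB, so an exact comparison would not be meaningful.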