Legal Text Analytics

A list of selected resources, methods, and tools dedicated to Legal Text Analytics.

Please read the contribution guidelines before contributing. To add a resource, please raise a pull request. We also welcome discussion and proposals for new ideas (including additional content sections) as issues.

Selected Tasks and Use Cases

Optical Character Recognition (find more information here)
Legal Document Pre-processing (find more information here)
Clause Segmentation and Sentence Boundary Detection
Information Extraction and Named Entity Recognition (find more information here)
Legal Norm Classification
Machine Translation
Document Comparison and Semantic Matching
Text Summarization
Argument Mining
Question Answering
Legal Case Outcome Prediction
Legal and Regulatory Monitoring
Reference and Coreference Extraction
Document Assembling and Generation
Voice Transcription
Anomaly Detection
Data Anonymization
Consistency Checking
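
As an illustration of why tasks such as sentence boundary detection need domain-specific handling, the following is a minimal sketch of a rule-based splitter that avoids breaking on common legal abbreviations. The abbreviation list and the function name `split_sentences` are assumptions made for this example; they are not taken from any of the resources listed here, and a production system would need a far more complete approach.

```python
import re
from typing import List

# Hypothetical, deliberately small abbreviation list (an assumption for this
# sketch); real legal corpora need a much larger, jurisdiction-specific one.
ABBREVIATIONS = {"v.", "vs.", "no.", "art.", "sec.", "para.", "u.s.", "inc.", "co.", "cf."}


def split_sentences(text: str) -> List[str]:
    """Split on '.', '!' or '?' followed by whitespace and an uppercase letter,
    unless the token ending at that punctuation is a known abbreviation."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]\s+(?=[A-Z])", text):
        end = match.start() + 1  # keep the punctuation with the sentence
        last_token = text[start:end].split()[-1].lower()
        if last_token in ABBREVIATIONS:
            continue  # e.g. "v." in a case name is not a sentence boundary
        sentences.append(text[start:end].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    sample = "The court cited Roe v. Wade in Sec. 3 of the opinion. The appeal was dismissed."
    for sentence in split_sentences(sample):
        print(sentence)
```

A naive split on every period would cut the case name "Roe v. Wade" in two, which is why the dedicated datasets and libraries listed below exist.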

Methods

NLP Overview
NLP Progress
Text Visualizations
Optical Character Recognition
Rule-based methods for NLP, Apache UIMA Ruta, JAPE Grammar
Statistical NLP
Machine Learning Frameworks
Neural networks and deep learning for NLP Tutorial
Domain adaptation (e.g., research paper)
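
To give a feel for the rule-based end of this spectrum, here is a minimal sketch of detecting simple German statute references such as "§ 433 Abs. 2 BGB" with a regular expression. The pattern, the group names, and the function `find_references` are assumptions made for this example. It covers only the simplest citation form (no sentence numbers, ranges, or "§§" lists) and is not the approach of any specific tool listed in this document.

```python
import re
from typing import Dict, List, Optional

# Illustrative pattern (an assumption, not a complete citation grammar):
# "§ <section> [Abs. <paragraph>] <code abbreviation>"
REFERENCE_PATTERN = re.compile(
    r"§\s*(?P<section>\d+[a-z]?)"            # section number, e.g. 433 or 312a
    r"(?:\s+Abs\.\s*(?P<paragraph>\d+))?"    # optional paragraph ("Absatz")
    r"\s+(?P<code>[A-ZÄÖÜ][A-Za-zÄÖÜäöü]+)"  # abbreviation of the code, e.g. BGB
)


def find_references(text: str) -> List[Dict[str, Optional[str]]]:
    """Return one dict per detected reference; 'paragraph' is None when absent."""
    return [match.groupdict() for match in REFERENCE_PATTERN.finditer(text)]


if __name__ == "__main__":
    text = "Der Anspruch folgt aus § 433 Abs. 2 BGB in Verbindung mit § 280 BGB."
    for reference in find_references(text):
        print(reference)
```

Patterns like this are easy to audit but brittle, which is one reason statistical and neural methods are listed alongside them.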

Libraries

spaCy - Industrial-Strength Natural Language Processing
scikit-learn - Machine learning in Python
NLTK - Natural Language Toolkit
Apache UIMA
GATE - General Architecture for Text Engineering
Hugging Face - more than 300 pre-trained transformer/embedding models for the legal domain
German BERT Model: Deepset AI
Flair - SOTA NLP (incl. biomedical and legal data)
Blackstone - Legal Named Entity Recognition and Text Categorizer
Legal Reference Detection - Neo Search
Legal Reference Detection - Open Legal Data
Haystack - Transformers at scale for question answering & neural search
Sentence Boundary Detection (US Caselaw)
Quantitative Legal Studies
CiteURL - an extensible tool to detect and hyperlink legal citations
LexNLP - Python NLP library for legal text analytics
Dutch Case Law Extractor - Functions to obtain published Dutch case law (rechtspraak) data and the available metadata associated with the cases
Case Law Explorer - Materials for building a network analysis software platform for analyzing Dutch and European court decisions

Datasets and Data

BUILDNyAI
NLP Datasets
An 800GB Dataset of Diverse Text for Language Modeling
Meta Search: Google Dataset Search
OpenLegalData
Belgium: Belgian Statutory Article Retrieval Dataset (BSARD), including code
German NLP Resource: Awesome German NLP
Legal Entity Recognition
Legal Text Summarization
Legal Text Translation
Legal Document Classification
Legal Sentence Classification (German)
100k German Court Decisions
Legal Paper Datasets
LexGLUE: a Benchmark Dataset for Legal Language Understanding in English
Awesome Legal Data
Germany: Gesetze im Internet, Rechtsprechung im Internet, Verwaltungsvorschriften im Internet
Germany: Annotated Court Decisions (Judgment style)
Germany: German Federal Courts Dataset
Germany: Quantitative dataset of asylum court hearings at German administrative courts (ASYFAIR)
Germany: Aktenzeichen der Bundesrepublik Deutschland (AZ-BRD)
Germany: Corpus des Deutschen Bundesrechts (C-DBR)
Germany: Corpus der Entscheidungen des Bundesverfassungsgerichts (CE-BVerfG)
Germany: Corpus der amtlichen Entscheidungssammlung des Bundesverfassungsgerichts (C-BVerfGE)
Germany: Corona-Rechtsprechung des Bundesverfassungsgerichts (BVerfG-Corona)
Germany: Corpus der Entscheidungen des Bundesverwaltungsgerichts (CE-BVerwG)
Germany: Corpus der Entscheidungen des Bundesarbeitsgerichts (CE-BAG)
Germany: Corpus der Entscheidungen des Bundespatentgerichts (CE-BPatG)
Germany: Presidents and Vice-Presidents of the Federal Courts of Germany (PVP-FCG)
Germany: Stoppwörter der Deutschen Rechtssprache (SW-DE-RS)
France: The French Court Decision Structure dataset (FCD12K)
Switzerland: Swiss Legislation Corpus French and German
Turkey: Prediction of Outcomes in the Higher Courts of Turkey
India: Indian Legal Documents Corpus for Court Judgment Prediction and Explanation
ECtHR: Judicial Decisions of the European Court of Human Rights
European Court of Human Rights Argument Mining Corpus
EU Law (eurlex R Package), Digital Corpus of the European Parliament (DCEP)
EU Regulatory Compliance Information Retrieval
Israel: The Israeli Supreme Court Database
Canada: Federal Laws and Regulations (ftp://205.193.86.89/)
UK: UK Law Reports & Case Law Search
US Statutory Law Interpretation Data Set
US Caselaw Sentence Boundary Detection Dataset
US Caselaw Functional and Issue Specific Segmentation Dataset
US Caselaw Sentence Polarity Detection
US Caselaw Access Project
US Supreme Court Database
US House of Representatives Office of the Law Revision Counsel
US Board of Veterans Appeals (BVA) Citation Prediction Dataset and Code
Overview of Political Science Datasets: PolData
International Law: Text of Trade Agreements (ToTA)
International Law: Corpus of Decisions: International Court of Justice (CD-ICJ)
International Law: Corpus of Decisions: Permanent Court of International Justice (CD-PCIJ)
United Nations: United Nations General Debate Corpus, United Nations Parallel Corpus
Contract Understanding Atticus Dataset by The Atticus Project: A corpus of 13,000+ labels in 510 commercial legal contracts with rich expert annotations.
Kira Systems M&A Dataset by Kira Systems: A non-commercial use dataset comprising 4,400 documents and labels for 50 legal concepts in the M&A Due Diligence setting.

Annotation and Data Schemes

Annotation Tools

Software (interfaces)

Research Groups, Labs, and Communities

Tutorials

Credits

See contributors and committers (and many more).

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
