This repository contains the information and resources for our tutorial at CODS-COMAD 2023, held at IIT Bombay on 4th January 2023.
Artificial Intelligence (AI), Machine Learning (ML), and Natural Language Processing (NLP) are transforming the way legal professionals and law firms approach their work. The significant potential for the application of AI to Law, for instance, by creating computational solutions for legal tasks, has intrigued researchers for decades. This appeal has only been amplified with the advent of Deep Learning (DL). In particular, research in AI & Law can be extremely beneficial in countries like India with an overburdened legal system.
In this tutorial, we will give an overview of the various aspects of applying AI to legal textual data. We will start with a history of AI & Law, and then discuss the current state of AI & Law research including the techniques that have produced the biggest impact. We will also take a deep dive into the software processes required to implement and sustain such AI solutions.
Part | Topic | Presenter | Link to Slides |
---|---|---|---|
1 | A brief history of AI & Law research, important milestones | Jack G. Conrad | Slides |
2 | State-of-the-art AI & Law research, datasets, benchmarks and tools | Saptarshi Ghosh & Shounak Paul | Slides |
3 | Challenges and best practices in developing Legal information systems at scale (how to put the AI models into practice and sustain them) | Shirsha Ray Chaudhuri | Slides |
This section contains resources for different automation tasks in the legal domain.
This task aims to identify different entities in legal documents. Entities may be classified into different groups that have different legal meanings, such as the parties (appellants, respondents), lawyers, judges and so on.
- A Dataset of German Legal Documents for Named Entity Recognition (Leitner et al., 2020)
- Named Entity Recognition in Indian court judgments (Kalamkar et al., 2022)
The task of summarization in the legal domain aims to generate a gist of the entire case document, either in extractive fashion (selecting the most important sentences) or abstractive fashion (similar to summaries written by humans).
- Legal Case Document Summarization: Extractive and Abstractive Methods and their Evaluation (Bhattacharya et al., 2022)
- BIGPATENT: A Large-Scale Dataset for Abstractive and Coherent Summarization (Sharma et al., 2019)
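To make the extractive setting concrete, here is a minimal sketch of sentence selection by word frequency. It is a toy baseline and not the method of any paper listed above; the function names `split_sentences` and `summarize`, the small stopword list, and the scoring rule are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "was",
             "that", "for", "on", "by", "with", "as", "it", "this"}


def tokenize(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, k=2):
    """Return the k highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    freq = Counter(tokenize(text))  # document-level word frequencies

    def score(sentence):
        tokens = tokenize(sentence)
        if not tokens:
            return 0.0
        # Average document frequency of the sentence's content words.
        return sum(freq[t] for t in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    doc = ("The appellant filed an appeal against the conviction. "
           "The weather was pleasant that day. "
           "The court dismissed the appeal and upheld the conviction.")
    for sentence in summarize(doc, k=2):
        print(sentence)
```

An abstractive system would instead generate new sentences, typically with a sequence-to-sequence model, rather than copy sentences from the source document.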
Broadly speaking, this task aims to determine the outcomes of court cases. In many settings, this may be composed of several sub-tasks, which are addressed in the following sections.
- Natural language processing in law: Prediction of outcomes in the higher courts of Turkey (Mumcuoglu et al., 2021)
- Building corpora for the philological study of Swiss legal texts (Hofler et al., 2011)
- ILDC for CJPE: Indian Legal Documents Corpus for Court Judgment Prediction and Explanation (Malik et al., 2021)
- Judicial Decisions of the European Court of Human Rights: Looking into the Crystal Ball (Medvedeva et al., 2018)
- CAIL2018: A Large-Scale Legal Dataset for Judgment Prediction (Xiao et al., 2018)
Often considered a sub-task of Legal Judgment Prediction, this task aims to identify the relevant legal articles and charges given the facts of a case.
- LeSICiN: A Heterogeneous Graph-based Approach for Automatic Legal Statute Identification from Indian Legal Documents (Paul et al., 2022)
- Hierarchical Matching Network for Crime Classification (Wang et al., 2019)
- Automatic Charge Identification from Facts: A Few Sentence-Level Charge Annotations is All You Need (Paul et al., 2020)
- Charge Prediction with Legal Attention (Bao et al., 2019)
Court case documents are composed of several functional parts such as Facts, Arguments, Ruling, etc. which may not be clearly demarcated. This task aims to automate the process of segmenting a court case document into these parts.
- Identification of Rhetorical Roles of Sentences in Indian Legal Judgments (Bhattacharya et al., 2019)
- The French Court Decision Structure dataset — FCD12K
Recently, there have been many efforts to pre-train large, transformer-based language models for the legal domain, and these models have been adapted effectively to many downstream tasks.
- LEGAL-BERT: The Muppets straight out of Law School (Chalkidis et al., 2020)
- When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Dataset of 53,000+ Legal Holdings (Zheng et al., 2021)
- Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset (Henderson et al., 2022)
- Pre-training Transformers on Indian Legal Text (Paul et al., 2022)
This is a miscellaneous list of other resources.
- LexGLUE: A Benchmark Dataset for Legal Language Understanding in English (Chalkidis et al., 2022)
- Liquid Legal Institute Repository on Legal Text Analytics
Jack G. Conrad, Director of Applied Research and Lead Research Scientist, TR Labs at Thomson Reuters, Minneapolis, MN USA
- Jack G. Conrad is Director of Applied Research at Thomson Reuters TR Labs, where he focuses on a broad range of technical application areas involving AI, machine learning and textual data processing. He also fosters cross-team collaboration and communication in the process of implementing and deploying technology to meet business needs. For over two decades, he has delivered critical artifacts and infrastructure for research- and business-directed projects across a diverse spectrum of domains that have included legal, tax and news. Jack has published more than 50 peer-reviewed research papers and has eight patents. He is passionate about the power of AI transformation in enterprise environments. Jack is past president of the International Association for Artificial Intelligence and Law (IAAIL.org) and has served on the IAAIL Executive Committee for 8 years. Jack’s areas of expertise include research in the fields of information retrieval (search), question answering, NLP, machine learning, data mining, and system evaluation.
Shirsha Ray Chaudhuri, Director of Engineering, TR Labs at Thomson Reuters, Bangalore, Karnataka, India
- Shirsha Ray Chaudhuri is the Director of Engineering at Thomson Reuters Labs, Bangalore. TR Labs is TR's applied research division, focused on delivering solutions with AI and emerging tech to TR's platforms and products and customer Proof of Concepts (PoCs). TR’s editorial workflows power its best-in-class products like Westlaw. The TR Labs team in Bangalore contributes to leveraging AI in these editorial workflows. Her earlier work includes strategic architecture and prototyping for Daimler's EvoBus team to support route planning software solutions for electric buses, predictive maintenance solutions for Daimler's DTNA service centres, and use of AI for rapid field ops in Daimler's Japan-based FUSO trucks. Besides providing point-in-time solutions for a single use case, she implemented generic modifications of such AI services which could be replicated across geographies and operators.
Shounak Paul, Senior Research Fellow, Dept. of Computer Science & Engineering, IIT Kharagpur, West Bengal, India
- Shounak Paul is a Senior Research Fellow at the Department of CSE, IIT Kharagpur. His research interests mainly include legal data analytics and applications of NLP in the legal domain. His work on AI & Law for Indian applications has been published in premier conferences and journals, covering semantic segmentation (JURIX 2019, best paper award; AI & Law Journal 2021), charge identification (COLING 2020), and legal statute identification using citation networks (AAAI 2022).
Saptarshi Ghosh, Assistant Professor, Dept. of Computer Science & Engineering, IIT Kharagpur, West Bengal, India
- Saptarshi Ghosh is an Assistant Professor at the Department of CSE, IIT Kharagpur. His research interests include legal analytics, social media analytics, and algorithmic bias and fairness (on which he presently leads a Max Planck Partner Group at IIT Kharagpur). His work on AI & Law has been published at premier conferences including SIGIR, AAAI, CIKM, ECIR, and COLING, and has been recognized at top AI & Law conferences, including the Best Paper Award at the International Conference on Legal Knowledge and Information Systems (JURIX) 2019 and the Best Student Paper Award at the International Conference on Artificial Intelligence and Law (ICAIL) 2021. He is presently the Section Editor on Legal Information Retrieval for the journal Artificial Intelligence and Law, the premier journal in AI & Law. He has received several prestigious awards, including the Institution of Engineers (India) Young Engineer Award 2017-18 in the Computer Engineering discipline.
This tutorial has been published on the ACM Digital Library as part of the conference proceedings. You can cite our work using:
@inproceedings{10.1145/3570991.3571050,
author = {Conrad, Jack G. and Ray Chaudhuri, Shirsha and Paul, Shounak and Ghosh, Saptarshi},
title = {AI & Law: Formative Developments, State-of-the-Art Approaches, Challenges & Opportunities},
year = {2023},
isbn = {9781450397971},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3570991.3571050},
doi = {10.1145/3570991.3571050},
abstract = {Artificial Intelligence (AI), Machine Learning (ML), and Natural Language Processing (NLP) are transforming the way legal professionals and law firms approach their work. The significant potential for the application of AI to Law, for instance, by creating computational solutions for legal tasks, has intrigued researchers for decades. This appeal has only been amplified with the advent of Deep Learning (DL). In particular, research in AI & Law can be extremely beneficial in countries like India with an overburdened legal system. In this tutorial, we will give an overview of the various aspects of applying AI to legal textual data. We will start with a history of AI & Law, and then discuss the current state of AI & Law research including the techniques that have produced the biggest impact. We will also take a deep dive into the software processes required to implement and sustain such AI solutions.},
booktitle = {Proceedings of the 6th Joint International Conference on Data Science & Management of Data (10th ACM IKDD CODS and 28th COMAD)},
pages = {320–323},
numpages = {4},
keywords = {Text Analytics, Legal Analytics, Natural Language Processing, Machine Learning},
location = {Mumbai, India},
series = {CODS-COMAD '23}
}