ML Justice Lab is designed as an alternative to CoCounsel/Casetext, providing efficient and accurate legal document analysis and summarization. Currently under evaluation by the San Francisco Public Defender's Office, ML Justice Lab offers a comprehensive suite of tools for legal document processing:
- Document Summarization: Our primary focus has been on developing a summarization tool that distills complex legal documents into clear, concise summaries (a minimal sketch of this workflow follows this list).
- Entity Extraction: We're creating a tool to identify and extract key entities such as persons, places, and important facts from legal documents.
- Timeline Generation: We're working on a feature to automatically create chronological timelines from legal documents.
- Table of Contents Generation: To improve navigation of lengthy documents, we're developing a tool to automatically generate detailed tables of contents.
- Page Classification: Our team is building a classification system to categorize pages within a document, making it easier to locate specific types of information.
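The sketch below illustrates the summarization step, assuming the Anthropic Python SDK and a Claude Haiku model (the model used in the evaluation described below). The model ID, prompt wording, and function name are illustrative assumptions rather than the exact implementation.

```python
# Minimal sketch of the summarization step, assuming the Anthropic Python SDK.
# The model ID, prompt, and token limit are illustrative assumptions, not the
# exact values used by ML Justice Lab.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def summarize_document(document_text: str) -> str:
    """Return a concise summary of a legal document."""
    response = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": (
                "Summarize the following legal document into a clear, concise "
                "summary of the key parties, facts, and outcomes:\n\n" + document_text
            ),
        }],
    )
    return response.content[0].text
```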
Our initial evaluations, focused on the summarization task, have yielded promising results:
- Performance: Summaries generated by ML Justice Lab, using Claude Haiku and the open-source Mixtral, performed on par with or outperformed CoCounsel/Casetext, which uses a variation of GPT-4, a much larger model.
- Efficiency and Cost-effectiveness: ML Justice Lab achieves these results using a significantly smaller model than CoCounsel/Casetext's GPT-4o, making our approach substantially more cost-efficient. Our model is approximately 20x less expensive for input tokens and 12x less expensive for output tokens (a rough worked example follows this list).
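The sketch below turns those ratios into a rough per-document cost comparison. The per-million-token prices are assumptions chosen only to be consistent with the ~20x/~12x ratios above; they are not quoted vendor prices.

```python
# Rough per-document cost comparison using assumed list prices (USD per
# million tokens). The prices are illustrative assumptions only.
SMALL_MODEL = {"input": 0.25, "output": 1.25}   # Haiku-class model (assumed)
LARGE_MODEL = {"input": 5.00, "output": 15.00}  # GPT-4-class model (assumed)

input_tokens, output_tokens = 100_000, 2_000    # a long filing and its summary

def cost(prices: dict, n_in: int, n_out: int) -> float:
    return (n_in * prices["input"] + n_out * prices["output"]) / 1_000_000

small = cost(SMALL_MODEL, input_tokens, output_tokens)   # ~$0.03
large = cost(LARGE_MODEL, input_tokens, output_tokens)   # ~$0.53
print(f"small: ${small:.4f}  large: ${large:.4f}  ratio: {large / small:.0f}x")
```

Because legal documents are heavily input-weighted, the overall saving on a workload like this tracks the ~20x input-token ratio.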
This repository contains our evaluation results, comparing the performance of various models including LLAMA, CoCounsel, Claude, and Mixtral across different metrics for the summarization task. As we continue to develop and refine our additional tools, we will update this repository with new findings and performance metrics.
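The tables that follow report count, mean, standard deviation, minimum, quartiles, and maximum for each model's Completeness, Correctness, and Conciseness scores, plus their per-summary average. The layout mirrors pandas' `describe()` output; the sketch below shows how such a table can be regenerated from raw scores. The CSV file and column names are illustrative assumptions, not the repository's actual layout.

```python
# Minimal sketch: rebuild the per-model score tables below from raw rubric scores.
# The file name and column names are illustrative assumptions.
import pandas as pd

scores = pd.read_csv("summarization_scores.csv")  # columns: model, completeness, correctness, conciseness
scores["average"] = scores[["completeness", "correctness", "conciseness"]].mean(axis=1)

# One describe() table (count, mean, std, min, 25%, 50%, 75%, max) per model.
for model, group in scores.groupby("model"):
    print(f"\n=== {model} ===")
    print(group[["completeness", "correctness", "conciseness", "average"]].describe().round(3))
```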
Model | Metric | count | mean | std | min | 25% | 50% | 75% | max
---|---|---|---|---|---|---|---|---|---
cocounsel | Completeness | 8 | 3.625 | 1.188 | 2.0 | 2.75 | 4.0 | 4.25 | 5.0
cocounsel | Correctness | 8 | 3.375 | 1.302 | 2.0 | 2.0 | 3.5 | 4.25 | 5.0
cocounsel | Conciseness | 8 | 3.625 | 1.188 | 2.0 | 2.75 | 4.0 | 4.25 | 5.0
cocounsel | Average | 8 | 3.542 | 1.226 | 2.000 | 2.500 | 3.833 | 4.250 | 5.000
claude - non-condensed | Completeness | 8 | 3.25 | 1.581 | 1.0 | 2.5 | 3.5 | 4.25 | 5.0
claude - non-condensed | Correctness | 8 | 2.375 | 1.768 | 1.0 | 1.0 | 1.5 | 3.5 | 5.0
claude - non-condensed | Conciseness | 8 | 3.125 | 1.126 | 2.0 | 2.0 | 3.0 | 4.0 | 5.0
claude - non-condensed | Average | 8 | 2.917 | 1.492 | 1.333 | 1.833 | 2.667 | 3.917 | 5.000
claude - condensed | Completeness | 8 | 4.25 | 0.707 | 3.0 | 4.0 | 4.0 | 5.0 | 5.0
claude - condensed | Correctness | 8 | 3.25 | 1.165 | 1.0 | 3.0 | 3.0 | 4.0 | 5.0
claude - condensed | Conciseness | 8 | 4.5 | 0.535 | 4.0 | 4.0 | 4.5 | 5.0 | 5.0
claude - condensed | Average | 8 | 4.000 | 0.802 | 2.667 | 3.667 | 3.833 | 4.667 | 5.000
mixtral 22b | Completeness | 8 | 3.375 | 1.408 | 2.0 | 2.0 | 3.0 | 5.0 | 5.0
mixtral 22b | Correctness | 8 | 2.75 | 1.282 | 1.0 | 2.0 | 2.5 | 3.25 | 5.0
mixtral 22b | Conciseness | 8 | 3.25 | 1.282 | 1.0 | 2.75 | 3.5 | 4.0 | 5.0
mixtral 22b | Average | 8 | 3.125 | 1.324 | 1.333 | 2.250 | 3.000 | 4.083 | 5.000
llama 3 | Completeness | 8 | 3.25 | 1.282 | 1.0 | 3.0 | 3.0 | 3.5 | 5.0
llama 3 | Correctness | 8 | 2.625 | 1.188 | 1.0 | 2.0 | 2.5 | 3.0 | 5.0
llama 3 | Conciseness | 8 | 2.875 | 0.354 | 2.0 | 3.0 | 3.0 | 3.0 | 3.0
llama 3 | Average | 8 | 2.917 | 0.941 | 1.333 | 2.667 | 2.833 | 3.167 | 4.333
mixtral nemo | Completeness | 8 | 3.875 | 0.641 | 3.0 | 3.75 | 4.0 | 4.0 | 5.0
mixtral nemo | Correctness | 8 | 3.625 | 1.061 | 2.0 | 3.0 | 3.5 | 4.25 | 5.0
mixtral nemo | Conciseness | 8 | 3.875 | 0.991 | 2.0 | 3.75 | 4.0 | 4.25 | 5.0
mixtral nemo | Average | 8 | 3.792 | 0.898 | 2.333 | 3.500 | 3.833 | 4.167 | 5.000
llama 3.1 | Completeness | 8 | 2.75 | 0.886 | 1.0 | 2.75 | 3.0 | 3.0 | 4.0
llama 3.1 | Correctness | 8 | 2.5 | 1.069 | 1.0 | 1.75 | 3.0 | 3.0 | 4.0
llama 3.1 | Conciseness | 8 | 2.5 | 0.535 | 2.0 | 2.0 | 2.5 | 3.0 | 3.0
llama 3.1 | Average | 8 | 2.583 | 0.830 | 1.333 | 2.167 | 2.833 | 3.000 | 3.667
gemini 1.5 flash | Completeness | 8 | 3.375 | 0.744 | 3.0 | 3.0 | 3.0 | 3.25 | 5.0
gemini 1.5 flash | Correctness | 8 | 2.75 | 1.035 | 2.0 | 2.0 | 2.5 | 3.0 | 5.0
gemini 1.5 flash | Conciseness | 8 | 2.75 | 0.886 | 2.0 | 2.0 | 2.5 | 3.25 | 4.0
gemini 1.5 flash | Average | 8 | 2.958 | 0.888 | 2.333 | 2.333 | 2.667 | 3.167 | 4.667
Model | Count |
---|---|
cocounsel | 3 |
mixtral nemo | 3 |
claude - condensed | 1 |
claude - non-condensed | 1 |
Model | Proportion Not Approved |
---|---|
claude - condensed | 0.125 |
mixtral nemo | 0.125 |
cocounsel | 0.25 |
gemini 1.5 flash | 0.375 |
llama 3 | 0.375 |
claude - non-condensed | 0.5 |
mixtral 22b | 0.5 |
llama 3.1 | 0.625 |
Model | Metric | count | mean | std | min | 25% | 50% | 75% | max
---|---|---|---|---|---|---|---|---|---
LLAMA | Completeness | 5 | 3.0 | 1.0 | 2.0 | 2.0 | 3.0 | 4.0 | 4.0
LLAMA | Correctness | 5 | 2.4 | 0.55 | 2.0 | 2.0 | 2.0 | 3.0 | 3.0
LLAMA | Conciseness | 5 | 3.4 | 1.14 | 2.0 | 3.0 | 3.0 | 4.0 | 5.0
LLAMA | Average | 5 | 2.93 | 0.90 | 2.0 | 2.33 | 2.67 | 3.67 | 4.0
CoCounsel | Completeness | 5 | 3.8 | 1.30 | 2.0 | 3.0 | 4.0 | 5.0 | 5.0
CoCounsel | Correctness | 5 | 4.0 | 1.41 | 2.0 | 3.0 | 5.0 | 5.0 | 5.0
CoCounsel | Conciseness | 5 | 4.4 | 0.89 | 3.0 | 4.0 | 5.0 | 5.0 | 5.0
CoCounsel | Average | 5 | 4.07 | 1.20 | 2.33 | 3.33 | 4.67 | 5.0 | 5.0
Claude | Completeness | 10 | 4.1 | 1.10 | 2.0 | 3.25 | 4.5 | 5.0 | 5.0
Claude | Correctness | 10 | 3.5 | 1.27 | 2.0 | 2.25 | 3.5 | 4.75 | 5.0
Claude | Conciseness | 10 | 3.9 | 0.99 | 2.0 | 3.25 | 4.0 | 4.75 | 5.0
Claude | Average | 10 | 3.83 | 1.12 | 2.0 | 2.92 | 4.0 | 4.83 | 5.0
Mixtral | Completeness | 5 | 3.0 | 1.0 | 2.0 | 2.0 | 3.0 | 4.0 | 4.0
Mixtral | Correctness | 5 | 2.2 | 0.84 | 1.0 | 2.0 | 2.0 | 3.0 | 3.0
Mixtral | Conciseness | 5 | 3.6 | 1.14 | 2.0 | 3.0 | 4.0 | 4.0 | 5.0
Mixtral | Average | 5 | 2.93 | 0.99 | 1.67 | 2.33 | 3.0 | 3.67 | 4.0
Model | Count |
---|---|
LLAMA | 0 |
CoCounsel | 3 |
Claude | 2 |
Mixtral | 0 |
Model | Proportion of Summaries Not Approved |
---|---|
LLAMA | 80% |
CoCounsel | 20% |
Claude | 30% |
Mixtral | 60% |