ML Justice Lab is designed as an alternative to CoCounsel/Casetext, providing efficient and accurate legal document analysis and summarization. Currently under evaluation by the San Francisco Public Defender's Office, ML Justice Lab offers a comprehensive suite of tools for legal document processing:
- Document Summarization: Our primary focus has been on developing a summarization tool that distills complex legal documents into clear, concise summaries (a minimal sketch of this workflow follows this list).
- Entity Extraction: We're creating a tool to identify and extract key entities such as persons, places, and important facts from legal documents.
- Timeline Generation: We're working on a feature to automatically create chronological timelines from legal documents.
- Table of Contents Generation: To improve navigation of lengthy documents, we're developing a tool to automatically generate detailed tables of contents.
- Page Classification: Our team is building a classification system to categorize pages within a document, making it easier to locate specific types of information.
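The sketch below illustrates the summarization step, assuming the Anthropic Python SDK and a Claude Haiku model (the model used in the evaluation described below). The model ID, prompt wording, and function name are illustrative assumptions rather than the exact implementation.

```python
# Minimal sketch of the summarization step, assuming the Anthropic Python SDK.
# The model ID, prompt, and token limit are illustrative assumptions, not the
# exact values used by ML Justice Lab.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def summarize_document(document_text: str) -> str:
    """Return a concise summary of a legal document."""
    response = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": (
                "Summarize the following legal document into a clear, concise "
                "summary of the key parties, facts, and outcomes:\n\n" + document_text
            ),
        }],
    )
    return response.content[0].text
```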
Our initial evaluations, focused on the summarization task, have yielded promising results:
- Performance: Summaries generated by ML Justice Lab, using Claude Haiku and the open-source Mixtral, performed on par with or outperformed CoCounsel/Casetext, which uses a variation of GPT-4, a much larger model.
- Efficiency and Cost-effectiveness: ML Justice Lab achieves these results using a significantly smaller model than CoCounsel/Casetext's GPT-4o, making our approach substantially more cost-efficient. Our model is approximately 20x less expensive for input tokens and 12x less expensive for output tokens (a rough worked example follows this list).
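The sketch below turns those ratios into a rough per-document cost comparison. The per-million-token prices are assumptions chosen only to be consistent with the ~20x/~12x ratios above; they are not quoted vendor prices.

```python
# Rough per-document cost comparison using assumed list prices (USD per
# million tokens). The prices are illustrative assumptions only.
SMALL_MODEL = {"input": 0.25, "output": 1.25}   # Haiku-class model (assumed)
LARGE_MODEL = {"input": 5.00, "output": 15.00}  # GPT-4-class model (assumed)

input_tokens, output_tokens = 100_000, 2_000    # a long filing and its summary

def cost(prices: dict, n_in: int, n_out: int) -> float:
    return (n_in * prices["input"] + n_out * prices["output"]) / 1_000_000

small = cost(SMALL_MODEL, input_tokens, output_tokens)   # ~$0.03
large = cost(LARGE_MODEL, input_tokens, output_tokens)   # ~$0.53
print(f"small: ${small:.4f}  large: ${large:.4f}  ratio: {large / small:.0f}x")
```

Because legal documents are heavily input-weighted, the overall saving on a workload like this tracks the ~20x input-token ratio.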
This repository contains our evaluation results, comparing the performance of various models including LLAMA, CoCounsel, Claude, and Mixtral across different metrics for the summarization task. As we continue to develop and refine our additional tools, we will update this repository with new findings and performance metrics.
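The tables that follow report count, mean, standard deviation, minimum, quartiles, and maximum for each model's Completeness, Correctness, and Conciseness scores, plus their per-summary average. The layout mirrors pandas' `describe()` output; the sketch below shows how such a table can be regenerated from raw scores. The CSV file and column names are illustrative assumptions, not the repository's actual layout.

```python
# Minimal sketch: rebuild the per-model score tables below from raw rubric scores.
# The file name and column names are illustrative assumptions.
import pandas as pd

scores = pd.read_csv("summarization_scores.csv")  # columns: model, completeness, correctness, conciseness
scores["average"] = scores[["completeness", "correctness", "conciseness"]].mean(axis=1)

# One describe() table (count, mean, std, min, 25%, 50%, 75%, max) per model.
for model, group in scores.groupby("model"):
    print(f"\n=== {model} ===")
    print(group[["completeness", "correctness", "conciseness", "average"]].describe().round(3))
```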
Model | Metric | count | mean | std | min | 25% | 50% | 75% | max
---|---|---|---|---|---|---|---|---|---
cocounsel | Completeness | 8 | 3.625 | 1.188 | 2.0 | 2.75 | 4.0 | 4.25 | 5.0
cocounsel | Correctness | 8 | 3.375 | 1.302 | 2.0 | 2.0 | 3.5 | 4.25 | 5.0
cocounsel | Conciseness | 8 | 3.625 | 1.188 | 2.0 | 2.75 | 4.0 | 4.25 | 5.0
cocounsel | Average | 8 | 3.542 | 1.226 | 2.000 | 2.500 | 3.833 | 4.250 | 5.000
claude - non-condensed | Completeness | 8 | 3.25 | 1.581 | 1.0 | 2.5 | 3.5 | 4.25 | 5.0
claude - non-condensed | Correctness | 8 | 2.375 | 1.768 | 1.0 | 1.0 | 1.5 | 3.5 | 5.0
claude - non-condensed | Conciseness | 8 | 3.125 | 1.126 | 2.0 | 2.0 | 3.0 | 4.0 | 5.0
claude - non-condensed | Average | 8 | 2.917 | 1.492 | 1.333 | 1.833 | 2.667 | 3.917 | 5.000
claude - condensed | Completeness | 8 | 4.25 | 0.707 | 3.0 | 4.0 | 4.0 | 5.0 | 5.0
claude - condensed | Correctness | 8 | 3.25 | 1.165 | 1.0 | 3.0 | 3.0 | 4.0 | 5.0
claude - condensed | Conciseness | 8 | 4.5 | 0.535 | 4.0 | 4.0 | 4.5 | 5.0 | 5.0
claude - condensed | Average | 8 | 4.000 | 0.802 | 2.667 | 3.667 | 3.833 | 4.667 | 5.000
mixtral 22b | Completeness | 8 | 3.375 | 1.408 | 2.0 | 2.0 | 3.0 | 5.0 | 5.0
mixtral 22b | Correctness | 8 | 2.75 | 1.282 | 1.0 | 2.0 | 2.5 | 3.25 | 5.0
mixtral 22b | Conciseness | 8 | 3.25 | 1.282 | 1.0 | 2.75 | 3.5 | 4.0 | 5.0
mixtral 22b | Average | 8 | 3.125 | 1.324 | 1.333 | 2.250 | 3.000 | 4.083 | 5.000
llama 3 | Completeness | 8 | 3.25 | 1.282 | 1.0 | 3.0 | 3.0 | 3.5 | 5.0
llama 3 | Correctness | 8 | 2.625 | 1.188 | 1.0 | 2.0 | 2.5 | 3.0 | 5.0
llama 3 | Conciseness | 8 | 2.875 | 0.354 | 2.0 | 3.0 | 3.0 | 3.0 | 3.0
llama 3 | Average | 8 | 2.917 | 0.941 | 1.333 | 2.667 | 2.833 | 3.167 | 4.333
mixtral nemo | Completeness | 8 | 3.875 | 0.641 | 3.0 | 3.75 | 4.0 | 4.0 | 5.0
mixtral nemo | Correctness | 8 | 3.625 | 1.061 | 2.0 | 3.0 | 3.5 | 4.25 | 5.0
mixtral nemo | Conciseness | 8 | 3.875 | 0.991 | 2.0 | 3.75 | 4.0 | 4.25 | 5.0
mixtral nemo | Average | 8 | 3.792 | 0.898 | 2.333 | 3.500 | 3.833 | 4.167 | 5.000
llama 3.1 | Completeness | 8 | 2.75 | 0.886 | 1.0 | 2.75 | 3.0 | 3.0 | 4.0
llama 3.1 | Correctness | 8 | 2.5 | 1.069 | 1.0 | 1.75 | 3.0 | 3.0 | 4.0
llama 3.1 | Conciseness | 8 | 2.5 | 0.535 | 2.0 | 2.0 | 2.5 | 3.0 | 3.0
llama 3.1 | Average | 8 | 2.583 | 0.830 | 1.333 | 2.167 | 2.833 | 3.000 | 3.667
gemini 1.5 flash | Completeness | 8 | 3.375 | 0.744 | 3.0 | 3.0 | 3.0 | 3.25 | 5.0
gemini 1.5 flash | Correctness | 8 | 2.75 | 1.035 | 2.0 | 2.0 | 2.5 | 3.0 | 5.0
gemini 1.5 flash | Conciseness | 8 | 2.75 | 0.886 | 2.0 | 2.0 | 2.5 | 3.25 | 4.0
gemini 1.5 flash | Average | 8 | 2.958 | 0.888 | 2.333 | 2.333 | 2.667 | 3.167 | 4.667
Model | Count |
---|---|
cocounsel | 3 |
mixtral nemo | 3 |
claude - condensed | 1 |
claude - non-condensed | 1 |
Model | Proportion Not Approved |
---|---|
claude - condensed | 0.125 |
mixtral nemo | 0.125 |
cocounsel | 0.25 |
gemini 1.5 flash | 0.375 |
llama 3 | 0.375 |
claude - non-condensed | 0.5 |
mixtral 22b | 0.5 |
llama 3.1 | 0.625 |
Model | Metric | count | mean | std | min | 25% | 50% | 75% | max
---|---|---|---|---|---|---|---|---|---
LLAMA | Completeness | 5 | 3.0 | 1.0 | 2.0 | 2.0 | 3.0 | 4.0 | 4.0
LLAMA | Correctness | 5 | 2.4 | 0.55 | 2.0 | 2.0 | 2.0 | 3.0 | 3.0
LLAMA | Conciseness | 5 | 3.4 | 1.14 | 2.0 | 3.0 | 3.0 | 4.0 | 5.0
LLAMA | Average | 5 | 2.93 | 0.90 | 2.0 | 2.33 | 2.67 | 3.67 | 4.0
CoCounsel | Completeness | 5 | 3.8 | 1.30 | 2.0 | 3.0 | 4.0 | 5.0 | 5.0
CoCounsel | Correctness | 5 | 4.0 | 1.41 | 2.0 | 3.0 | 5.0 | 5.0 | 5.0
CoCounsel | Conciseness | 5 | 4.4 | 0.89 | 3.0 | 4.0 | 5.0 | 5.0 | 5.0
CoCounsel | Average | 5 | 4.07 | 1.20 | 2.33 | 3.33 | 4.67 | 5.0 | 5.0
Claude | Completeness | 10 | 4.1 | 1.10 | 2.0 | 3.25 | 4.5 | 5.0 | 5.0
Claude | Correctness | 10 | 3.5 | 1.27 | 2.0 | 2.25 | 3.5 | 4.75 | 5.0
Claude | Conciseness | 10 | 3.9 | 0.99 | 2.0 | 3.25 | 4.0 | 4.75 | 5.0
Claude | Average | 10 | 3.83 | 1.12 | 2.0 | 2.92 | 4.0 | 4.83 | 5.0
Mixtral | Completeness | 5 | 3.0 | 1.0 | 2.0 | 2.0 | 3.0 | 4.0 | 4.0
Mixtral | Correctness | 5 | 2.2 | 0.84 | 1.0 | 2.0 | 2.0 | 3.0 | 3.0
Mixtral | Conciseness | 5 | 3.6 | 1.14 | 2.0 | 3.0 | 4.0 | 4.0 | 5.0
Mixtral | Average | 5 | 2.93 | 0.99 | 1.67 | 2.33 | 3.0 | 3.67 | 4.0
Model | Count |
---|---|
LLAMA | 0 |
CoCounsel | 3 |
Claude | 2 |
Mixtral | 0 |
Model | Proportion of Summaries Not Approved |
---|---|
LLAMA | 80% |
CoCounsel | 20% |
Claude | 30% |
Mixtral | 60% |