This is the repository for our "Law and Artificial Intelligence" project at Northwestern University. The team members for the project are Noah Caldwell-Gatsos (@ncaldwell17), Rhett D'souza (@rhettdsouza13), and Lukas Justen (@Lukas-Justen).
Directly applying transfer learning from BERT yields poor accuracy in domain-specific areas like law because of the word distribution shift from the general-domain corpora BERT was pre-trained on to domain-specific text. In this project, we demonstrate how the pre-trained language model BERT can be adapted to additional domains, such as contract law or court judgments.
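One symptom of this distribution shift is easy to see directly: BERT's general-domain WordPiece vocabulary fragments common legal terms into many sub-word pieces. The following minimal sketch (not part of this repo, assuming the Hugging Face `transformers` library) illustrates this:

```python
# Requires: pip install transformers
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Legal vocabulary tends to split into more pieces than everyday words,
# a rough proxy for how far the domain sits from BERT's pre-training data.
for term in ["indemnification", "estoppel", "tortfeasor", "agreement"]:
    pieces = tokenizer.tokenize(term)
    print(f"{term!r} -> {pieces} ({len(pieces)} pieces)")
```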
We did not create and train the model ourselves; that would require resources beyond the scope of this project. Instead, we propose a framework for creating a domain-specific BERT, using legal contracts as a case study. The framework covers why domain adaptation is necessary, what kind of data is needed, how the model is trained, and how the model's performance can be evaluated.
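For the training step of the framework, the standard recipe is continued pre-training of BERT on domain text with the masked-language-model objective. The sketch below shows what that could look like with Hugging Face `transformers` and `datasets`; the file name `contracts.txt` and the hyperparameters are placeholders, not our actual training setup:

```python
# Requires: pip install transformers datasets torch
from transformers import (
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Plain-text corpus of legal contracts, one document per line (placeholder path).
dataset = load_dataset("text", data_files={"train": "contracts.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

# Randomly mask 15% of tokens: the standard masked-language-model objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="legal-bert", num_train_epochs=1),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()
```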
Finally, we built a small frontend that lets you visualize the complexity of a corpus. We hope this helps others gain insight into their datasets and decide whether it makes sense to apply BERT to their domain.
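As a hedged example of the kind of signal such a frontend could plot (the metrics our frontend actually uses may differ), here are two simple corpus statistics: the type-token ratio and the rate of words missing from BERT's vocabulary:

```python
# Requires: pip install transformers
from transformers import BertTokenizerFast

def corpus_stats(lines):
    """Two simple complexity signals for a corpus given as a list of text lines."""
    vocab = BertTokenizerFast.from_pretrained("bert-base-uncased").get_vocab()
    words = [w.lower().strip(".,;:()") for line in lines for w in line.split()]
    return {
        # Higher type-token ratio -> more diverse vocabulary.
        "type_token_ratio": len(set(words)) / len(words),
        # Share of words absent from BERT's general-domain vocabulary.
        "oov_rate": sum(w not in vocab for w in words) / len(words),
    }

print(corpus_stats(["The lessee shall indemnify the lessor against all claims."]))
```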