
Data and associated scripts for generating legal language corpora


SelfBriefs/Legal_copora

 
 


Legal Corpora

This repository contains:

  1. A Jupyter Notebook (leg_extract.ipynb) that extracts sections and sub-sections from XML-tagged legislation, as available from http://www.legislation.gov.uk/

  2. A Python script (LIscrape.py) implementing a Scrapy spider that extracts contractual clauses from material contracts filed with the SEC, as available from https://www.lawinsider.com/

  3. A sample dataset of extracted legislation (Leg_data160718.csv)

  4. A sample dataset of extracted contract clauses (LIdata160718.csv) and a list of scraped URLs (before addition of suffixes) (TopDomains.txt)
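
As a rough illustration of the kind of extraction described in item 1, the sketch below pulls sections and sub-sections out of a small XML fragment using only the Python standard library. It is not the code in leg_extract.ipynb. The sample document, the function name extract_sections, and the element names (P1 for a section, P2 for a sub-section, Pnumber, Text) are assumptions made for illustration; real legislation.gov.uk XML is namespaced and considerably richer.

```python
import xml.etree.ElementTree as ET

# Made-up, simplified sample. The element names are assumptions for
# illustration and are not taken from leg_extract.ipynb.
SAMPLE_XML = """
<Legislation>
  <P1>
    <Pnumber>1</Pnumber>
    <P2>
      <Pnumber>1</Pnumber>
      <Text>This Act may be cited as the Example Act.</Text>
    </P2>
    <P2>
      <Pnumber>2</Pnumber>
      <Text>This Act extends to England and Wales.</Text>
    </P2>
  </P1>
  <P1>
    <Pnumber>2</Pnumber>
    <Text>In this Act, "document" includes an electronic record.</Text>
  </P1>
</Legislation>
"""


def extract_sections(xml_text):
    """Return a list of (section, sub_section, text) tuples."""
    root = ET.fromstring(xml_text)
    rows = []
    for section in root.iter("P1"):
        sec_no = section.findtext("Pnumber", default="").strip()
        subs = section.findall("P2")
        if not subs:
            # Section with no sub-sections: use its direct Text children.
            text = " ".join(
                t.text.strip() for t in section.findall("Text") if t.text
            )
            rows.append((sec_no, "", text))
            continue
        for sub in subs:
            sub_no = sub.findtext("Pnumber", default="").strip()
            # Collect every Text element nested under the sub-section.
            text = " ".join(t.text.strip() for t in sub.iter("Text") if t.text)
            rows.append((sec_no, sub_no, text))
    return rows


rows = extract_sections(SAMPLE_XML)
for row in rows:
    print(row)
```

Each tuple maps naturally onto one row of a CSV file, which could be written with the standard csv module.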

Please see https://richardbatstone.github.io/ for a discussion and further background.


Languages

  • Jupyter Notebook 81.7%
  • Python 18.3%