GEN-min-script

Repository with the models used for the task of segmentation and classification of legal documents.

Setup environment

conda env create -f env.yml
conda activate gen-env-min
pip install -r requirements.txt

Pre-annotation data analysis

The distribution of pages per document has been studied, showing that the vast majority of PDFs contain between 1 and 50 pages. Because a single page does not provide enough information for the task, only files with more than 15 and fewer than 50 pages will be used.
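The page-count filter described above can be sketched as follows. This is a minimal illustration, not the repository's code: the function name `select_by_page_count`, the dictionary input, and the file names are assumptions, and the bounds are treated as strict ("more than 15" and "less than 50").

```python
def select_by_page_count(page_counts, min_pages=15, max_pages=50):
    """Keep documents with strictly more than min_pages and fewer than max_pages.

    page_counts: hypothetical mapping of file name -> number of pages.
    """
    return {
        name: n_pages
        for name, n_pages in page_counts.items()
        if min_pages < n_pages < max_pages
    }


# Hypothetical example data, for illustration only.
example = {"a.pdf": 1, "b.pdf": 15, "c.pdf": 16, "d.pdf": 49, "e.pdf": 50}
selected = select_by_page_count(example)
print(selected)
```

With strict bounds, documents of exactly 15 or 50 pages are excluded; switch to `<=` if the intended range is inclusive.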

The duration of contracts has also been studied. The data show that, at some point, most of the contracts that ended before the year 2000 were deleted. In addition, most of the terminated contracts have a short duration.

Finally, the number of contracts that ended each year has been studied relative to the total number of contracts available. It should be noted that contracts still open represent 71% of the data.
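The contract statistics discussed above (share of open contracts, contracts ended per year, and duration of terminated contracts) could be computed along these lines. This is only a sketch under assumptions: the function name `summarize_contracts`, the `(start_date, end_date)` tuple layout with `None` for open contracts, and the sample dates are all hypothetical and not taken from the repository or its data.

```python
from collections import Counter
from datetime import date


def summarize_contracts(contracts):
    """Summarize a list of (start_date, end_date) pairs.

    end_date is None for contracts that are still open (assumed convention).
    Returns the open share, a per-year count of ended contracts, and the
    duration in days of each ended contract.
    """
    total = len(contracts)
    ended = [(start, end) for start, end in contracts if end is not None]
    open_share = (total - len(ended)) / total if total else 0.0
    ended_per_year = Counter(end.year for _, end in ended)
    durations_days = [(end - start).days for start, end in ended]
    return open_share, ended_per_year, durations_days


# Hypothetical sample, for illustration only.
sample = [
    (date(2015, 1, 1), None),
    (date(2016, 3, 1), date(2016, 9, 1)),
    (date(2018, 5, 1), None),
    (date(2019, 1, 1), date(2019, 2, 1)),
]
open_share, ended_per_year, durations_days = summarize_contracts(sample)
print(open_share, dict(ended_per_year), durations_days)
```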
