Skip to content

hannaHayik/IsraeliLawParser

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

IsraeliLawParser

Parser & Tagger for Israeli laws documents

Goal: Convert Israeli law documents from DOC format (random strings) into tagged XML files with proper heirarchy

Input: PDF_DOC (input folder) (You can download the input folder from here https://drive.google.com/drive/folders/1bvjtlnGQZxYbSWumEKtDD4lKeqKImeBf)
Output: Israeli laws tagged as XML files (output files also are uploaded to our git)

Requirments:

  1. Install LibreOffice in 'C:\Program Files' OR change the path in project2.py to wherever you install it
  2. Install Python3 libraries (installion with PIP would be much easier):
    1. docx2python FROM https://pypi.org/project/docx2python/
    2. fuzzywuzzy FROM https://pypi.org/project/fuzzywuzzy/
    3. tqdm FROM https://pypi.org/project/tqdm/
  3. Put all files in one folder that should contain:
    1. PDF_DOC (input folder) (You can download the input folder from here https://drive.google.com/drive/folders/1bvjtlnGQZxYbSWumEKtDD4lKeqKImeBf)
    2. output (output folder)
    3. project2.py (Python3 script)

Running the script:

  1. MAKE SURE YOUR PYTHON INTERPETER IS: Python 3.6 (DH3)
  2. Run project2.py and wait to finish

Releases

No releases published

Packages

No packages published

Languages