Python: Epidocs XML > pie_extended lemmatization > sqlite (for php Verbatim)

galenus-verbatim/verbapie

Verbapie

Python code to produce an SQLite database, ready to offer lemma search on the web, from Greek documents in EpiDoc XML.
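
To make the idea of "lemma search" concrete, here is a minimal sketch using only the standard `sqlite3` module. The table name `tok`, its columns, and the sample rows are assumptions made up for illustration; they are not the schema Verbapie actually writes.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not the layout Verbapie actually produces.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tok (doc TEXT, n INTEGER, form TEXT, lemma TEXT)")
con.execute("CREATE INDEX tok_lemma ON tok (lemma)")

# Invented sample rows standing in for lemmatizer output.
LEMMA = "μῆνις"
rows = [
    ("doc1", 1, "μῆνιν", LEMMA),
    ("doc1", 2, "ἄειδε", "ἀείδω"),
    ("doc2", 1, "μήνιος", LEMMA),
]
con.executemany("INSERT INTO tok VALUES (?, ?, ?, ?)", rows)


def search_lemma(con, lemma):
    """Return (doc, position, form) for every occurrence of a lemma."""
    cur = con.execute(
        "SELECT doc, n, form FROM tok WHERE lemma = ? ORDER BY doc, n",
        (lemma,),
    )
    return cur.fetchall()


# Finds both inflected forms that share the same lemma.
print(search_lemma(con, LEMMA))
```

The point of storing the lemma next to each form is that one indexed lookup returns every inflected form of a word, which is what a web front end would query.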

Caution

This code is at a very early stage and is not ready for distribution. It works, at least for the developers. Open an issue on this repository if you want to work with the developers to make it work for your corpus.

Requirements

  • A corpus of Greek texts conforming to the tei-epidoc.rng schema. Example:
    …/myprojects$ git clone https://github.com/OpenGreekAndLatin/First1KGreek.git
  • A Python 3 installation, >= 3.6, < 3.10 (as of 2022-01)
    ubuntu.21.10:…$ python3 -V
    Python 3.9.7
  • The pip packager
  • lxml, the Python wrapper for libxml2 and libxslt, used for XSLT transformations
    ubuntu.21.10:…$ sudo pip3 install lxml
  • pie_extended, the lemmatizer from Thibault Clérice, with the Greek model. Installation takes a while and can end in dependency hell if some required packages are already installed in versions other than those pie expects. This sequence has worked (Cython allows scikit to recompile itself):
    ubuntu.21.10:…$ sudo pip3 install Cython
    ubuntu.21.10:…$ sudo pip3 install pie-extended
    ubuntu.21.10:…$ pie-extended download grc
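
The requirements above can be checked from Python before going further. This is a minimal sketch, assuming the import names `lxml` and `pie_extended`; the helper functions are hypothetical and not part of Verbapie.

```python
import importlib.util
import sys


def python_supported(version=sys.version_info):
    """True if the interpreter is >= 3.6 and < 3.10, the range stated above."""
    return (3, 6) <= tuple(version[:2]) < (3, 10)


def missing_packages(names=("lxml", "pie_extended")):
    """Return the module names from `names` that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    print("Python version OK:", python_supported())
    print("Missing packages:", missing_packages() or "none")
```

`importlib.util.find_spec` only locates a module without importing it, so the check stays fast even though importing the lemmatizer itself is slow.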

Usage

Not stable for now.

Optional: CUDA with Nvidia graphics cards

For faster lemmatisation, if you have an Nvidia graphics card, you can use it for work (and not only for gaming). Install the latest Nvidia drivers and the CUDA toolkit to use the processors of your graphics card, and install the Python lib:
ubuntu.21.10:~$ sudo apt install nvidia-cuda-toolkit

Installation for Windows

  • Install the Nvidia CUDA drivers
  • Install PyTorch 1.7.1: lemmatization with papie 0.3.9 requires torch<=1.7.1,>=1.3.1. Choose the torch version according to your CUDA driver version.
  • (The full lemmatization of the Iliad and the Odyssey takes about 5 minutes with CUDA on an RTX 3060 Ti and about 13 minutes without; the speed-up factor of about 2.6 stays roughly the same with a much larger corpus.)
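
The torch constraint and the GPU check can be sketched as follows. The helper names are hypothetical; only `torch.cuda.is_available()` is an actual PyTorch call, and the code falls back to the CPU when PyTorch is not installed.

```python
def version_tuple(text):
    """Turn '1.7.1+cu110' into (1, 7, 1), ignoring any local build suffix."""
    core = text.split("+")[0]
    return tuple(int(part) for part in core.split(".")[:3])


def torch_version_ok(text):
    """True if the version satisfies torch>=1.3.1,<=1.7.1 (the range quoted above)."""
    return (1, 3, 1) <= version_tuple(text) <= (1, 7, 1)


def pick_device():
    """Return 'cuda' when PyTorch reports a usable GPU, else 'cpu'."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("torch 1.7.1+cu110 accepted:", torch_version_ok("1.7.1+cu110"))
    print("device:", pick_device())
```

If `pick_device()` answers `cpu` on a machine with an Nvidia card, the usual suspects are a CPU-only torch build or a driver that does not match the CUDA version torch was built for.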

Install Python for Windows 10

A Python package usually assumes that you already have a working Python installation. If you do not, and you are on Windows, the system will not help you make good choices the way Linux does. Here are some hints that may save you time, at least as of 2022-01.

  • Install Python 3.8; don’t try to be newer than others. Verbapy is a Digital Humanities library and requires research libraries. Researchers are not paid to discover new bugs in new versions of Python. In the installer, tick "Add Python 3.8 to PATH" and pip NOW (it is much easier than explaining how to fix it afterwards).
  • Don’t try to install Python globally on Windows (e.g. C:\Program Files\Python38). This good practice for a Linux admin will land you in "deps hell" on Windows.
  • Verify these commands in your preferred console:
    win10> python -V
    Python 3.8.10
    win10> where python
    C:\Users\{YOU}\AppData\Local\Programs\Python\Python38\python.exe
  • Update pip (the python package installer)
    win10> pip install --user --upgrade pip
    (--user should not be required, but sometimes it seems to be)
  • You should now have a working Python; try installing an important requirement
    win10> pip install lxml
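
The advice about avoiding a global install can also be checked from Python. A minimal sketch, assuming that "global" means a path under a Program Files folder; the function name is hypothetical.

```python
import sys
from pathlib import PureWindowsPath


def looks_global(executable):
    """True if a Windows python.exe path lies under a 'Program Files' folder."""
    parts = PureWindowsPath(executable).parts
    return any(part.lower().startswith("program files") for part in parts)


if __name__ == "__main__":
    # On Windows, False is the answer you want for a per-user installation.
    print(sys.executable, "global install:", looks_global(sys.executable))
```

`PureWindowsPath` parses Windows paths on any operating system, so the function can be tried out on Linux as well.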
