Merge pull request #380 from sillsdev/setup_update
Update setup documentation and Docker images
Showing 8 changed files with 248 additions and 271 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,113 @@ | ||
# Manual Setup | ||
|
||
## SILNLP Prerequisites | ||
These are the main requirements for running the SILNLP code on a local machine. Because SILNLP depends on many Python packages with complex versioning requirements, we use a Python package called Poetry to manage them. The table below gives a rough hierarchy of SILNLP's major dependencies. | ||
|
||
| Requirement | Reason | | ||
| --------------------- | ----------------------------------------------------------------- | | ||
| Git                   | to get the repo from [GitHub](https://github.com/sillsdev/silnlp) | | ||
| Python | to run the silnlp code | | ||
| Poetry | to manage all the Python packages and versions | | ||
| NVIDIA GPU | Required to run on a local machine | | ||
| NVIDIA drivers        | Required for the GPU                                              | | ||
| CUDA Toolkit          | Required for machine learning on the GPU                          | | ||
| Environment variables | To tell SILNLP where to find the data, etc. | | ||
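As a quick sanity check of the table above, the following minimal sketch reports which of the command-line tools are visible on your `PATH`. The executable names are assumptions, not part of SILNLP: `nvidia-smi` normally ships with the NVIDIA driver and `nvcc` with the CUDA Toolkit, and on some Linux systems the interpreter is only available as `python3`.

```python
import shutil
from typing import Dict, Iterable, Optional

# Assumed executable names; adjust for your platform
# (for example, "python3" instead of "python" on some Linux systems).
DEFAULT_TOOLS = ("git", "python", "poetry", "nvidia-smi", "nvcc")


def find_tools(names: Iterable[str] = DEFAULT_TOOLS) -> Dict[str, Optional[str]]:
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name:12} {path or 'NOT FOUND'}")
```

A tool reported as `NOT FOUND` may still be installed but missing from `PATH`, so treat the output as a hint rather than a verdict.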
|
||
## Setup | ||
|
||
The SILNLP code can be run on either Windows or Linux operating systems. If using an Ubuntu distribution, the only compatible version is 20.04. | ||
|
||
__Download and install__ the following before creating any projects or starting any code, preferably in this order to avoid most warnings: | ||
|
||
1. If using a local GPU: [NVIDIA driver](https://www.nvidia.com/download/index.aspx) | ||
* On Ubuntu, the driver can alternatively be installed through the GUI by opening Software & Updates, navigating to Additional Drivers in the top menu, and selecting the newest NVIDIA driver with the labels proprietary and tested. | ||
* After installing the driver, reboot your system. | ||
2. [Git](https://git-scm.com/downloads) | ||
3. [Python 3.8](https://www.python.org/downloads/) (latest patch release, e.g. 3.8.19) | ||
* Alternatively, install Python using [miniconda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/windows.html) if you're planning to use more than one version of Python. If following this method, activate your conda environment before installing Poetry. | ||
4. [Poetry](https://python-poetry.org/docs/#installation) | ||
* Note that whether the command should call `python` or `python3` depends on which is required on your machine. | ||
* It may or may not be possible to run the curl command within a VS Code terminal. If that causes permission errors, close VS Code and try it in an elevated CMD prompt. | ||
|
||
Windows: | ||
At an administrator CMD prompt or a terminal within VS Code run: | ||
``` | ||
curl -sSL https://install.python-poetry.org | python - --version 1.7.1 | ||
``` | ||
In Powershell, run: | ||
``` | ||
(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | python - --version 1.7.1 | ||
``` | ||
|
||
Linux: | ||
In terminal, run: | ||
``` | ||
curl -sSL https://install.python-poetry.org | python3 - --version 1.7.1 | ||
``` | ||
Add the following line to your .bashrc file in your home directory: | ||
``` | ||
export PATH="$HOME/.local/bin:$PATH" | ||
``` | ||
5. C++ Redistributable | ||
* Note - this may already be installed. If it is not installed you may get cryptic errors such as "System.DllNotFoundException: Unable to load DLL 'thot' or one of its dependencies" | ||
* Windows: Download from https://support.microsoft.com/en-us/topic/the-latest-supported-visual-c-downloads-2647da03-1eea-4433-9aff-95f26a218cc0 and install | ||
* Linux: Instead of installing the redistributable, run the following commands: | ||
``` | ||
sudo apt-get update | ||
sudo apt-get install build-essential gdb | ||
``` | ||
|
||
### Visual Studio Code setup | ||
|
||
1. Install Visual Studio Code | ||
2. Install Python extension for VS Code | ||
3. Open the silnlp folder in VS Code | ||
4. In a CMD window, run `poetry install` to create the virtual environment for silnlp | ||
* If using conda, activate your conda environment before running `poetry install`. Poetry will then install all the dependencies into the conda environment. | ||
5. Choose the newly created virtual environment as the "Python Interpreter" in the command palette (Ctrl+Shift+P) | ||
* If using conda, choose the conda environment as the interpreter | ||
6. Open the command palette and select "Preferences: Open User Settings (JSON)". In the `settings.json` file, add the following options: | ||
``` json | ||
"python.formatting.provider": "black", | ||
"python.linting.pylintEnabled": true, | ||
"editor.formatOnSave": true, | ||
``` | ||
|
||
### S3 bucket setup | ||
|
||
See [S3 bucket setup](s3_bucket_setup.md). | ||
|
||
### ClearML setup | ||
|
||
See [ClearML setup](clear_ml_setup.md). | ||
|
||
### Create SILNLP cache | ||
* Create the directory "/home/user/.cache/silnlp", replacing "user" with your username. | ||
* Create the directory "/home/user/.cache/silnlp/experiments" and set the environment variable SIL_NLP_CACHE_EXPERIMENT_DIR to that path. | ||
* Create the directory "/home/user/.cache/silnlp/projects" and set the environment variable SIL_NLP_CACHE_PROJECT_DIR to that path. | ||
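The cache steps above can be scripted. The sketch below is one possible way to do it, not part of SILNLP: the helper names `create_cache_dirs` and `export_lines` are hypothetical, while the directory and variable names come from the steps above.

```python
from pathlib import Path
from typing import Dict

# Variable names and subdirectory names are taken from the steps above.
CACHE_SUBDIRS = {
    "SIL_NLP_CACHE_EXPERIMENT_DIR": "experiments",
    "SIL_NLP_CACHE_PROJECT_DIR": "projects",
}


def create_cache_dirs(home: Path) -> Dict[str, Path]:
    """Create <home>/.cache/silnlp and its subdirectories; return {variable: path}."""
    cache_root = home / ".cache" / "silnlp"
    result = {}
    for var_name, subdir in CACHE_SUBDIRS.items():
        path = cache_root / subdir
        path.mkdir(parents=True, exist_ok=True)  # also creates cache_root
        result[var_name] = path
    return result


def export_lines(variables: Dict[str, Path]) -> str:
    """Render the variables as `export` lines suitable for ~/.bashrc."""
    return "\n".join(f'export {name}="{path}"' for name, path in variables.items())
```

For example, `print(export_lines(create_cache_dirs(Path.home())))` creates the directories under your home directory and prints the two `export` lines to add to your shell configuration.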
|
||
### Additional Environment Variables | ||
* Set the following environment variables with your respective credentials: CLEARML_API_ACCESS_KEY, CLEARML_API_SECRET_KEY, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY. | ||
* Set SIL_NLP_DATA_PATH to "/aqua-ml-data" and CLEARML_API_HOST to "https://api.sil.hosted.allegro.ai". | ||
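Before running any experiments, it can help to confirm that every variable mentioned in this guide is actually set. The following is a minimal sketch; the function name `missing_variables` is hypothetical, and the list only covers the variables named in this document, so SILNLP itself may read others.

```python
import os
from typing import List, Mapping

# Variables named in this setup guide; SILNLP may read others as well.
REQUIRED_VARIABLES = (
    "SIL_NLP_DATA_PATH",
    "SIL_NLP_CACHE_EXPERIMENT_DIR",
    "SIL_NLP_CACHE_PROJECT_DIR",
    "CLEARML_API_HOST",
    "CLEARML_API_ACCESS_KEY",
    "CLEARML_API_SECRET_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def missing_variables(environ: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_variables()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All expected environment variables are set.")
```

The check only tests for presence; it does not validate the credentials themselves.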
|
||
### Setting Up and Running Experiments | ||
|
||
See the [wiki](https://github.com/sillsdev/silnlp/wiki) for information on setting up and running experiments. The most important pages for getting started are the ones on [file structure](https://github.com/sillsdev/silnlp/wiki/Folder-structure-and-file-naming-conventions), [model configuration](https://github.com/sillsdev/silnlp/wiki/Configure-a-model), and [running experiments](https://github.com/sillsdev/silnlp/wiki/NMT:-Usage). A lot of the instructions are specific to NMT, but are still helpful starting points for doing other things like [alignment](https://github.com/sillsdev/silnlp/wiki/Alignment:-Usage). | ||
|
||
See [this](https://github.com/sillsdev/silnlp/wiki/Using-the-Python-Debugger) page for information on using the VS Code debugger. | ||
|
||
If you need to use a tool that is supported by SILNLP but is not installable as a Python library (which is probably the case if you get an error like "RuntimeError: eflomal is not installed."), follow the appropriate instructions [here](https://github.com/sillsdev/silnlp/wiki/Installing-External-Libraries). | ||
|
||
## Setting environment variables permanently | ||
Windows users: see [here](https://github.com/sillsdev/silnlp/wiki/Install-silnlp-on-Windows-10#permanently-set-environment-variables) for instructions on setting environment variables permanently. | ||
|
||
Linux users: To set environment variables permanently, add each variable as a new line to the `.bashrc` file in your home directory, using the format: | ||
``` | ||
export VAR="VAL" | ||
``` | ||
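If you have several variables to add, editing `.bashrc` by hand gets repetitive. The sketch below shows one way to add or update a line in the `export VAR="VAL"` format; the helper `set_bashrc_variable` is hypothetical, and it assumes the value contains no double quotes, backslashes, `$`, or backticks.

```python
from pathlib import Path


def set_bashrc_variable(bashrc: Path, name: str, value: str) -> None:
    """Add or update `export NAME="VALUE"` in the given file.

    Assumes the value contains no double quotes, backslashes, `$`, or backticks.
    """
    new_line = f'export {name}="{value}"'
    prefix = f"export {name}="
    lines = bashrc.read_text().splitlines() if bashrc.exists() else []
    replaced = False
    for index, line in enumerate(lines):
        if line.startswith(prefix):
            lines[index] = new_line  # update an existing definition in place
            replaced = True
    if not replaced:
        lines.append(new_line)
    bashrc.write_text("\n".join(lines) + "\n")
```

For example, `set_bashrc_variable(Path.home() / ".bashrc", "SIL_NLP_DATA_PATH", "/aqua-ml-data")` writes the line for the data path. Open a new terminal or run `source ~/.bashrc` for the change to take effect.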
|
||
## .NET Machine alignment models | ||
|
||
If you need to run the .NET versions of the Machine alignment models, you will need to install .NET Core SDK 8.0. After installing, run `dotnet tool restore`. | ||
* Windows: [.NET Core SDK](https://dotnet.microsoft.com/download) | ||
* Linux: Installation instructions can be found [here](https://learn.microsoft.com/en-us/dotnet/core/install/linux-ubuntu-2004). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
# S3 bucket setup | ||
|
||
We use Amazon S3 storage for our experiment data. The steps below set up your workspace so that you can browse, read, and write that data conveniently. | ||
|
||
### Install and configure AWS S3 storage | ||
* Install the aws-cli from: https://aws.amazon.com/cli/ | ||
* In cmd, run `aws configure` and enter your AWS access_key_id, secret_access_key, and region (we use region = us-east-1). | ||
* The `aws configure` command creates a folder named '.aws' in your home directory. It should contain two plain-text files named 'config' and 'credentials': the config file holds the region, and the credentials file holds your access_key_id and secret_access_key. | ||
(The home directory is usually C:\Users\<Username>\ on Windows and /home/username on Linux.) | ||
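To confirm that `aws configure` wrote what you expect, you can inspect the credentials file, which uses an INI-style layout with a `[default]` profile. The following is a minimal sketch rather than an official AWS tool; the function name `missing_credential_keys` is hypothetical.

```python
import configparser
from pathlib import Path
from typing import List

EXPECTED_KEYS = ("aws_access_key_id", "aws_secret_access_key")


def missing_credential_keys(credentials_file: Path, profile: str = "default") -> List[str]:
    """Return the expected keys that are absent or empty in the given profile."""
    # Interpolation is disabled so that a '%' in a value is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(credentials_file)  # a missing file is treated as empty
    if not parser.has_section(profile):
        return list(EXPECTED_KEYS)
    return [key for key in EXPECTED_KEYS if not parser.get(profile, key, fallback="").strip()]


if __name__ == "__main__":
    path = Path.home() / ".aws" / "credentials"
    missing = missing_credential_keys(path)
    print(f"{path}: " + ("OK" if not missing else "missing " + ", ".join(missing)))
```

The script never prints the key values, only whether they are present.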
|
||
### Install and configure rclone | ||
|
||
**Windows** | ||
|
||
The following will mount /aqua-ml-data on your S drive and allow you to explore, read and write. | ||
* Install WinFsp: http://www.secfs.net/winfsp/rel/ (Click the button to "Download WinFsp Installer" not the "SSHFS-Win (x64)" installer) | ||
* Download rclone from: https://rclone.org/downloads/ | ||
* Unzip to your desktop (or some convenient location). | ||
* Add the folder that contains rclone.exe to your PATH environment variable. | ||
* Take the `scripts/rclone/rclone.conf` file from this SILNLP repo and copy it to `~\AppData\Roaming\rclone` (creating folders if necessary) | ||
* Add your credentials in the appropriate fields of `~\AppData\Roaming\rclone\rclone.conf` | ||
* Take the `scripts/rclone/mount_to_s.bat` file from this SILNLP repo and copy it to the folder that contains the unzipped rclone. | ||
* Double-click the bat file. A command window should open and remain open. You should see something like: | ||
``` | ||
C:\Users\David\Software\rclone>call rclone mount --vfs-cache-mode full --use-server-modtime s3aqua:aqua-ml-data S: | ||
The service rclone has been started. | ||
``` | ||
|
||
**Linux** | ||
|
||
The following will mount /aqua-ml-data to an S folder in your home directory and allow you to explore, read and write. | ||
* Download rclone from: https://rclone.org/install/ | ||
* Take the `scripts/rclone/rclone.conf` file from this SILNLP repo and copy it to `~/.config/rclone/rclone.conf` (creating folders if necessary) | ||
* Add your credentials in the appropriate fields in `~/.config/rclone/rclone.conf` | ||
* Create a folder called "S" in your home directory | ||
* Run the following command: | ||
``` | ||
rclone mount --vfs-cache-mode full --use-server-modtime s3aqua:aqua-ml-data ~/S | ||
``` | ||
### To start the S: drive on startup | ||
|
||
**Windows** | ||
|
||
Put a shortcut to the mount_to_s.bat file in the Startup folder. | ||
* In Windows Explorer put `shell:startup` in the address bar or open `C:\Users\<Username>\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup` | ||
* Right-click to add a new shortcut. Choose `mount_to_s.bat` as the target; you can leave the name as the default. | ||
|
||
Now your AWS S3 bucket should be mounted as the S: drive when you start Windows. | ||
|
||
**Linux** | ||
* Run `crontab -e` | ||
* Paste `@reboot rclone mount --vfs-cache-mode full --use-server-modtime s3aqua:aqua-ml-data ~/S` into the file, save and exit | ||
* Reboot Linux | ||
|
||
Now your AWS S3 bucket should be mounted as ~/S when you start Linux. |
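After a reboot, you may want to verify that the mount actually came up before starting a long job. A minimal sketch on Linux, assuming the `~/S` folder used above (the function name `is_mounted` is hypothetical):

```python
import os
from pathlib import Path


def is_mounted(mount_dir: Path) -> bool:
    """Return True if mount_dir exists and is a mount point."""
    return os.path.ismount(mount_dir)


if __name__ == "__main__":
    s_dir = Path.home() / "S"
    print(f"{s_dir} mounted: {is_mounted(s_dir)}")
```

If this reports `False`, run the `rclone mount` command manually in a terminal to see any error output.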