This application allows you to upload files to a MinIO storage bucket and send files from MinIO storage directly to Colab. It also lets you remotely execute scripts on Colab and dynamically download script output files back to MinIO storage in synchronization mode (an rsync-like approach).
- Make sure that you have installed the latest versions of `python` and `pip` on your computer. You also have to install Docker and Docker Compose.
- By default, this project uses poetry for dependency and virtual environment management. Make sure to install it too.
- Make sure to provide all required environment variables (via a `.env` file, the `export` command, secrets, etc.) before running the application.
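  For example, a minimal `.env` sketch might look like the following (these variable names are hypothetical, apart from the standard MinIO root credentials; check the application's settings for the exact names it expects):

  ```
  # Hypothetical example; actual variable names depend on the application's settings
  MINIO_ROOT_USER=admin
  MINIO_ROOT_PASSWORD=change-me
  MINIO_HOST=minio
  MINIO_PORT=9000
  ```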
- For managing pre-commit hooks this project uses pre-commit.
- For import sorting this project uses isort.
- For code format checking this project uses black.
- For type checking this project uses mypy.
- To create commits and lint commit messages this project uses commitizen. Run `make commit` to use commitizen during commits.
- There is a special `build_dev` stage in the Dockerfile to build the dev version of the application image.
- This project uses GitHub Actions to run all checks and unit tests on push to the remote repository.
- There are lots of useful commands in the Makefile included in this project's repo. Use the `make <some_command>` syntax to run each of them. If your system doesn't support make commands, you may copy the commands from the Makefile directly into the terminal.
- To install all required dependencies and set up a virtual environment, run in the cloned repository directory:

  ```
  poetry install
  ```

  You can also install the project dependencies using `pip install -r requirements.txt`.
- To configure pre-commit hooks for code linting, code format checking and commit message linting, run in the cloned directory:

  ```
  poetry run pre-commit install
  ```
- Build the app image using:

  ```
  make build
  ```

  To run a reloadable application locally, use `make build_dev` to build the image in the development environment.
- Run the Docker containers using:

  ```
  make up
  ```

  Note: this will also create and attach a persistent named volume `logs` for the Docker container. The container will use this volume to store the application's `app.log` file.
- Stop and remove the Docker containers using:

  ```
  make down
  ```

  If you also want to remove the log volume, use `make down_volume`.
- By default, the application will be accessible at http://localhost:8080 and the MinIO storage console at http://localhost:9001. You can try all endpoints with the Swagger documentation at http://localhost:8080/docs.
- Use the `/files/upload_minio` resource to upload files to MinIO storage. You should provide the MinIO bucket name in a header. The request uses multipart/form-data to upload one or multiple files to MinIO storage. You should also specify a key prefix that will be added to all uploaded files (in a directory-like way), e.g. `files/main/`. If there are existing files with the same prefix, they will be removed from storage.
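  A hedged sketch of calling this endpoint with Python's `requests` library (the header and field names here follow the description above but are assumptions; see `documentation/openapi.yaml` for the authoritative schema):

  ```python
  import requests

  # Hypothetical call; exact header/field names may differ from the real API.
  with open("main.py", "rb") as f1, open("utils.py", "rb") as f2:
      response = requests.post(
          "http://localhost:8080/files/upload_minio",
          headers={"bucket-name": "my-bucket"},   # bucket name goes in a header
          params={"keys_prefix": "files/main/"},  # directory-like prefix for uploaded files
          files=[("files", f1), ("files", f2)],   # multipart/form-data upload
      )
  print(response.json())
  ```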
- Copy the script from `documentation/colab_ssh_config_script.ipynb` to your Colab account. Register at ngrok. Run this cell on Colab and provide your ngrok auth token at the prompt. A tunnel will be created, with its connection credentials written to the `/content/ssh_config/credentials` file.
- Use `/files/upload_colab` to send files from MinIO storage to Colab. Files will be saved in Colab's `/content/uploaded/` directory. Specify the files' key prefix in the request body's `keys_prefix` field (in a directory-like way); all files with that prefix will be uploaded to Colab. To connect to Colab you must provide all credentials (i.e. username, password, host and port) from the `/content/ssh_config/credentials` file. Files will be streamed to Colab directly. If `script_name` is specified, that script will be executed on Colab. If a Jupyter notebook is provided as the script (i.e. a file with the `.ipynb` extension), it will be converted to a Python script (a file with the `.py` extension) on Colab before execution. Note: make sure your script saves its outputs in `/content/uploaded/output/` so they are available for download from Colab back to MinIO.
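  A hedged request sketch (field names follow the description above but are assumptions; the authoritative schema is in `documentation/openapi.yaml`):

  ```python
  import requests

  # Hypothetical payload; check documentation/openapi.yaml for exact field names.
  payload = {
      "keys_prefix": "files/main/",     # all MinIO objects with this prefix are sent to Colab
      "script_name": "test_script.py",  # optional: script to execute on Colab after upload
      # SSH credentials taken from Colab's /content/ssh_config/credentials file:
      "username": "root",
      "password": "<password from credentials file>",
      "host": "<host from credentials file>",
      "port": 12345,
  }
  response = requests.post("http://localhost:8080/files/upload_colab", json=payload)
  print(response.json())
  ```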
- Use `/files/download_colab` to download script results from Colab's `/content/uploaded/output/` directory to MinIO storage. The specified key prefix will be added to each object in MinIO (in a directory-like way). Files are streamed directly from Colab to MinIO (i.e. without also being stored in the application's local storage) using SSHFS, which mounts the remote Colab directory onto a temporarily created local application directory. This action uses AWS CLI sync under the hood, so it can be used to synchronize MinIO storage with files dynamically created/updated/deleted by the Colab script (i.e. if a file was created/updated/deleted in the Colab directory, it will be uploaded/updated/deleted in MinIO respectively; if a file didn't change, it won't be modified in MinIO).
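  A hedged request sketch, mirroring the `upload_colab` example above (field names are assumptions; see `documentation/openapi.yaml`):

  ```python
  import requests

  # Hypothetical payload; check documentation/openapi.yaml for exact field names.
  payload = {
      "keys_prefix": "results/run_1/",  # prefix added to each synced object in MinIO
      "username": "root",
      "password": "<password from credentials file>",
      "host": "<host from credentials file>",
      "port": 12345,
  }
  response = requests.post("http://localhost:8080/files/download_colab", json=payload)
  print(response.json())
  ```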
- A description of all the project's endpoints and its API may be viewed, without running any services, in the `documentation/openapi.yaml` file.
- You can update the `openapi.yaml` API documentation at any time by using the `make openapi` command.
- All warnings and info messages will be shown in the container's stdout and saved in the `app.log` file.
- Use `colab_ssh_config_script.ipynb` on the Colab session side to open an SSH connection tunnel.
- You may use the `test_script.py` or `test_script.ipynb` files from `documentation/` as examples of scripts to upload, run on Colab and fetch remote outputs from.
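  If you write your own script, the only convention to follow is saving outputs under `/content/uploaded/output/`. A minimal hypothetical example:

  ```python
  # minimal_script.py: a hypothetical script to run on Colab; anything
  # written to /content/uploaded/output/ can be synced back to MinIO.
  import os

  output_dir = "/content/uploaded/output/"  # required output location
  os.makedirs(output_dir, exist_ok=True)

  with open(os.path.join(output_dir, "result.txt"), "w") as f:
      f.write("hello from colab\n")  # this file will be synced to MinIO
  ```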
- Use `make test` to build the image and run all linter checks and unit tests.
- After all tests, a coverage report will also be shown.
- Staged changes will be checked during commits via the pre-commit hook.
- All checks and tests will run on push to the remote repository as part of GitHub Actions.