An ML API to compute semantic similarity scores between sentence examples. The API is built with the fastapi Python package, and the semantic similarities are computed with SBert (sentence-transformers package). The deployment is configured for Docker Compose.
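To illustrate what a semantic similarity score between sentence embeddings can look like, here is a minimal sketch. It assumes cosine similarity as the measure, which is a common choice for SBert embeddings but is not confirmed by this README. The toy vectors and the function names `cosine_similarity` and `similarity_matrix` are hypothetical stand-ins, not part of this repository.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equally long vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Pairwise cosine similarities, returned as a list of rows."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy 2-dimensional "embeddings" (assumption: real SBert vectors have
# several hundred dimensions and come from the model, not from hand).
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
matrix = similarity_matrix(toy_embeddings)
```

The resulting matrix is symmetric, and each sentence has a similarity of about 1 to itself.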
Call Docker Compose
export API_PORT=8083
docker-compose -f docker-compose.yml up --build
# or as oneliner:
API_PORT=8083 docker-compose -f docker-compose.yml up --build
(Start the Docker daemon first, e.g. run open /Applications/Docker.app on macOS.)
Check
curl http://localhost:8083
Note: Only main.py is used in the Dockerfile.
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt --no-cache-dir
pip install -r requirements-dev.txt --no-cache-dir
(If your git repo is stored in a folder whose path contains whitespace, don't use the subfolder .venv. Use an absolute path without whitespace instead.)
SBert allows setting the cache_folder via the environment variable SENTENCE_TRANSFORMERS_HOME.
mkdir ./sbert-models
export SENTENCE_TRANSFORMERS_HOME="$(pwd)/sbert-models"
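The same cache configuration can also be done from Python, for example in a test setup or a notebook. The following is a minimal sketch; the helper name `configure_model_cache` is hypothetical, and it assumes the variable is set before the SBert model is loaded.

```python
import os
from pathlib import Path


def configure_model_cache(folder="sbert-models"):
    """Create the cache folder and point SENTENCE_TRANSFORMERS_HOME at it.

    Returns the absolute path of the cache folder.
    """
    cache_dir = Path(folder).resolve()
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: the variable must be in place before the model is loaded.
    os.environ["SENTENCE_TRANSFORMERS_HOME"] = str(cache_dir)
    return cache_dir
```

Calling `configure_model_cache()` from the repository root mirrors the two shell commands above.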
source .venv/bin/activate
# uvicorn app.main:app --reload
gunicorn app.main:app --reload --bind=0.0.0.0:8083 \
--worker-class=uvicorn.workers.UvicornH11Worker \
--workers=1 --timeout=600
a) Send a list of strings.
curl -X 'POST' \
'http://localhost:8083/similarities/' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '[
"Der Film ist super.",
"Der Spielfilm ist gut.",
"Der Film ist Müll.",
"Der Spielfilm ist schlecht."
]'
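The same request can be sent from Python. The sketch below uses only the standard library and assumes the server runs on port 8083 as in the commands above. The helper names `build_request` and `post_similarities` are hypothetical, and the response is returned as parsed JSON without assuming its structure.

```python
import json
import urllib.request

API_URL = "http://localhost:8083/similarities/"  # port as in the curl example


def build_request(payload, url=API_URL):
    """Build a POST request that carries `payload` as UTF-8 encoded JSON."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def post_similarities(payload, url=API_URL, timeout=600):
    """Send the request and return the decoded JSON response unchanged."""
    request = build_request(payload, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


sentences = [
    "Der Film ist super.",
    "Der Spielfilm ist gut.",
    "Der Film ist Müll.",
    "Der Spielfilm ist schlecht.",
]
# Building the request needs no running server; sending it does.
request = build_request(sentences)
```

With the server running, `post_similarities(sentences)` sends the request and returns whatever JSON the API responds with.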
b) Send a JSON object with UUID4 strings as keys and texts as values.
curl -X 'POST' \
'http://localhost:8083/similarities/' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"80ba0456-1d26-4d22-8e80-113b919502ee": "Der Film ist super.",
"a47fe293-26e7-40f0-b0b5-202e0955458f": "Der Spielfilm ist gut.",
"86e356a3-5b42-4e03-91fc-cf69098b6dd2": "Der Film ist Müll.",
"779d0245-8f54-49ec-9f0f-8e29dc987b41": "Der Spielfilm ist schlecht."
}'
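If the sentences do not have IDs yet, a payload of this shape can be generated in Python. This is a minimal sketch using the standard library; the helper name `with_uuid_keys` is hypothetical.

```python
import json
import uuid


def with_uuid_keys(texts):
    """Map a freshly generated UUID4 string to each text, keeping the order."""
    return {str(uuid.uuid4()): text for text in texts}


texts = [
    "Der Film ist super.",
    "Der Spielfilm ist gut.",
    "Der Film ist Müll.",
    "Der Spielfilm ist schlecht.",
]
payload = with_uuid_keys(texts)
# JSON body in the same shape as the curl example above.
body = json.dumps(payload, ensure_ascii=False, indent=2)
```

The keys differ on every run because UUID4 values are random, so store the mapping if the IDs are needed again later.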
- Check syntax:
flake8 --ignore=F401 --exclude=$(grep -v '^#' .gitignore | xargs | sed -e 's/ /,/g')
- Run Unit Tests:
PYTHONPATH=. pytest
- Show the docs: http://localhost:8083/docs
- Show Redoc: http://localhost:8083/redoc
find . -type f -name "*.pyc" | xargs rm
find . -type d -name "__pycache__" | xargs rm -r
rm -r .pytest_cache
rm -r .venv
@software{ulf_hamster_2022_7096002,
  author    = {Ulf Hamster and
               Luise Köhler},
  title     = {simiscore-semantic: ML API for semantic similarities},
  month     = sep,
  year      = 2022,
  publisher = {Zenodo},
  version   = {0.1.0},
  doi       = {10.5281/zenodo.7096002},
  url       = {https://doi.org/10.5281/zenodo.7096002}
}
- Sebastián Ramírez, 2018, FastAPI, https://github.com/tiangolo/fastapi
- Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3982–3992, Hong Kong, China. Association for Computational Linguistics. http://dx.doi.org/10.18653/v1/D19-1410
Please open an issue for support.
Please contribute using GitHub Flow. Create a branch, add commits, and open a pull request.
The "Evidence" project was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 433249742 (GU 798/27-1; GE 1119/11-1).
- Until 31 Aug 2023 (v0.1.0), the code repository was maintained within the DFG project 433249742.
- Since 1 Sep 2023 (v0.2.0), the code repository has been maintained by Ulf Hamster.