This is a simple FastAPI-based service that provides two main endpoints:
- `/embed`: Generates normalized sentence embeddings using the `intfloat/e5-large` model.
- `/count-tokens`: Counts the number of tokens for each input sentence using the Hugging Face tokenizer.
Prerequisites:

- Docker installed
To include the model in the build context, clone the Hugging Face repository inside the `app` folder. This requires Git LFS, since the model weights are stored as LFS objects:

```bash
git lfs install
git clone https://huggingface.co/intfloat/e5-large app/e5_model
```
Make sure the directory structure looks like this:

```
.
├── app
│   ├── main.py
│   ├── requirements.txt
│   └── e5_model
│       ├── config.json
│       ...
```
You can build the Docker image and assign it a specific version (e.g. `0.2.0`):

```bash
docker build -t e5-embedder:0.2.0 .
```
To run the service and expose it on port 8000:

```bash
docker run -p 8000:8000 e5-embedder:0.2.0
```
The API will be accessible at: http://localhost:8000
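Once the container is up, the endpoints can be called from any HTTP client. A small Python sketch using only the standard library (helper names like `build_payload` are illustrative, not part of the service):

```python
import json
import urllib.request

API_URL = "http://localhost:8000"  # where the container above listens

def build_payload(sentences):
    """Encode the JSON body both endpoints expect."""
    return json.dumps({"sentences": sentences}).encode("utf-8")

def post(endpoint, sentences):
    """POST a list of sentences and return the decoded JSON response."""
    req = urllib.request.Request(
        API_URL + endpoint,
        data=build_payload(sentences),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(post("/count-tokens", ["Hello world!"]))
```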
`POST /embed`

Request:

```json
{
  "sentences": ["What is the capital of France?", "Tell me about Python."]
}
```

Response:

```json
{
  "vectors": [[...], [...]]
}
```
Each vector is a normalized embedding of the input sentence.
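Because the vectors are L2-normalized, each one has unit length, so the plain dot product of two vectors equals their cosine similarity. A quick check with made-up numbers (the helper `l2_normalize` is illustrative, not part of the service):

```python
import math

def l2_normalize(vec):
    """Scale a vector so its Euclidean (L2) norm is 1."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

v = l2_normalize([3.0, 4.0])  # toy 2-d "embedding"
w = l2_normalize([4.0, 3.0])

unit = math.sqrt(sum(x * x for x in v))    # length of a normalized vector: 1.0
cosine = sum(a * b for a, b in zip(v, w))  # dot product == cosine similarity
```

This is why normalized embeddings are convenient for similarity search: no extra division by vector norms is needed at query time.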
`POST /count-tokens`

Request:

```json
{
  "sentences": ["Hello world!", "This is a test."]
}
```

Response:

```json
{
  "token_counts": [
    {"sentence": "Hello world!", "token_count": 4},
    {"sentence": "This is a test.", "token_count": 6}
  ]
}
```
Project structure:

```
.
├── Dockerfile
├── app
│   ├── main.py
│   ├── requirements.txt
│   └── e5_model/        # Cloned model files here
```
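The `Dockerfile` itself is not shown above; a minimal sketch matching this layout might look like the following (the base image, working directory, and `uvicorn` entrypoint are assumptions, not the actual file):

```dockerfile
FROM python:3.11-slim

WORKDIR /service
COPY app/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code and the cloned model weights into the image.
COPY app/ ./app/

EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Copying the model directory into the image is what makes the container self-contained, at the cost of a large image.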
To remove the container after testing:

```bash
docker ps -a              # Find container ID
docker rm <container_id>
```
To remove the image:

```bash
docker rmi e5-embedder:0.2.0
```