- Python 3.11
- FastAPI
- Transformers
- `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model from Hugging Face
Note: I haven't implemented batching yet. I will learn about it and add it soon.
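For orientation, a server wired from this stack might look roughly like the sketch below. This is a minimal illustration, not this repo's actual code: the structure, generation parameters, and stats fields are assumptions based on the endpoint behavior described later in this README.

```python
import json
import threading
from datetime import datetime

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

app = FastAPI()


class CompletionRequest(BaseModel):
    query: str


@app.post("/completion")
def completion(req: CompletionRequest):
    def event_stream():
        start = datetime.now()
        inputs = tokenizer(req.query, return_tensors="pt")
        # TextIteratorStreamer yields decoded text chunks while generate()
        # runs in a background thread.
        streamer = TextIteratorStreamer(
            tokenizer, skip_prompt=True, skip_special_tokens=True
        )
        thread = threading.Thread(
            target=model.generate,
            kwargs={**inputs, "streamer": streamer, "max_new_tokens": 256},
        )
        thread.start()

        num_chunks = 0
        for text in streamer:
            num_chunks += 1
            yield f"data: {text}\n\n"
        thread.join()

        end = datetime.now()
        elapsed = (end - start).total_seconds()
        yield "data: [DONE]\n\n"
        # Inference stats sent after [DONE], mirroring the example output below.
        stats = {
            "start_time": start.isoformat(),
            "end_time": end.isoformat(),
            "elapsed_time": elapsed,
            "num_tokens": num_chunks,  # approximation: one chunk per token
            "tokens_per_second": num_chunks / elapsed if elapsed else 0.0,
        }
        yield f"data: {json.dumps(stats)}\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")
```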
- Clone the repository
- Install uv
- Install dependencies:

  ```
  uv sync --frozen
  ```
- You will need your Hugging Face token once, to download the model the first time you run the server
- Run the development server on port 8080:

  ```
  make dev
  ```
- Send a request to the `/completion` endpoint for inference:

  ```
  curl -X POST -H "Content-Type: application/json" -d '{"query":"hello"}' http://127.0.0.1:8080/completion
  ```
This endpoint streams tokens using server-sent events (SSE). After the LLM is done, it sends a `[DONE]` event, followed by the inference stats. Example:

```
data: isolate
data: y.
data: [DONE]
data: {"start_time": "2025-01-24T19:09:39.295529", "end_time": "2025-01-24T19:09:51.029058", "elapsed_time": 11.733529, "num_tokens": 119, "tokens_per_second": 10.141876327232838}
```
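To consume the stream programmatically rather than with curl, something like the sketch below works (this uses the `requests` library and assumes the exact event layout shown above):

```python
import json

import requests

resp = requests.post(
    "http://127.0.0.1:8080/completion",
    json={"query": "hello"},
    stream=True,
)

done = False
for line in resp.iter_lines(decode_unicode=True):
    if not line:
        continue  # SSE events are separated by blank lines
    payload = line.removeprefix("data: ")
    if payload == "[DONE]":
        done = True
    elif done:
        # The event after [DONE] carries the inference stats as JSON.
        stats = json.loads(payload)
        print(f"\n{stats['tokens_per_second']:.1f} tokens/s")
    else:
        print(payload, end="", flush=True)
```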
Run the server:

```
make start
```

You will need to put your Hugging Face token in the `hf_token` variable in the Makefile.
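One common way to wire this up is to have the Makefile export the token as an environment variable and have the app pass it to `from_pretrained`. The sketch below illustrates that pattern; the `HF_TOKEN` variable name and the plumbing are assumptions, not necessarily what this repo does:

```python
import os

from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical plumbing: the Makefile exports hf_token as HF_TOKEN,
# and the app forwards it so the model download can authenticate.
hf_token = os.environ.get("HF_TOKEN")

MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, token=hf_token)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, token=hf_token)
```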
Build and run with Docker:

```
make docker-build
make docker-run
```