Strong need for multiple models in a single deployment #263
Hi @KastanDay! I would suggest implementing a routing service externally, which can decide which backend service/process to call based on the `model` parameter in the request.
Thank you! Do you have any suggestions on an easy routing system? Something short and sweet? I'm an experienced backend programmer, but I've not done much with load balancing / reverse proxies. Thanks again!

Edit: In particular, I want to respect the `model` parameter in each request.
Answering my own question, I suppose NGINX or Traefik would work well. Here's what GPT-4 said:

You can configure Traefik to route requests based on query parameters by declaring router rules as Docker labels.

**Docker Compose File (docker-compose.yml)**

Here, we define two backend services (`backend1` and `backend2`), each with a router rule that matches on the `backend` query parameter:

```yaml
version: '3.7'

services:
  traefik:
    image: traefik:v2.4
    ports:
      - "80:80"
    volumes:
      - "./traefik.yml:/etc/traefik/traefik.yml"
      # Traefik's Docker provider needs the socket to discover containers and labels.
      - "/var/run/docker.sock:/var/run/docker.sock:ro"

  backend1:
    image: nginx:alpine
    labels:
      # Opt in explicitly, since exposedByDefault is false in traefik.yml.
      - "traefik.enable=true"
      - "traefik.http.routers.backend1.rule=Host(`localhost`) && Query(`backend=backend1`)"

  backend2:
    image: nginx:alpine
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.backend2.rule=Host(`localhost`) && Query(`backend=backend2`)"
```

**Traefik Configuration File (traefik.yml)**

This file sets up Traefik and tells it to look for configurations in Docker labels:

```yaml
api:
  dashboard: true

# Traefik v2 has no default entry point; one is needed to accept traffic on port 80.
entryPoints:
  web:
    address: ":80"

providers:
  docker:
    exposedByDefault: false
```

To bring up the services:
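```bash
docker-compose up -d
```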
**Usage**

After running Docker Compose, route your requests by including the `backend` query parameter in the URL. Traefik will route each request to the appropriate backend based on that parameter.
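For example, assuming the stack above is running locally:

```bash
curl "http://localhost/?backend=backend1"   # handled by backend1
curl "http://localhost/?backend=backend2"   # handled by backend2
```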
As mentioned in #179, users need multiple models. On a multi-GPU on-prem machine, I want to write a config file along the lines sketched below.
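A hypothetical sketch of such a config; the field names, model names, and GPU assignments are made up for illustration:

```yaml
# Illustrative only -- not an existing config format for this project.
models:
  - name: llama-2-13b
    path: /models/llama-2-13b
    gpus: [0, 1]      # shard this model across two GPUs
  - name: mistral-7b
    path: /models/mistral-7b
    gpus: [2]         # runs on its own GPU
```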
Then users should be able to specify

```json
"model": "<either_model>",
```

in their requests.

I can start a PR if you want this feature. Let me know if you have any suggestions on the best way to load these models and keep them mostly separate from each other.
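For illustration, a request against such a deployment might then look like this (the endpoint path, port, and payload shape are hypothetical):

```bash
curl http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-2-13b", "inputs": "Hello"}'
```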