Dynamo namespace based isolation #28
Open: biswapanda wants to merge 23 commits into main from bis/multi-tenancy
# Multi-tenancy support for DynamoGraphDeployment

## Problem
Currently, there is no strong isolation between Dynamo graph deployments.

Users expect a frontend scoped to a Dynamo namespace to serve only models from that same Dynamo namespace.

For example, the frontend pods of two distinct DynamoGraphDeployments should not serve models from each other's Dynamo namespaces. Today, if the same model is deployed in different Dynamo namespaces, frontend pods can cross-reference models across Dynamo deployments in the same k8s namespace. The problem is aggravated when different versions of the same model are deployed.

The Dynamo namespace is not enforced across the entire system:
- frontend components are not scoped to a Dynamo namespace.
- backend components use an `--endpoint` argument instead of an environment variable.
- the operator doesn't pass the specified `dynamoNamespace` to components.
## What is a Dynamo namespace?

TL;DR: a Dynamo namespace is the name of a `dynamo graph deployment`.

A Dynamo graph deployment is a:
- `logical` group of workers (components) that are managed together and work cohesively.
- `logical` partition/boundary of the distributed runtime (control, data and event planes).

A Dynamo namespace maps 1:1 to a k8s `DynamoGraphDeployment` CR.
## Use cases for Dynamo namespace
Dynamo namespaces enable a hybrid sharing model: within a k8s namespace, some resources are shared across multiple Dynamo namespaces (operator/etcd/nats deployments, compute resources, data in PVCs), while others are not (logical component deployments).

This supports multi-tenancy use cases where each model deployment is managed in its own isolated `dynamo namespace`.

Use cases:

1. Users don't want to create a distinct k8s namespace for each model deployment:
   a. when dynamic namespace creation and management is not supported or possible.
   b. it is simpler to use and manage, with distinct roles:
      - `Infra team`: deploys and manages etcd, nats, the operator and resources.
      - `GenAI Inference Team`: deploys and manages `DynamoGraphDeployment`s (model deployments).

2. A/B testing models in the same namespace, sharing model weights on an RWX PVC volume but with different parameters/backends.

3. Deploying multiple models in the same k8s namespace and using the Inference Gateway (shared inference pool, endpoint picker) to serve them. This allows configuring granular model routing, flow control and scheduling policies.

The k8s namespace is the strongest isolation boundary in k8s, but its shared-nothing architecture causes issues with:
- sharing model weights
- sharing a common platform (etcd/nats/operator)
- role-based access control (RBAC) of the above resources vs. components
## Requirements

1. Users **SHOULD** be able to create `multiple independent` `DynamoGraphDeployment`s (serving the same or different models) within a single k8s namespace by specifying a different `dynamoNamespace` in each CR.
   For example, a user can create two Dynamo graph deployments of the same model in the same k8s namespace with different parameters/backends and compare their benchmark results.

2. A single Dynamo cloud deployment (1 etcd + nats + operator) **SHOULD** be able to serve models from multiple Dynamo namespaces.

3. Users **SHOULD** be able to deploy one or more `DynamoGraphDeployment`s in the same k8s namespace using different Dynamo namespaces.
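Because each `DynamoGraphDeployment` maps 1:1 to a Dynamo namespace, requirements 1 and 3 imply that two CRs in the same k8s namespace must not claim the same `dynamoNamespace`. A minimal sketch of how such a conflict check could look, assuming deployments are given as plain dicts; the function name `find_namespace_conflicts` and the dict keys are hypothetical and not part of the operator:

```python
from collections import defaultdict
from typing import Iterable


def find_namespace_conflicts(
    deployments: Iterable[dict],
) -> dict[tuple[str, str], list[str]]:
    """Return (k8s namespace, dynamoNamespace) pairs claimed by more than one CR.

    Hypothetical input shape: each deployment is a dict with
    "k8s_namespace", "name" and "dynamo_namespace" keys.
    """
    owners: dict[tuple[str, str], list[str]] = defaultdict(list)
    for d in deployments:
        key = (d["k8s_namespace"], d["dynamo_namespace"])
        owners[key].append(d["name"])
    # The 1:1 mapping between CR and Dynamo namespace is violated
    # whenever a key has more than one owning CR.
    return {key: names for key, names in owners.items() if len(names) > 1}


deployments = [
    {"k8s_namespace": "team-a", "name": "llama-a", "dynamo_namespace": "llama/version-A"},
    {"k8s_namespace": "team-a", "name": "llama-b", "dynamo_namespace": "llama/version-B"},
    {"k8s_namespace": "team-a", "name": "llama-c", "dynamo_namespace": "llama/version-A"},
    {"k8s_namespace": "team-b", "name": "llama-a", "dynamo_namespace": "llama/version-A"},
]
print(find_namespace_conflicts(deployments))
```

Deployments in different k8s namespaces never conflict here, since the k8s namespace is part of the key.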
## Design principles

- A Dynamo namespace is the `dynamo graph deployment name`, which is a logical grouping of components working together.

- Reduce complexity and cognitive load: reuse the existing Dynamo namespace as the isolation boundary.

- Minimize cross-contention in the frontend (FE: http, router, processor) across different namespaces.

- A fully shared-nothing architecture should use the k8s namespace as the isolation boundary.
## Proposal

### Use the k8s namespace as the stronger isolation boundary:
Dynamo graph deployments in different k8s namespaces are fully isolated from each other.

They don't share any:
1. deployments (etcd, nats, operator, etc.)
2. resources (cpu, memory, etc.)
3. data (PVCs for models, etc.)
4. components (http, router, processor or llm backends)

Use cases:
1. Users can create a distinct k8s namespace for each model deployment.

Pros:
- Strong isolation boundary.
- Already supported by k8s.

Cons:
- k8s namespaces need to be created dynamically for each model deployment.
- Tightly couples the operations of the infra team and the GenAI inference team.
- More instances of etcd, nats and the operator need to be managed.
### Dynamo namespace as a secondary isolation boundary within a k8s namespace:
#### Isolation:
This approach is a hybrid sharing model.

Shared:
1. deployments (etcd, nats, operator, etc.) and their resources (cpu, memory, etc.)
2. data (PVCs for models, etc.)

Not shared:
1. components (http, router, processor or llm backends)
#### High-level design:

1. Components use the `DYNAMO_NAMESPACE` environment variable to scope their functionality.

2. When `DYNAMO_NAMESPACE` is not specified, the namespace defaults to `default`.

3. Frontend components (http, router, processor) are scoped to the Dynamo namespace.
   Advantages:
   - Provides the ability to shard and scale routers independently.
   - Provides Dynamo namespace scoped `sharding` to scale all-in-one frontend components independently.

4. The Dynamo namespace itself is hierarchical, allowing hierarchical isolation of frontend components.
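Points 1 and 2 could be implemented by a small helper that every component calls at startup. This is a minimal sketch, assuming components read the variable directly via `os.environ`; the helper name `resolve_dynamo_namespace`, treating an empty value as unset, and the validation rule (no empty `/`-separated segments) are all hypothetical:

```python
import os
from collections.abc import Mapping

DEFAULT_DYNAMO_NAMESPACE = "default"


def resolve_dynamo_namespace(env: Mapping[str, str] = os.environ) -> str:
    """Return the Dynamo namespace this component is scoped to.

    Falls back to "default" when DYNAMO_NAMESPACE is unset or blank
    (treating blank as unset is an assumption).
    """
    value = env.get("DYNAMO_NAMESPACE", "").strip()
    if not value:
        return DEFAULT_DYNAMO_NAMESPACE
    # Hypothetical validation: namespaces are hierarchical, so reject
    # empty segments such as "org-1//llama" or a trailing "/".
    if any(not segment for segment in value.split("/")):
        raise ValueError(f"invalid DYNAMO_NAMESPACE: {value!r}")
    return value


# The same call yields a different scope depending only on the environment.
print(resolve_dynamo_namespace({}))
print(resolve_dynamo_namespace({"DYNAMO_NAMESPACE": "org-1/llama-8b"}))
```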
### Implementation

#### Dynamo Operator changes:

The top-level `dynamoNamespace` field in a DynamoGraphDeployment automatically sets the `DYNAMO_NAMESPACE` environment variable in all components.

```yaml
apiVersion: dynamo.ai/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: dynamo-graph-deployment
spec:
  dynamoNamespace: default/model1
```

Similar changes are required in the Helm chart approach as well.
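A minimal sketch of the injection step described above, modelled in Python on plain dicts rather than written as operator code. It assumes containers use the standard Kubernetes `env` list format (`name`/`value` pairs) and that the CR's top-level value overrides any `DYNAMO_NAMESPACE` a user set on an individual container; the function `inject_dynamo_namespace` is hypothetical:

```python
import copy


def inject_dynamo_namespace(dynamo_namespace: str, containers: list[dict]) -> list[dict]:
    """Return copies of `containers` with DYNAMO_NAMESPACE set to `dynamo_namespace`.

    Any existing DYNAMO_NAMESPACE entry is replaced (assumption: the CR's
    top-level value always wins). The input list is not modified.
    """
    result = []
    for container in containers:
        c = copy.deepcopy(container)
        env = [e for e in c.get("env", []) if e.get("name") != "DYNAMO_NAMESPACE"]
        env.append({"name": "DYNAMO_NAMESPACE", "value": dynamo_namespace})
        c["env"] = env
        result.append(c)
    return result


# Hypothetical CR spec and component containers.
spec = {"dynamoNamespace": "default/model1"}
containers = [
    {"name": "frontend", "env": [{"name": "LOG_LEVEL", "value": "info"}]},
    {"name": "worker", "env": [{"name": "DYNAMO_NAMESPACE", "value": "stale"}]},
]
for c in inject_dynamo_namespace(spec["dynamoNamespace"], containers):
    print(c["name"], c["env"])
```

Overriding rather than merging keeps every component of one CR in the same Dynamo namespace, which is the isolation property the proposal relies on.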
#### Dynamo Frontend components (http, router, processor):
They use the `DYNAMO_NAMESPACE` environment variable to scope what they read from etcd and nats.

Phase 1:
- Ignore any etcd data/watch events for namespaces other than the configured namespace.
- Ignore any nats messages for namespaces other than the configured namespace.
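Phase 1 reduces to an exact-match filter on incoming events. The proposal does not specify the etcd key layout, so the sketch below assumes a hypothetical `instances/<dynamo_namespace>/<component>/<endpoint>` layout and parses from the right, so that namespaces containing `/` (such as `default/model1`) are still extracted correctly; `namespace_of` and `phase1_filter` are illustrative names:

```python
from typing import Iterable, Iterator, Optional, Tuple

KEY_PREFIX = "instances/"  # hypothetical etcd key prefix

Event = Tuple[str, bytes]  # (etcd key, value)


def namespace_of(key: str) -> Optional[str]:
    """Extract the Dynamo namespace from a key shaped like
    instances/<dynamo_namespace>/<component>/<endpoint> (hypothetical layout)."""
    if not key.startswith(KEY_PREFIX):
        return None
    # Split from the right: the last two segments are component and endpoint,
    # everything before them is the (possibly hierarchical) namespace.
    parts = key[len(KEY_PREFIX):].rsplit("/", 2)
    if len(parts) != 3 or not all(parts):
        return None
    return parts[0]


def phase1_filter(events: Iterable[Event], namespace: str) -> Iterator[Event]:
    """Yield only watch events whose key belongs to exactly `namespace`."""
    for key, value in events:
        if namespace_of(key) == namespace:
            yield key, value


events = [
    ("instances/default/model1/backend/generate", b"a"),
    ("instances/default/model2/backend/generate", b"b"),
    ("instances/default/model1/sub/backend/generate", b"c"),
    ("models/something-else", b"d"),
]
kept = list(phase1_filter(events, "default/model1"))
print(kept)
```

An exact match deliberately drops events from child namespaces such as `default/model1/sub`. The same equality check would apply to NATS messages if their subject or payload carries the namespace.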
Phase 2:
This phase handles child namespaces as well, so the same frontend can serve both the `llama/version-A` and `llama/version-B` namespaces.
- Subscribe to etcd data/watch events using the configured namespace as a prefix.
- Subscribe to nats messages using the configured namespace as a prefix.
- Ignore any event that doesn't match the prefix.
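A plain string prefix match would let a frontend scoped to `llama` also accept events from an unrelated `llama2` namespace. A minimal sketch of a segment-aware check, assuming `/` separates namespace levels as in `llama/version-A`; the etcd key prefix and the names `in_scope` and `watch_prefix` are hypothetical:

```python
KEY_PREFIX = "instances/"  # hypothetical etcd key prefix


def in_scope(frontend_namespace: str, event_namespace: str) -> bool:
    """True if event_namespace equals frontend_namespace or is a descendant of it.

    Matching is done on whole "/"-separated segments, so a frontend scoped to
    "llama" accepts "llama/version-A" but not "llama2".
    """
    return event_namespace == frontend_namespace or event_namespace.startswith(
        frontend_namespace + "/"
    )


def watch_prefix(frontend_namespace: str) -> str:
    """etcd key prefix to watch; the trailing "/" keeps sibling namespaces
    such as "llama2" out of the watch while including all children."""
    return f"{KEY_PREFIX}{frontend_namespace}/"


for ns in ("llama", "llama/version-A", "llama/version-B", "llama2"):
    print(ns, in_scope("llama", ns))
```

For NATS, one option (an assumption, not part of this proposal) is to map each namespace segment to a subject token and subscribe with the `>` wildcard. Note that `>` matches one or more trailing tokens, so a subscription such as `llama.>` does not match the subject `llama` itself.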
#### Dynamo Backend components (vllm, sglang, trtllm):
- Use the `DYNAMO_NAMESPACE` environment variable to scope their functionality.

Remove the `--endpoint` argument (format: `dyn://namespace.component.endpoint`) from all backend components:
1. The namespace is read from the `DYNAMO_NAMESPACE` environment variable, so the same backend components can be used for multiple namespaces. A Python command line becomes `relocatable` across Dynamo namespaces: only the `DYNAMO_NAMESPACE` environment variable changes, not the command-line arguments.

2. A component hosts multiple endpoints, so `dyn://namespace.component.endpoint` is too specific; we don't use it in actual code.

3. A component name is unique within a Dynamo namespace. We can defer handling this until we actually have a use case for multiple components with the same name in a namespace.

Remove this argument from all backend components:
```python
# Argument to remove from each backend's argument parser
# (`parser` and DEFAULT_ENDPOINT are defined in each backend's own code).
parser.add_argument(
    "--endpoint",
    type=str,
    default=DEFAULT_ENDPOINT,
    help=f"Dynamo endpoint string in 'dyn://namespace.component.endpoint' format. Default: {DEFAULT_ENDPOINT}",
)
```
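Without `--endpoint`, a backend could build its endpoint address from `DYNAMO_NAMESPACE` plus its own component and endpoint names. A minimal sketch; the function `endpoint_address` and the example names `backend` and `generate` are hypothetical, and only the `dyn://namespace.component.endpoint` string format comes from the argument above:

```python
import os
from collections.abc import Mapping


def endpoint_address(component: str, endpoint: str, env: Mapping[str, str] = os.environ) -> str:
    """Build a dyn:// address whose namespace comes from DYNAMO_NAMESPACE.

    The same command line yields a different address per Dynamo namespace,
    which is what makes it relocatable.
    """
    # Fall back to "default" when DYNAMO_NAMESPACE is unset or empty.
    namespace = env.get("DYNAMO_NAMESPACE") or "default"
    return f"dyn://{namespace}.{component}.{endpoint}"


# Same arguments, two Dynamo namespaces: only the environment differs.
for ns in ("llama/version-A", "llama/version-B"):
    print(endpoint_address("backend", "generate", {"DYNAMO_NAMESPACE": ns}))
```

How a hierarchical namespace containing `/` should be encoded inside this string is not specified by the proposal; the sketch simply embeds it as-is.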
#### Hierarchical Dynamo namespace scoping

The Dynamo namespace itself is hierarchical, allowing hierarchical isolation and request routing.

A frontend launched with a specific `DYNAMO_NAMESPACE` is scoped to its namespace and all child namespaces.

1. This allows the router to use a model as a sharding key and scale accordingly.

2. A two-level hierarchy of Dynamo namespaces, `tenant_name/model_name` (tenant: `user/org`, model: `model/version`), similar to the Hugging Face model hub or GitHub, can help group model deployments/Dynamo namespaces.

For example, a frontend with `DYNAMO_NAMESPACE=org-1` can handle requests for both of these models:
- `org-1/llama-8b`
- `org-1/llama-70b`
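Because a frontend serves its own namespace and every descendant, the frontend scopes able to serve a given deployment are that deployment's namespace plus all of its ancestors. A minimal sketch of that lookup, assuming `/` separates levels; the function name `serving_scopes` is hypothetical:

```python
def serving_scopes(dynamo_namespace: str) -> list[str]:
    """Frontend DYNAMO_NAMESPACE values whose scope covers this deployment,
    ordered from the most specific to the broadest."""
    segments = dynamo_namespace.split("/")
    return ["/".join(segments[:i]) for i in range(len(segments), 0, -1)]


for ns in ("org-1/llama-8b", "org-1/llama-70b"):
    print(ns, serving_scopes(ns))
```

Both deployments list `org-1` among their scopes, matching the example above. A router could, as one possible choice, use the most specific entry as its sharding key.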
Similarly, frontend components in the `org-1` Dynamo namespace can serve llama-8b version-A or version-B for all users.

#### Dynamo Planner:
TODO

#### Common components:
1. etcd
2. nats
3. operator
4. observability stack
Open to other thoughts here, but I want to pull back from the notion of multi-tenancy and consider this more as isolation and reuse of nats / etcd for Dynamo clusters.
Should we break it into nats / etcd as part of the Dynamo cloud instance?
So each k8s namespace has an operator and nats / etcd.
Every deployment within that namespace shares that nats and etcd.
In order to deploy multiple isolated graphs within that deployment, we need hierarchical namespaces.
I think we should motivate that last point via a customer requirement. Is it needed? Likely yes.