DynamoModel: Kubernetes Custom Resource to simplify Model Lifecycle Management #45
the DGD controller shouldn't launch the model job.
The new DynamoModel controller should be in charge of reconciling all DynamoModel resources and launching jobs for models.
The DGD reconciliation loop should requeue until the DynamoModel is ready.
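For the requeue-until-ready flow, the DGD reconciler would only need to gate on something like the following status shape (a hedged sketch; the condition and field names are assumptions, only the phases come from the proposal):

```yaml
# Illustrative DynamoModel status the DGD reconciler could wait on
status:
  phase: Ready                 # Pending | Downloading | Ready | Failed
  conditions:
    - type: Ready
      status: "True"
      reason: ModelDownloaded
      message: weights available on the backing storage
```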
agreed, I'll fix the diagram.
@biswapanda I think it would be worthwhile to at least do two thought experiments about how we'd use the new CR with the following constraints:
I think this would give us some insight into whether we're designing this well enough. Happy to expand on the above if needed.
### Storage Persistence
Downloaded model weights MUST be stored in Persistent Volume Claims (PVCs) that persist beyond the lifecycle of individual DGDs, enabling reuse across multiple deployments.

### Credential Management
For downloading from S3, it will not be via a Secret but via IAM (similar for other object stores), so we should be thoughtful about how this will work. It should not require someone to inject IAM credentials here; instead, credentials should come via the normal means (e.g. IRSA in AWS).
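For reference, the usual IRSA wiring looks roughly like this (illustrative names; how the proposed download Job would pick up the service account is an open question, not something defined here):

```yaml
# Illustrative only: IRSA-annotated service account instead of a static Secret
apiVersion: v1
kind: ServiceAccount
metadata:
  name: dynamo-model-downloader
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/model-bucket-reader
---
# The download Job's pod template would then simply reference it:
# serviceAccountName: dynamo-model-downloader
```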
### Status-Based Readiness
DynamoModel MUST expose a status field indicating readiness states (`Pending`, `Downloading`, `Ready`, `Failed`). Dependent resources (DGD, AIperf Job) SHOULD be able to wait for `Ready` state before proceeding.

### Storage Persistence
Per my overall comment, let's re-think whether this is a true requirement.
name: huggingface-token
key: token
# Storage configuration
storage:
I think it's worth a discussion on whether we want the DynamoModel to be in charge of setting up the PVC or not.
Another question: if a user deletes the DynamoModel CR, does the model in the PVC get deleted? What's the expected logic here?
Another: can multiple models be stored in a single PVC, or do we see this as a PVC per model?
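One way to make these choices explicit in the spec (every field name below is hypothetical, just to anchor the discussion):

```yaml
storage:
  # Option A: DynamoModel creates and owns the PVC
  createPVC: true
  size: 200Gi
  storageClassName: efs-sc           # an RWX class if shared across DGDs
  # Option B: reuse a pre-provisioned claim, possibly holding many models
  # existingClaim: shared-model-cache
  # subPath: llama-3-70b             # one directory per model inside the claim
  # Deletion semantics when the CR goes away
  retainOnDelete: true               # keep weights vs. garbage-collect them
```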
VllmPrefillWorker:
  modelRef:
    name: llama-3-70b-instruct-v1
    mountPath: /models  # Where to mount in container
I feel like this might be insufficient? For example, how will the launched worker know what the model key is to load and register in the MDC?
Also, I think it would be helpful to see how other systems do similar things and where they differ. Some that we can look at: AIBrix, Arks, OME. Adding a section on this would be valuable.
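For example, purely illustrative (none of these fields are in the quoted snippet), the ref might need to carry the served model name, or have the operator inject the resolved values:

```yaml
VllmPrefillWorker:
  modelRef:
    name: llama-3-70b-instruct-v1
    mountPath: /models
    # Hypothetical additions so the worker knows what to load/register:
    servedModelName: llama-3-70b-instruct      # key to register in the MDC
    # or resolved env vars injected by the operator, e.g.
    # DYNAMO_MODEL_PATH=/models/llama-3-70b
    # DYNAMO_MODEL_NAME=llama-3-70b-instruct
```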
# subPath: llama-3-70b

# Optional: Download configuration (defaults to HF Downloader or Base Dynamo image with HF)
downloader:
we might want to consider securityContext in this spec to make this Job work on OpenShift-style envs, so that you can run as a non-root user. Not a blocking issue.
yes, good point - we can add extraPodSpec to allow users to set security contexts, tolerations, etc.
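Something roughly like this (sketch only; the exact extraPodSpec shape here is an assumption):

```yaml
downloader:
  extraPodSpec:
    securityContext:
      runAsNonRoot: true
      runAsUser: 1000
      fsGroup: 1000
    tolerations:
      - key: dedicated
        operator: Equal
        value: model-download
        effect: NoSchedule
```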
are we considering checksums in the scope of this? np if not in scope
overall LGTM, great proposal @biswapanda. Left comments above; suggest we think strongly about the dependency on RWX.
LGTM as well overall. We should definitely persist this metadata in CRDs when Dynamo is running on k8s. Already mentioned to you @biswapanda, but wanted to gauge others' opinions here as well: I am wondering if we should have model express persist this metadata too, since it is maintaining a dedicated database outside of etcd. This would decouple the dependency on running Dynamo on k8s (and having etcd) in order to have access to this metadata.
Currently, Dynamo users face three critical challenges:

1. **Model Version Drift**: Inconsistent behavior occurs when AI-perf benchmarks use different model versions than deployments. This was observed during 70B model benchmarking where the deployment used stale weights while the benchmark job pulled the latest commit from HuggingFace.
question: when we say the benchmark job used the latest - is this referencing, for example, the tokenizer for aiperf being mismatched with the weights of the deployment? Not sure I follow what 'drift' means here - maybe 'mismatch' is the better term, where different components can have different versions of the model?
yes, I meant an unintentional version mismatch from pointing to the main revision of a HuggingFace Hub model. There were tokenizer/config JSON changes while the weights remained the same. In this case aiperf was an ephemeral job running on main ToT while the deployment was an older snapshot of main.
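Pinning an exact revision in the CR is what would make both consumers resolve identical artifacts; a minimal sketch (field names assumed, not final):

```yaml
source:
  huggingface:
    repo: meta-llama/Meta-Llama-3-70B-Instruct   # illustrative
    revision: "<commit-sha>"                     # pin a commit instead of "main"
```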
2. **No Cross-Deployment/perf job Model Reuse**: Multiple DGDs or aiperf jobs cannot easily share the same model weights, leading to duplicated operational overhead managing PVCs, secrets, and Jobs.
question: does this mean that multiple deployments can't share a PVC or other shared storage for weights? would model express solve this?
yes, there is a path to use model express. But overall the k8s spec is a contract/interface, and the DynamoModel CRD is orthogonal - it works with modelexpress. Users can:
- enable model express
- bring their own model registry (some folks used JFrog, MLflow, etc.)
- directly download from HF

Added an additional feature for model validation: we need a mechanism to generate checksums for a commit. Currently HuggingFace doesn't provide this OOTB.
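A hedged sketch of how those options could surface in the spec (all field names illustrative):

```yaml
source:
  # exactly one backend, depending on where the weights live
  huggingface:
    repo: <org>/<model>
    revision: "<commit-sha>"
  # modelExpress:
  #   model: <registered-model-name>
  # registry:
  #   uri: s3://my-bucket/models/llama-3-70b     # JFrog, MLflow, object store, ...
  # optional integrity check, since HF doesn't provide per-commit checksums OOTB
  # checksum:
  #   sha256: <digest>
```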
This proposal introduces `DynamoModel`, a dedicated Kubernetes Custom Resource (CR) for managing model lifecycle in the Dynamo ecosystem. DynamoModel decouples model downloading, versioning, and caching from DynamoGraphDeployment (DGD), enabling consistent model references across deployments, benchmarks, and services while eliminating boilerplate code and preventing model version drift.
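As a rough end-to-end illustration (API group/version and field names are assumptions layered on the snippets quoted above, not the final spec):

```yaml
apiVersion: nvidia.com/v1alpha1          # assumed group/version
kind: DynamoModel
metadata:
  name: llama-3-70b-instruct-v1
spec:
  source:
    huggingface:
      repo: meta-llama/Meta-Llama-3-70B-Instruct
      revision: "<commit-sha>"           # pinned so DGD and aiperf see identical bits
  storage:
    size: 200Gi
---
# Both the DGD worker and the aiperf Job would reference the same CR:
#   modelRef:
#     name: llama-3-70b-instruct-v1
#     mountPath: /models
```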