Conversation

@biswapanda

This proposal introduces DynamoModel, a dedicated Kubernetes Custom Resource (CR) for managing model lifecycle in the Dynamo ecosystem. DynamoModel decouples model downloading, versioning, and caching from DynamoGraphDeployment (DGD), enabling consistent model references across deployments, benchmarks, and services while eliminating boilerplate code and preventing model version drift.

apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: llama-3-70b-instruct-v1
  namespace: dynamo-system
spec:
  # Model identification
  modelName: meta-llama/Llama-3.3-70B-Instruct
  version: 8a4556b53a7d81d7e07db15eafb5af5dcd321b33  # HuggingFace commit SHA
  # Source configuration
  source:
    uri: hf://meta-llama/Llama-3.3-70B-Instruct
    secretRef:
      name: huggingface-token
      key: token
  # Storage configuration
  storage:
    pvc:
      create: true                       # Auto-create PVC
      name: llama-3-70b-instruct-v1-pvc  # Optional explicit name; defaults to <cr-name>-pvc
      storageClassName: fast-nvme        # Simple field for convenience
      size: 150Gi                        # Simple field for convenience
      accessModes:
        - ReadWriteMany
      extraPvcSpec: {}
    # OR reference existing PVC
    # pvc:
    #   name: existing-model-cache
    #   subPath: llama-3-70b

  # Optional: Download configuration (defaults to HF Downloader or Base Dynamo image with HF)
  downloader:
    image: my-registry/hf-downloader:my-tag # HF Downloader
    resources: {}
    retryLimit: 5
    timeout: 3600s

@biswapanda biswapanda self-assigned this Oct 7, 2025
@biswapanda biswapanda changed the title proposal for dynamo model CR DynamoModel: Kubernetes Custom Resource to simplify Model Lifecycle Management UX Oct 7, 2025
@biswapanda biswapanda changed the title DynamoModel: Kubernetes Custom Resource to simplify Model Lifecycle Management UX DynamoModel: Kubernetes Custom Resource to simplify Model Lifecycle Management Oct 7, 2025
@biswapanda biswapanda force-pushed the bis/model-management branch from 2e4fb3f to 9cd4ff8 on October 7, 2025 19:04


The DGD controller shouldn't launch the model job.
The new DynamoModel controller should be in charge of reconciling all DynamoModel resources and launching download jobs for models.
The DGD reconciliation loop should requeue until the DynamoModel is ready.

@biswapanda (Author)

agreed, I'll fix the diagram.

@itay

itay commented Oct 8, 2025

@biswapanda I think it would be worthwhile to at least do two thought experiments about how we'd use the new CR with the following constraints:

  1. Imagine we don't have a shared ReadWriteMany PVC and instead only have node-local storage. How would we utilize the CR?
  2. Imagine that the CR is not in charge of launching independent download jobs, i.e. we don't use the CR for caching purposes, but just to centralize information and lifecycle.

I think this would give us some insight into whether we're designing this well enough. Happy to expand on the above if needed.

### Storage Persistence
Downloaded model weights MUST be stored in Persistent Volume Claims (PVCs) that persist beyond the lifecycle of individual DGDs, enabling reuse across multiple deployments.

### Credential Management

For downloading from S3, it will not be via a Secret but via IAM (similarly for other object stores), so we should be thoughtful about how this will work. It should not require someone to inject IAM credentials here; the downloader should instead obtain them via normal means (e.g., IRSA on AWS).

### Status-Based Readiness
DynamoModel MUST expose a status field indicating readiness states (`Pending`, `Downloading`, `Ready`, `Failed`). Dependent resources (DGD, AIperf Job) SHOULD be able to wait for `Ready` state before proceeding.
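As a sketch of what this requirement could look like in practice, here is an illustrative status subresource using condition-style fields (the field names below are assumptions, not a final API):

```yaml
# Illustrative DynamoModel status shape (field names are assumptions, not final)
status:
  state: Ready            # Pending | Downloading | Ready | Failed
  conditions:
    - type: Ready
      status: "True"
      reason: DownloadComplete
      lastTransitionTime: "2025-10-08T12:00:00Z"
  resolvedVersion: 8a4556b53a7d81d7e07db15eafb5af5dcd321b33  # pinned HF commit SHA
  pvcName: llama-3-70b-instruct-v1-pvc                       # where weights landed
```

Dependent resources (DGD, AIPerf jobs) could then gate their reconcile loops on `status.state == Ready`.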

### Storage Persistence

Per my overall comment, let's re-think whether this is a true requirement.

      name: huggingface-token
      key: token
  # Storage configuration
  storage:

I think it's worth a discussion on whether we want the DynamoModel to be in charge of setting up the PVC or not.


Another question: if a user deletes the DynamoModel CR, does the model in the PVC get deleted? What's the expected logic here?

Contributor

Another: can multiple models be stored in a single PVC, or do we see this as a PVC per model?

VllmPrefillWorker:
  modelRef:
    name: llama-3-70b-instruct-v1
    mountPath: /models  # Where to mount in container

I feel like this might be insufficient? For example, how will the launched worker know what the model key is to load and register in the MDC?

@itay

itay commented Oct 8, 2025

Also, I think it would be helpful to see how other systems do similar things and where they differ. Some that we can look at: AIBrix, Arks, OME

Adding a section on this would be valuable.

@athreesh

athreesh commented Oct 8, 2025

  1. Imagine we don't have a shared ReadWriteMany PVC and instead only have node-local storage. How would we utilize the CR?
    Double-clicking on @itay's feedback -- a lot of cloud k8s environments don't support RWX. It would suck if a customer's k8s environment didn't allow for this, and now they're immediately blocked.

# subPath: llama-3-70b

# Optional: Download configuration (defaults to HF Downloader or Base Dynamo image with HF)
downloader:

We might want to consider securityContext in this spec to make this Job work on OpenShift-style envs, so that it can run as a non-root user. Not a blocking issue.

@biswapanda (Author)

Yes, good point. We can add extraPodSpec to allow users to set securityContext, tolerations, etc.
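As a sketch, assuming a hypothetical `extraPodSpec` field on the downloader (not part of the current spec), an OpenShift-friendly non-root setup might look like:

```yaml
# Hypothetical extraPodSpec on the downloader (field name not final)
downloader:
  image: my-registry/hf-downloader:my-tag
  extraPodSpec:
    securityContext:
      runAsNonRoot: true     # required by restricted OpenShift SCCs
      runAsUser: 1000
      fsGroup: 1000          # so the non-root user can write to the PVC mount
    tolerations:
      - key: dedicated
        operator: Equal
        value: model-download
        effect: NoSchedule
```

The same field would also cover node selectors, priority classes, and similar pod-level knobs without growing the CRD surface one field at a time.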

@athreesh

athreesh commented Oct 8, 2025

are we considering checksums in the scope of this? np if not in scope

@athreesh

athreesh commented Oct 8, 2025

overall LGTM, great proposal @biswapanda. Left comments above; suggest we think strongly about the dependency on RWX

@KavinKrishnan

lgtm as well overall

We should definitely persist this metadata in CRDs when Dynamo is running on k8s.

Already mentioned this to you @biswapanda, but wanted to gauge others' opinions here as well:

I am wondering if we should have Model Express persist this metadata as well, since it maintains a dedicated database outside of etcd. This would remove the dependency on running Dynamo on k8s (with etcd) in order to have access to this metadata.


Currently, Dynamo users face three critical challenges:

1. **Model Version Drift**: Inconsistent behavior occurs when AI-perf benchmarks use different model versions than deployments. This was observed during 70B model benchmarking where the deployment used stale weights while the benchmark job pulled the latest commit from HuggingFace.
Contributor

question: when we say the benchmark job used the latest - is this referencing the tokenizer for ai perf for example mismatched with the weights of the deployment? Not sure I follow what 'drift' means here - maybe 'mismatch' is a term - where different components can have different versions of the model?

@biswapanda (Author)

Yes, I meant an unintentional version mismatch caused by pointing at the main revision of a Hugging Face Hub model.

There were tokenizer/config JSON changes while the weights remained the same.

In this case AIPerf was an ephemeral job running against main ToT, while the deployment was an older snapshot of main of a Hugging Face Hub model.


2. **No Cross-Deployment/perf job Model Reuse**: Multiple DGDs or aiperf jobs cannot easily share the same model weights, leading to duplicated operational overhead managing PVCs, secrets, and Jobs.
Contributor

question: does this mean that multiple deployments can't share a PVC or other shared storage for weights? would model express solve this?

@biswapanda (Author)

Yes, there is a path to use Model Express. But overall the k8s spec is a contract/interface, and the DynamoModel CRD is orthogonal — it works with Model Express. Users can:

  • enable Model Express
  • bring their own model registry (some folks used JFrog, MLflow, etc.)
  • directly download from HF
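Under that contract, the source URI could be the main thing that changes between backends. For example (the `s3://` scheme and IAM-based auth here are assumptions, per the credential-management discussion above):

```yaml
# Hypothetical alternative sources; schemes other than hf:// are assumptions
source:
  uri: s3://my-model-bucket/llama-3-70b   # credentials via IRSA/IAM, no secretRef needed

# or the HF path from the proposal:
source:
  uri: hf://meta-llama/Llama-3.3-70B-Instruct
  secretRef:
    name: huggingface-token
    key: token
```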

@biswapanda (Author)

checksums

Added an additional feature for model validation. We need a mechanism to generate checksums for a commit; currently Hugging Face doesn't provide this OOTB.
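One possible shape for this, sketched with hypothetical field names (since HF provides no checksum manifest OOTB, the manifest here is assumed to be user-supplied):

```yaml
# Hypothetical validation block (not part of the current spec)
spec:
  validation:
    checksum:
      algorithm: sha256
      manifestRef:
        configMap: llama-3-70b-checksums   # maps file name -> expected sha256 digest
      onMismatch: Fail                     # mark DynamoModel Failed instead of Ready
```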
