Skip to content

Conversation

wxsms
Copy link
Contributor

@wxsms wxsms commented Apr 28, 2025

Overview:

As for #656 , we can add namespace to dynamo to allow multiple instances to share etcd/nats. However, currently in llm example:

  1. namespace are hardcoded in lots of place, it's not easy to use and change.
  2. nats streams has no respect to namespace

I sugguest we use a single env var to customize it. And if you guys have any better idea, please let me know. Thanks!

Details:

  1. change all hardcoded namespace to use a env var
  2. add namespace to nat stream between decode and prefill
  3. remove configurable endpoint in frontend (since this is related to namespace, we can only read it in python)

Where should the reviewer start?

components/configs

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

none

Summary by CodeRabbit

  • New Features
    • Introduced dynamic namespace support throughout the application, allowing the namespace to be set via an environment variable.
  • Improvements
    • Replaced all hardcoded namespace strings with dynamic retrieval, ensuring consistent and flexible namespace usage.
  • Configuration
    • Removed the need to manually specify the endpoint field in configuration files; endpoints are now constructed internally.
  • Bug Fixes
    • Ensured all components consistently use the correct runtime namespace, reducing potential misconfiguration.

Copy link

copy-pr-bot bot commented Apr 28, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Copy link

👋 Hi wxsms! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors.Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions bot added the external-contribution Pull request is from an external contributor label Apr 28, 2025
import os


def get_namespace():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: @ishandhanani , @mohammedabdulwahhab is something we could check from command line arg / graph for default as well? that could be a way to set the default and override in the graph config ...

Copy link

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Jun 11, 2025
Copy link
Contributor

coderabbitai bot commented Jun 11, 2025

Walkthrough

This change introduces dynamic namespace resolution across multiple LLM service components by replacing hardcoded "dynamo" strings with a new get_namespace() utility function. All configuration files remove the explicit endpoint field, and the FrontendConfig model no longer includes an endpoint attribute. The new utility centralizes namespace retrieval using an environment variable.

Changes

Files/Groups Change Summary
examples/llm/utils/ns.py Added new module with get_namespace() to fetch namespace from DYN_NAMESPACE env var or default to "dynamo".
examples/llm/components/frontend.py Replaced hardcoded namespace and endpoint logic with get_namespace(); removed endpoint from FrontendConfig.
examples/llm/components/disagg_router.py,
kv_router.py,
processor.py,
prefill_worker.py,
worker.py
Replaced all hardcoded "dynamo" namespace strings with calls to get_namespace() for service decorators and internals.
examples/llm/configs/agg.yaml,
agg_router.yaml,
disagg.yaml,
disagg_router.yaml,
multinode-405b.yaml,
multinode_agg_r1.yaml,
mutinode_disagg_r1.yaml
Removed endpoint field from Frontend section in all configuration files.

Sequence Diagram(s)

sequenceDiagram
    participant Env as Environment
    participant Utils as utils.ns.get_namespace()
    participant Component as LLM Component (e.g., Frontend, Worker)
    Env->>Utils: Set DYN_NAMESPACE (optional)
    Component->>Utils: get_namespace()
    Utils-->>Component: Return namespace (from env or "dynamo")
    Component->>Component: Use namespace for service, cache, queue, etc.
Loading

Poem

🐇
A namespace once fixed, now set free,
Dynamic as clouds, as code bunnies decree.
No more "dynamo" hardwired in place,
Each service now finds its own space.
With configs trimmed and logic neat,
Our LLM garden grows—namespace complete!

✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions github-actions bot added the feat label Jun 11, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🔭 Outside diff range comments (2)
examples/llm/utils/ns.py (1)

17-22: 💡 Verification agent

🧩 Analysis chain

Guard against empty env var & memoise the result

If DYN_NAMESPACE is present but empty ("") the function now returns an empty string.
That yields paths like //processor/ and an empty NATS / etcd namespace, which silently breaks routing.
While we are here, an os.getenv lookup for every call is unnecessary.

-import os
+import os
+from functools import lru_cache
@@
-def get_namespace():
-    return os.getenv("DYN_NAMESPACE", "dynamo")
+@lru_cache(maxsize=None)
+def get_namespace() -> str:
+    ns = os.getenv("DYN_NAMESPACE", "").strip()
+    return ns or "dynamo"

🏁 Script executed:

#!/bin/bash
# Find occurrences of get_namespace definition and its usages
rg -n "get_namespace" 

# Find all references to the environment variable key
rg -n "DYN_NAMESPACE"

Length of output: 3335


}


🏁 Script executed:

#!/bin/bash
# Display the get_namespace function in the k8s.py file to compare implementations
sed -n '130,160p' deploy/dynamo/api-store/ai_dynamo_store/api/k8s.py

Length of output: 993


Guard against empty DYN_NAMESPACE and memoize its lookup in examples/llm/utils/ns.py

To prevent returning an empty string when DYN_NAMESPACE is defined but blank (which leads to invalid paths like //processor/ or empty NATS/etcd namespaces), and to avoid repeated env-var lookups on every call, update get_namespace() as follows:

 import os
+from functools import lru_cache

- def get_namespace():
-    return os.getenv("DYN_NAMESPACE", "dynamo")
+@lru_cache(maxsize=None)
+def get_namespace() -> str:
+    ns = os.getenv("DYN_NAMESPACE", "").strip()
+    return ns or "dynamo"

• This ensures that if DYN_NAMESPACE="" or unset, you’ll still get "dynamo".
• The lru_cache decorator caches the result so subsequent calls are fast.

Additionally, there’s a separate get_namespace() in
deploy/dynamo/api-store/ai_dynamo_store/api/k8s.py (using DEFAULT_KUBE_NAMESPACE).
You may want to apply the same empty-string guard (and optional memoization) there for consistency.

examples/llm/components/worker.py (1)

66-72: 💡 Verification agent

🧩 Analysis chain

Prefill stream name may be invalid for NATS

self._prefill_queue_stream_name = get_namespace() + "_" + self.model_name

  1. NATS JetStream stream names are restricted (no dots, spaces, etc.).
    A user-supplied namespace like "team.alpha" or a model name with / will break stream creation.
  2. String concatenation repeats the env lookup; cache once.

Consider:

-        self._prefill_queue_stream_name = get_namespace() + "_" + self.model_name
+        self._namespace = get_namespace()
+        safe_model = self.model_name.replace("/", "_").replace(".", "_")
+        self._prefill_queue_stream_name = f"{self._namespace}_{safe_model}"

Add a small helper to sanitise / validate the final name.


🌐 Web query:

What characters are allowed in NATS JetStream stream names? Are there any restrictions on using dots, slashes, or other special characters?

💡 Result:

In NATS JetStream, stream names must adhere to specific character restrictions to ensure compatibility with the subject namespace and file system. The following characters are prohibited in stream names:

  • Whitespace characters (spaces, tabs)
  • Period (.)
  • Asterisk (*)
  • Greater-than (>)
  • Path separators (forward slash / and backward slash \)
  • Non-printable characters

Additionally, it's recommended to keep stream names under 32 characters to prevent potential issues with file system compatibility. (github.com)

By following these guidelines, you can ensure that your stream names are valid and function correctly within the NATS JetStream environment.

Citations:


Sanitize JetStream stream name before use

NATS JetStream stream names cannot contain whitespace, ., *, >, /, \ or non-printable characters and should stay under ~32 chars. If your namespace or model_name include any of these, stream creation will fail.

Please update examples/llm/components/worker.py around lines 66–72:

  • Cache the namespace once, not inline.
  • Filter out prohibited characters on both parts.
  • Optionally enforce the 32-char recommendation.

Suggested diff:

@@ examples/llm/components/worker.py:66
-        self._prefill_queue_stream_name = get_namespace() + "_" + self.model_name
+        import re
+
+        # Cache and sanitize namespace/model for JetStream
+        ns = get_namespace()
+        def _sanitize(name: str) -> str:
+            # replace dots, slashes, whitespace, *, >, etc.
+            return re.sub(r"[.\s*/>\\\*]", "_", name)
+
+        safe_ns = _sanitize(ns)
+        safe_model = _sanitize(self.model_name)
+        stream_name = f"{safe_ns}_{safe_model}"
+        if len(stream_name) > 32:
+            raise ValueError(f"JetStream stream name too long ({len(stream_name)} > 32)")
+
+        self._prefill_queue_stream_name = stream_name
  • Adds a helper to enforce valid characters.
  • Caches the namespace lookup.
  • Guards against overly long names.
♻️ Duplicate comments (3)
examples/llm/components/disagg_router.py (1)

41-44: Same namespace-prefix concern as in Processor

Same comments about repeated get_namespace() calls and empty-string handling apply here.

examples/llm/components/kv_router.py (1)

98-109: Namespace fetched repeatedly – consider caching

get_namespace() is invoked for the decorator, for the workers client, and for the KV listener.
Cache once (e.g., self._namespace) to ensure consistency and avoid redundant look-ups.

examples/llm/components/worker.py (1)

90-97: Env var propagation relies on same cached namespace

os.environ["VLLM_KV_NAMESPACE"] = get_namespace() repeats the lookup.
Reuse self._namespace from the previous suggestion to guarantee identical values.

🧹 Nitpick comments (11)
examples/llm/components/processor.py (1)

108-118: Avoid multiple env look-ups, keep consistency inside the class

get_namespace() is called three times (kv_listener, EtcdKvCache prefix, etc.).
If the env var were ever changed at runtime you might end up with mixed prefixes.
Store it once in __init__:

-        print(f"Processor init: {self.engine_args.router}")
+        self._namespace = get_namespace()
+        print(f"Processor init: {self.engine_args.router} (ns={self._namespace})")

and reuse self._namespace everywhere.

examples/llm/components/prefill_worker.py (4)

24-24: Resolve namespace once and reuse

get_namespace() is called repeatedly in this module. Resolve it once at import-time and reuse the constant – cheaper, safer (the env var can’t “mysteriously” change midway) and keeps the string identical everywhere.

+# Resolve namespace once for the entire module
+NAMESPACE = get_namespace()

Then replace subsequent calls with NAMESPACE.


44-48: Use the constant in the decorator as well

After introducing NAMESPACE, plug it in here to prevent an extra env-lookup during import:

-        "namespace": get_namespace(),
+        "namespace": NAMESPACE,

90-93: Minor: replace repeated lookup

Same comment as above – substitute with the cached constant.

-        self._metadata_store = NixlMetadataStore(get_namespace(), runtime)
+        self._metadata_store = NixlMetadataStore(NAMESPACE, runtime)

123-129: Prefer f-string for readability

The stream name is easier to read (and a tiny bit faster) with an f-string:

-        prefill_queue_stream_name = (
-            get_namespace()
-            + "_"
-            + (
-                self.engine_args.served_model_name
-                if self.engine_args.served_model_name is not None
-                else "vllm"
-            )
-        )
+        model = self.engine_args.served_model_name or "vllm"
+        prefill_queue_stream_name = f"{NAMESPACE}_{model}"
examples/llm/components/frontend.py (6)

24-24: Cache namespace once per module

Same optimisation as in prefill_worker.py. Add at top:

+NAMESPACE = get_namespace()

and reuse.


53-56: Decorator should also use cached constant

-        "namespace": get_namespace(),
+        "namespace": NAMESPACE,

79-88: Sub-commands: avoid repeated look-ups

Every llmctl invocation fetches the env var again. Substitute with NAMESPACE:

-                "-n",
-                get_namespace(),
+                "-n",
+                NAMESPACE,

(Apply to both remove and add calls.)


93-101: Endpoint construction can reuse the constant

-                f"{get_namespace()}.Processor.chat/completions",
+                f"{NAMESPACE}.Processor.chat/completions",

110-116: Pass cached namespace to HTTP server

-                "--namespace",
-                get_namespace(),
+                "--namespace",
+                NAMESPACE,

127-131: Cleanup call: reuse constant

-                "-n",
-                get_namespace(),
+                "-n",
+                NAMESPACE,
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 94702c7 and 6afdd57.

📒 Files selected for processing (14)
  • examples/llm/components/disagg_router.py (2 hunks)
  • examples/llm/components/frontend.py (6 hunks)
  • examples/llm/components/kv_router.py (3 hunks)
  • examples/llm/components/prefill_worker.py (4 hunks)
  • examples/llm/components/processor.py (3 hunks)
  • examples/llm/components/worker.py (5 hunks)
  • examples/llm/configs/agg.yaml (0 hunks)
  • examples/llm/configs/agg_router.yaml (0 hunks)
  • examples/llm/configs/disagg.yaml (0 hunks)
  • examples/llm/configs/disagg_router.yaml (0 hunks)
  • examples/llm/configs/multinode-405b.yaml (0 hunks)
  • examples/llm/configs/multinode_agg_r1.yaml (0 hunks)
  • examples/llm/configs/mutinode_disagg_r1.yaml (0 hunks)
  • examples/llm/utils/ns.py (1 hunks)
💤 Files with no reviewable changes (7)
  • examples/llm/configs/mutinode_disagg_r1.yaml
  • examples/llm/configs/multinode_agg_r1.yaml
  • examples/llm/configs/disagg.yaml
  • examples/llm/configs/multinode-405b.yaml
  • examples/llm/configs/agg_router.yaml
  • examples/llm/configs/disagg_router.yaml
  • examples/llm/configs/agg.yaml
🧰 Additional context used
🧬 Code Graph Analysis (5)
examples/llm/components/disagg_router.py (1)
examples/llm/utils/ns.py (1)
  • get_namespace (20-21)
examples/llm/components/prefill_worker.py (2)
examples/llm/utils/ns.py (1)
  • get_namespace (20-21)
examples/llm/utils/nixl.py (1)
  • NixlMetadataStore (58-109)
examples/llm/components/kv_router.py (2)
examples/llm/utils/ns.py (1)
  • get_namespace (20-21)
examples/llm/utils/logging.py (1)
  • check_required_workers (24-45)
examples/llm/components/worker.py (2)
examples/llm/utils/ns.py (1)
  • get_namespace (20-21)
examples/llm/utils/nixl.py (1)
  • NixlMetadataStore (58-109)
examples/llm/components/frontend.py (1)
examples/llm/utils/ns.py (1)
  • get_namespace (20-21)
🔇 Additional comments (2)
examples/llm/components/processor.py (1)

46-53: Namespace captured at import-time – make sure the env var is set before module import

get_namespace() is evaluated when the module is imported (inside the @service decorator).
If the launcher sets DYN_NAMESPACE after the import, the service will still register under the default "dynamo" namespace. Double-check your entry-point scripts / helm charts so the env var is exported before Python starts.

examples/llm/components/frontend.py (1)

20-21:

✅ Verification successful

Confirm utils.ns imports os

get_namespace() relies on os.getenv, but the snippet shown doesn’t include import os. Please verify the import exists to avoid a NameError.


🏁 Script executed:

#!/bin/bash
# Check that utils/ns.py contains 'import os'
rg -n "^import os" examples/llm/utils/ns.py || {
  echo "Missing 'import os' in utils/ns.py" >&2
  exit 1
}

Length of output: 58


No missing os import in utils/ns.py

Confirmed that examples/llm/utils/ns.py includes import os (line 17). No changes required.

@github-actions github-actions bot removed the Stale label Jun 12, 2025
Copy link

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Jul 14, 2025
@grahamking
Copy link
Contributor

Replaced by ai-dynamo/enhancements#28

@grahamking grahamking closed this Aug 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

external-contribution Pull request is from an external contributor feat size/M Stale

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants