
Conversation

@ajcasagrande (Contributor) commented Sep 28, 2025

This addresses a concern raised by a few users. The mock server was originally designed without a fallback on purpose, to enable error simulation, but that is no longer relevant. We can provide failure scenarios in other ways in the future if needed.

coderabbitai bot commented Sep 28, 2025

Walkthrough

Adds a fallback tokenizer mechanism across the mock server: the config introduces a fallback_tokenizer field and a matching env var, the tokenizer service can set and use a fallback, app startup and the configure endpoint load and set it, and the docs describe the selection order and updated failure conditions. Minor import-warning suppression was also added.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Docs: Fallback tokenizer behavior and config**<br>`integration-tests/README.md` | Documents the fallback tokenizer mechanism, the selection order, the new env var `MOCK_SERVER_FALLBACK_TOKENIZER`, and the config field `fallback_tokenizer`; updates terminology (e.g., `max_completion_tokens`). |
| **Config: Server-level fallback setting**<br>`integration-tests/mock_server/config.py` | Adds `fallback_tokenizer: str` with default `"Qwen/Qwen3-0.6B"` to `MockServerConfig` (sketched below). |
| **Models: Configure message schema**<br>`integration-tests/mock_server/models.py` | Adds an optional `fallback_tokenizer` field to `ConfigureMessage`. |
| **Tokenizer service: Fallback logic and imports**<br>`integration-tests/mock_server/tokenizer_service.py` | Suppresses transformers import warnings; adds `_fallback_tokenizer` and `set_fallback_tokenizer(...)`; `get_tokenizer(...)` now falls back to the configured tokenizer and errors only if both the requested and fallback tokenizers are unavailable. |
| **App: Startup/init and configure endpoint**<br>`integration-tests/mock_server/app.py` | Consolidates tokenizer loading to include the fallback; sets `MOCK_SERVER_FALLBACK_TOKENIZER` in the env; the configure endpoint can set and load a provided `fallback_tokenizer`. |
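
For orientation, a minimal sketch of the new config field, assuming a pydantic-settings style config class (the field name and default come from this PR; the class shape, env-prefix wiring, and description are assumptions):

    from pydantic import Field
    from pydantic_settings import BaseSettings

    class MockServerConfig(BaseSettings):
        # Assumed env prefix; this is what surfaces MOCK_SERVER_FALLBACK_TOKENIZER.
        model_config = {"env_prefix": "MOCK_SERVER_"}

        # Field name and default are from this PR; the description is illustrative.
        fallback_tokenizer: str = Field(
            default="Qwen/Qwen3-0.6B",
            description="Tokenizer to use when the requested model has none loaded.",
        )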

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant App as Mock Server App
  participant TokSvc as TokenizerService
  participant HF as Transformers

  rect rgb(245,245,255)
    note over App,TokSvc: Startup initialization
    App->>TokSvc: load_tokenizers(server_config.tokenizer_models + fallback)
    App->>TokSvc: set_fallback_tokenizer(server_config.fallback_tokenizer)
    TokSvc->>HF: Import AutoTokenizer (warnings suppressed)
    TokSvc-->>App: Tokenizers loaded
  end

  rect rgb(245,255,245)
    note over Client,TokSvc: Request-time tokenizer selection
    Client->>App: Request with model X
    App->>TokSvc: get_tokenizer(X)
    alt Tokenizer X loaded
      TokSvc-->>App: tokenizer(X)
    else Fallback available
      TokSvc-->>App: tokenizer(fallback)
    else Neither available
      TokSvc-->>App: ValueError (not found: X and fallback)
      App-->>Client: 404
    end
    App-->>Client: Proceed with tokenization using selected tokenizer
  end
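The request-time selection shown above reduces to a small lookup. A minimal sketch, assuming `self._tokenizers` is a dict of loaded tokenizers keyed by model name (the attribute and method names appear elsewhere in this PR; the exact body may differ):

    def get_tokenizer(self, model_name: str):
        # Fall back to the configured tokenizer when the requested one is missing;
        # error only when neither is loaded.
        if model_name not in self._tokenizers:
            if self._fallback_tokenizer not in self._tokenizers:
                raise ValueError(
                    f"No tokenizer loaded for {model_name} or {self._fallback_tokenizer}"
                )
            model_name = self._fallback_tokenizer
        return self._tokenizers[model_name]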
sequenceDiagram
  autonumber
  actor Client
  participant App as Mock Server App
  participant TokSvc as TokenizerService

  rect rgb(255,250,240)
    note over Client,App: Configure endpoint
    Client->>App: POST /configure { fallback_tokenizer?, tokenizer_models? }
    alt fallback_tokenizer provided
      App->>TokSvc: load_tokenizers([fallback_tokenizer])
      App->>TokSvc: set_fallback_tokenizer(fallback_tokenizer)
      TokSvc-->>App: ack
    else no fallback provided
      App-->>Client: ack (no fallback change)
    end
    App-->>Client: 200 Config applied
  end
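As a usage sketch, repointing the fallback at runtime might look like this (host and port are hypothetical; the endpoint and payload field come from the diagram above):

    import requests

    resp = requests.post(
        "http://localhost:8000/configure",  # hypothetical address
        json={"fallback_tokenizer": "Qwen/Qwen3-0.6B"},
    )
    resp.raise_for_status()  # expect 200 once the config is applied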

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I twitched my ears at tokens’ fate,
When X went missing, don’t berate—
I hop to fallback, swift and clean,
A Qwen-ly net beneath the scene.
Now 404s are fewer, whee!
Carrots saved for you and me. 🥕✨

Pre-merge checks

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title succinctly and accurately describes the primary enhancement, adding fallback tokenizer support to the mock server. It follows conventional “feat…” commit styling and is concise and informative for teammates scanning the change history. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 88.89%, which meets the required threshold of 80.00%. |

codecov bot commented Sep 28, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.


@coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
integration-tests/mock_server/app.py (1)

43-51: Avoid loading duplicate tokenizer entries

If the fallback tokenizer is already listed in server_config.tokenizer_models, this will load the same model twice. You can keep order while deduplicating by normalizing the list before passing it to the service.

-    tokenizer_models = [
-        *server_config.tokenizer_models,
-        server_config.fallback_tokenizer,
-    ]
+    tokenizer_models = list(
+        dict.fromkeys(
+            [*server_config.tokenizer_models, server_config.fallback_tokenizer]
+        )
+    )
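
For reference, `dict.fromkeys` keeps only the first occurrence of each key and preserves insertion order, so the dedup leaves the configured ordering intact:

    models = ["Qwen/Qwen3-0.6B", "gpt2", "Qwen/Qwen3-0.6B"]  # illustrative names
    deduped = list(dict.fromkeys(models))
    assert deduped == ["Qwen/Qwen3-0.6B", "gpt2"]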
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a58de00 and 3b49cb7.

📒 Files selected for processing (5)
  • integration-tests/README.md (9 hunks)
  • integration-tests/mock_server/app.py (3 hunks)
  • integration-tests/mock_server/config.py (1 hunks)
  • integration-tests/mock_server/models.py (1 hunks)
  • integration-tests/mock_server/tokenizer_service.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
integration-tests/mock_server/app.py (1)
integration-tests/mock_server/tokenizer_service.py (2)
  • set_fallback_tokenizer (72-74)
  • load_tokenizers (27-40)
🔇 Additional comments (3)
integration-tests/mock_server/config.py (1)

107-112: Fallback configuration wired cleanly

Field shape, description, and default tie in cleanly with the rest of the settings surface. Nice addition.

integration-tests/mock_server/models.py (1)

23-26: Model schema aligns with new config surface

Optional field makes sense here and keeps ConfigureMessage backwards compatible while exposing the new knob.

integration-tests/mock_server/app.py (1)

79-79: Good call adding the env propagation

Mirrors how the rest of the config is shared across workers; thanks for keeping that consistent.

Comment on lines +144 to +147
    if request.fallback_tokenizer is not None:
        tokenizer_service.load_tokenizers([request.fallback_tokenizer])
        tokenizer_service.set_fallback_tokenizer(request.fallback_tokenizer)
        logger.info(f"Fallback tokenizer set to {request.fallback_tokenizer}")

⚠️ Potential issue | 🟠 Major

Persist fallback choice back into server_config

We load and set the new fallback on the service, but server_config.fallback_tokenizer stays at its previous value. As a result, /configure responses and /health still report the old fallback, and any code that later reads server_config.fallback_tokenizer (including multi-worker env propagation or reload flows) will drift from the active tokenizer. Please assign it so the in-memory config stays truthful.

     if request.fallback_tokenizer is not None:
         tokenizer_service.load_tokenizers([request.fallback_tokenizer])
         tokenizer_service.set_fallback_tokenizer(request.fallback_tokenizer)
+        server_config.fallback_tokenizer = request.fallback_tokenizer
         logger.info(f"Fallback tokenizer set to {request.fallback_tokenizer}")
🤖 Prompt for AI Agents
In integration-tests/mock_server/app.py around lines 144 to 147, the code sets
the fallback on the tokenizer_service but doesn't update the in-memory
server_config; set server_config.fallback_tokenizer = request.fallback_tokenizer
after loading/setting the tokenizer so the server_config reflects the active
fallback (ensure server_config is in scope or import it if needed).
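
To make the drift concrete, a test along these lines would fail without the assignment (the fixture and response keys are assumptions; the comment above only states that /configure responses and /health report the fallback):

    def test_fallback_persisted(client):  # hypothetical FastAPI TestClient fixture
        client.post("/configure", json={"fallback_tokenizer": "gpt2"})
        health = client.get("/health").json()
        # Fails if server_config.fallback_tokenizer was never updated.
        assert health["fallback_tokenizer"] == "gpt2"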

Comment on lines 44 to 50
         if model_name not in self._tokenizers:
-            raise ValueError(f"No tokenizer loaded for {model_name}")
+            if self._fallback_tokenizer not in self._tokenizers:
+                raise ValueError(
+                    f"No tokenizer loaded for {model_name} or {self._fallback_tokenizer}"
+                )
+            model_name = self._fallback_tokenizer


⚠️ Potential issue | 🟠 Major

Restore lazy tokenizer loading before falling back.

We now short-circuit to the fallback whenever the requested model key is missing, which means we never even try to lazily load the requested tokenizer anymore. In the current server flows that rely on lazy loading (run without preloading, runtime configure updates, etc.), this silently swaps responses to the fallback tokenizer or just raises when no fallback is configured—a regression from today’s behavior. Please attempt to load the requested tokenizer first and only fall back when that load really fails, while also bootstrapping the fallback if it hasn’t been loaded yet.

-        if model_name not in self._tokenizers:
-            if self._fallback_tokenizer not in self._tokenizers:
-                raise ValueError(
-                    f"No tokenizer loaded for {model_name} or {self._fallback_tokenizer}"
-                )
-            model_name = self._fallback_tokenizer
+        if model_name not in self._tokenizers:
+            try:
+                logger.info(f"Lazy-loading tokenizer for model: {model_name}")
+                self._tokenizers[model_name] = AutoTokenizer.from_pretrained(
+                    model_name, trust_remote_code=True
+                )
+            except Exception as exc:
+                fallback = self._fallback_tokenizer
+                if not fallback:
+                    raise ValueError(
+                        f"No tokenizer loaded for {model_name}"
+                    ) from exc
+                if fallback not in self._tokenizers:
+                    logger.info(f"Lazy-loading fallback tokenizer: {fallback}")
+                    self._tokenizers[fallback] = AutoTokenizer.from_pretrained(
+                        fallback, trust_remote_code=True
+                    )
+                model_name = fallback
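If adopted, the lazy path could be exercised along these lines (the constructor shape and model names are assumptions; `set_fallback_tokenizer` and `get_tokenizer` are the service methods named in this PR):

    svc = TokenizerService()  # constructor shape assumed
    svc.set_fallback_tokenizer("Qwen/Qwen3-0.6B")
    # First attempts to lazy-load "some/unknown-model"; only if that load
    # fails does it fall back to (and lazy-load) Qwen/Qwen3-0.6B.
    tok = svc.get_tokenizer("some/unknown-model")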

@ajcasagrande marked this pull request as draft September 28, 2025 19:29
@the-david-oy (Contributor) left a comment

Great job catching this! This feature will be helpful.

    logger.info("Tokenizer models loaded successfully")

    if request.fallback_tokenizer is not None:
        tokenizer_service.load_tokenizers([request.fallback_tokenizer])
@the-david-oy (Contributor)

Is this always supposed to be called, even if the tokenizer specified exists?

@ajcasagrande (Contributor, Author)

Yeah, I need to add a check to see if it's already been loaded. CodeRabbit brought that up a few times.
