Feature/epd #2

ST-XX · 2025-12-02T04:01:03Z

Motivation

Modifications

Usage or Command

Accuracy Tests

Checklist

- Add at least one tag to the PR title.
  - Tag list: [[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
  - You can add new tags based on the PR content, but their meaning must be clear.
- Format your code and run pre-commit before committing.
- Add unit tests. If the PR has no unit tests, explain why in this PR.
- Provide accuracy results.
- If this PR targets the release branch, make sure it has first been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

debuglog f to %s
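One plausible reading of this note is that debug log calls were changed from f-strings to `%s` placeholders, so the message is only formatted when the record is actually emitted. The sketch below illustrates that pattern with the standard `logging` module; the logger name, message text, and `request_id` value are hypothetical and not taken from the PR.

```python
import logging


class ListHandler(logging.Handler):
    """Collects records in memory so the example can inspect them."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


logger = logging.getLogger("epd_demo")  # hypothetical logger name
logger.propagate = False  # keep the demo records away from the root logger
handler = ListHandler()
logger.addHandler(handler)

request_id = "req-0"  # hypothetical value

# At INFO level the debug call is dropped before any message is formatted.
logger.setLevel(logging.INFO)
logger.debug("fetched request %s", request_id)
dropped = len(handler.records)

# At DEBUG level the record keeps the template and its arguments separate;
# the final string is only built when something asks for the message.
logger.setLevel(logging.DEBUG)
logger.debug("fetched request %s", request_id)
record = handler.records[0]
message = record.getMessage()
```

With an f-string the interpolation would run on every call, even when debug logging is disabled.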

gemini-code-assist · 2025-12-02T04:01:25Z

Summary of Changes

Hello @ST-XX, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on optimizing internal communication and data handling within the FastDeploy framework, with a strong emphasis on multimodal inputs and request scheduling. Key changes include migrating the serialization mechanism to pickle for improved data type compatibility and performance, refining the processing of multimodal inputs, and addressing minor issues in the request fetching logic and router configuration for better robustness and consistency.

Highlights

Serialization Protocol Update: The pull request transitions the serialization mechanism from orjson (JSON) to pickle (protocol 5) for request objects within the scheduler and connector components. This change enhances support for complex Python data types and improves efficiency, particularly when handling large arrays.
Multimodal Input Handling Enhancements: Conditional filtering for multimodal input parameters has been introduced, dependent on the envs.ENABLE_V1_KVCACHE_SCHEDULER flag. Additionally, specific logic for processing position_ids in the GPU model runner was added, and checks for image presence in multimodal tasks were made more robust.
Request Scheduling Logic Refinement: Adjustments were made to the state management of the is_fetching flag within the common engine's request scheduling, ensuring more accurate asynchronous behavior and preventing potential race conditions.
Router Port Type Standardization: The port parameter in the router configuration and its corresponding command-line argument have been standardized to use an integer type instead of a string, aligning with common practices for port numbers.
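The serialization change in the first highlight can be illustrated with a minimal sketch. It uses the standard-library `json` module as a stand-in for a JSON encoder and a plain dict as a hypothetical request payload; the field names and values are assumptions, not the actual FastDeploy request schema.

```python
import json
import pickle

# Hypothetical payload; the bytes value stands in for a raw array buffer.
request = {
    "request_id": "req-0",
    "position_ids": bytes(range(8)),
    "image_shape": (2, 4),
}

# JSON has no bytes type, so the standard encoder rejects the payload.
try:
    json.dumps(request)
    json_supported = True
except TypeError:
    json_supported = False

# Pickle protocol 5 (Python 3.8+) round-trips the payload with types intact.
payload = pickle.dumps(request, protocol=5)
restored = pickle.loads(payload)  # only unpickle data from trusted peers
```

Because unpickling can execute arbitrary code, this approach is only appropriate between components that trust each other.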


¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

ST-XX · 2025-12-02T04:02:14Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces several changes, primarily focused on enabling and optimizing multimodal data handling with disaggregation support. Key modifications include switching from orjson to pickle for serialization to better handle complex data types, refactoring asynchronous task fetching logic, and adding specific handling for multimodal inputs in the GPU model runner. The changes are generally positive, especially the serialization improvements. However, I've identified two critical issues: one in the ZMQ message deserialization logic that could lead to failures, and another in the multimodal input handling that could cause a runtime error. Addressing these issues is crucial for the stability of the new features.

fastdeploy/splitwise/splitwise_connector.py

fastdeploy/worker/gpu_model_runner.py

gemini-code-assist

Code Review

This pull request introduces several changes, including refactoring for thread safety by adjusting the is_fetching flag logic, and switching the serialization method from orjson to pickle to better support complex data types like numpy arrays. It also includes a bug fix for a duplicated line of code and improves type correctness by changing a port's type hint from string to integer. Additionally, new logic for handling multimodal inputs has been added.
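The kind of race that an `is_fetching` flag can suffer from, and one common way to close it, can be sketched as follows. This is a minimal illustration that assumes the flag is checked and set from several threads; the `FetchGate` class and its method names are hypothetical and do not reflect the actual engine code.

```python
import threading


class FetchGate:
    """Hypothetical helper that allows at most one in-flight fetch."""

    def __init__(self):
        self._lock = threading.Lock()
        self.is_fetching = False

    def try_begin(self) -> bool:
        # Check and set under one lock so two callers cannot both see False.
        with self._lock:
            if self.is_fetching:
                return False
            self.is_fetching = True
            return True

    def end(self) -> None:
        with self._lock:
            self.is_fetching = False


gate = FetchGate()
started = []
barrier = threading.Barrier(8)


def worker(index: int) -> None:
    barrier.wait()  # release all threads at once to provoke contention
    if gate.try_begin():
        started.append(index)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

winners = len(started)  # exactly one thread may begin a fetch
```

Without the lock, two threads could both read `is_fetching` as `False` before either sets it, which matches the "multiple fetch requests" symptom named in the commit message.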

My review identifies two critical issues that could lead to runtime errors. One is in fastdeploy/engine/request.py where a None check is missing before accessing multimodal_inputs, which could cause a TypeError. The other is in fastdeploy/worker/gpu_model_runner.py, where there are multiple potential TypeError, KeyError, or AttributeError exceptions when handling position_ids for multimodal inputs. I have provided suggestions to fix these issues.
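The failure modes described above can be avoided with a defensive accessor along these lines. This is a sketch only: the helper name `get_position_ids` is hypothetical, and it assumes `multimodal_inputs` is either `None` or a dict that may carry a `position_ids` entry.

```python
from typing import Any, Dict, Optional


def get_position_ids(multimodal_inputs: Optional[Dict[str, Any]]) -> Optional[Any]:
    """Return position_ids if present, otherwise None (hypothetical helper)."""
    # Guard against None (and empty dicts) before subscripting.
    if not multimodal_inputs:
        return None
    # dict.get avoids a KeyError when the key is absent.
    return multimodal_inputs.get("position_ids")


missing_inputs = get_position_ids(None)
missing_key = get_position_ids({"images": []})
present = get_position_ids({"position_ids": [0, 1, 2]})
```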

fastdeploy/worker/gpu_model_runner.py

fastdeploy/engine/request.py

ST-XX · 2025-12-02T05:17:55Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces significant enhancements for multimodal disaggregated deployments, primarily by switching to pickle for more efficient serialization of numpy arrays. The changes also include important correctness and robustness fixes, such as preventing a race condition in request fetching, using safer dictionary access patterns, and correcting the data type for network ports. Overall, the changes are well-implemented and improve the performance and stability of the system. I have a few minor suggestions to improve type hinting and robustness further.
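The port type correction mentioned above can be illustrated with `argparse`. The program name, the `--port` flag, and the default value are assumptions made for this sketch; the actual router arguments may differ.

```python
import argparse

parser = argparse.ArgumentParser(prog="router-demo")  # hypothetical program name
# type=int makes argparse convert and validate the value at parse time.
parser.add_argument("--port", type=int, default=8080)  # hypothetical flag and default

args = parser.parse_args(["--port", "9000"])
port = args.port

# A non-numeric value is rejected by argparse, which exits with an error.
try:
    parser.parse_args(["--port", "not-a-number"])
    rejected = False
except SystemExit:
    rejected = True
```

Declaring the type at the argument level means downstream code receives an integer and does not need to convert or validate the port itself.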

fastdeploy/worker/gpu_model_runner.py

fastdeploy/splitwise/splitwise_connector.py

ST-XX added 5 commits November 26, 2025 14:19

RouterArgs port str -> int

7716060

fix race condition [is_fetching] causing multiple fetch requests

ced50ef

bugfix: Delete duplicate input_ids tensor creation

3a75fa9

mm pd splitwise json -> pickle5; multimodal_inputs only pos id;

5df9fe4

debuglog f to %s

fix ENABLE_V1_KVCACHE_SCHEDULER=0 mm model lack pos_id, ...

17fcd96

gemini-code-assist bot reviewed Dec 2, 2025

fastdeploy/splitwise/splitwise_connector.py (resolved)

fastdeploy/worker/gpu_model_runner.py (outdated, resolved)

gemini-code-assist bot reviewed Dec 2, 2025

fastdeploy/worker/gpu_model_runner.py (outdated, resolved)

fastdeploy/engine/request.py (outdated, resolved)

update cr

3be161f

gemini-code-assist bot reviewed Dec 2, 2025

fastdeploy/worker/gpu_model_runner.py (outdated, resolved)

fastdeploy/splitwise/splitwise_connector.py (outdated, resolved)

Apply suggestions from code review

817c9f6

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

ST-XX merged commit aa7b584 into develop Dec 2, 2025
10 of 14 checks passed
