ST-XX commented Dec 2, 2025

Motivation

Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least one tag to the PR title.
    • Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
    • You can add new tags based on the PR content, but their meaning must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If there are none, explain why in this PR.
  • Provide accuracy results.
  • If this PR targets the release branch, make sure it has first been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@gemini-code-assist

Summary of Changes

Hello @ST-XX, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on optimizing internal communication and data handling within the FastDeploy framework, with a strong emphasis on multimodal inputs and request scheduling. Key changes include migrating the serialization mechanism to pickle for improved data type compatibility and performance, refining the processing of multimodal inputs, and addressing minor issues in the request fetching logic and router configuration for better robustness and consistency.

Highlights

  • Serialization Protocol Update: The pull request transitions the serialization mechanism from orjson (JSON) to pickle (protocol 5) for request objects within the scheduler and connector components. This change enhances support for complex Python data types and improves efficiency, particularly when handling large arrays.
  • Multimodal Input Handling Enhancements: Conditional filtering for multimodal input parameters has been introduced, dependent on the envs.ENABLE_V1_KVCACHE_SCHEDULER flag. Additionally, specific logic for processing position_ids in the GPU model runner was added, and checks for image presence in multimodal tasks were made more robust.
  • Request Scheduling Logic Refinement: Adjustments were made to the state management of the is_fetching flag within the common engine's request scheduling, ensuring more accurate asynchronous behavior and preventing potential race conditions.
  • Router Port Type Standardization: The port parameter in the router configuration and its corresponding command-line argument have been standardized to use an integer type instead of a string, aligning with common practices for port numbers.
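
A minimal sketch of the serialization change described in the first highlight, assuming a request can be modelled as a plain dict. The helper names and dict keys are hypothetical, and the standard-library `json` module stands in for orjson here; this is not the code from the PR. It shows that pickle protocol 5 round-trips values such as `bytes` and tuples that a JSON encoder rejects or changes.

```python
import json
import pickle


def serialize(request: dict) -> bytes:
    # Protocol 5 is the pickle protocol named in the summary (Python 3.8+).
    return pickle.dumps(request, protocol=5)


def deserialize(data: bytes) -> dict:
    # Only unpickle data from trusted peers: loading a pickle can run code.
    return pickle.loads(data)


def json_can_encode(request: dict) -> bool:
    # The stdlib json encoder raises TypeError for bytes values.
    try:
        json.dumps(request)
        return True
    except TypeError:
        return False


# Hypothetical request: raw bytes stand in for a large array payload.
request = {
    "request_id": "req-0",
    "prompt_token_ids": [1, 2, 3],
    "image_payload": bytes(range(8)),
    "image_shape": (2, 4),
}
restored = deserialize(serialize(request))
```

Because unpickling can execute arbitrary code, a transport like this is only reasonable between processes that trust each other.
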
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

ST-XX commented Dec 2, 2025

/gemini review

gemini-code-assist bot left a comment

Code Review

This pull request introduces several changes, primarily focused on enabling and optimizing multimodal data handling with disaggregation support. Key modifications include switching from orjson to pickle for serialization to better handle complex data types, refactoring asynchronous task fetching logic, and adding specific handling for multimodal inputs in the GPU model runner. The changes are generally positive, especially the serialization improvements. However, I've identified two critical issues: one in the ZMQ message deserialization logic that could lead to failures, and another in the multimodal input handling that could cause a runtime error. Addressing these issues is crucial for the stability of the new features.
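
The conversation does not show the refactored fetching code, so the following is only a sketch of one common way to keep an `is_fetching`-style flag consistent across threads. `FetchGate` and `fetch_once` are hypothetical names, not identifiers from the PR. The idea is to check and set the flag under a lock, and to clear it in a `finally` block so a failed fetch cannot leave it stuck.

```python
import threading


class FetchGate:
    """Hypothetical helper guarding a single in-flight fetch."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.is_fetching = False

    def try_begin(self) -> bool:
        # Check and set atomically so two threads cannot both start fetching.
        with self._lock:
            if self.is_fetching:
                return False
            self.is_fetching = True
            return True

    def end(self) -> None:
        with self._lock:
            self.is_fetching = False


def fetch_once(gate: FetchGate, fetch) -> bool:
    """Run fetch() if no other fetch is in flight; report whether it ran."""
    if not gate.try_begin():
        return False
    try:
        fetch()
    finally:
        # Always clear the flag, even if fetch() raises.
        gate.end()
    return True
```
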

gemini-code-assist bot left a comment

Code Review

This pull request introduces several changes, including refactoring for thread safety by adjusting the is_fetching flag logic, and switching the serialization method from orjson to pickle to better support complex data types like numpy arrays. It also includes a bug fix for a duplicated line of code and improves type correctness by changing a port's type hint from string to integer. Additionally, new logic for handling multimodal inputs has been added.

My review identifies two critical issues that could lead to runtime errors. One is in fastdeploy/engine/request.py where a None check is missing before accessing multimodal_inputs, which could cause a TypeError. The other is in fastdeploy/worker/gpu_model_runner.py, where there are multiple potential TypeError, KeyError, or AttributeError exceptions when handling position_ids for multimodal inputs. I have provided suggestions to fix these issues.
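
As an illustration of the kind of guard the review asks for, here is a minimal sketch that assumes `multimodal_inputs` is either `None` or a dict. The `"images"` key and both function names are assumptions made for this example; only `position_ids` is named in the review. The actual suggestions posted on the PR are not reproduced here.

```python
from typing import Any, Optional


def has_images(multimodal_inputs: Optional[dict]) -> bool:
    # Covers both None and an empty dict before any key access.
    if not multimodal_inputs:
        return False
    images = multimodal_inputs.get("images")  # assumed key name
    # len() is used instead of truthiness so array-like values also work.
    return images is not None and len(images) > 0


def get_position_ids(multimodal_inputs: Optional[dict]) -> Optional[Any]:
    # Return None rather than raising TypeError or KeyError.
    if multimodal_inputs is None:
        return None
    return multimodal_inputs.get("position_ids")
```

Callers can then branch on the `None` result explicitly instead of catching exceptions deep inside the model runner.
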

ST-XX commented Dec 2, 2025

/gemini review

gemini-code-assist bot left a comment

Code Review

This pull request introduces significant enhancements for multimodal disaggregated deployments, primarily by switching to pickle for more efficient serialization of numpy arrays. The changes also include important correctness and robustness fixes, such as preventing a race condition in request fetching, using safer dictionary access patterns, and correcting the data type for network ports. Overall, the changes are well-implemented and improve the performance and stability of the system. I have a few minor suggestions to improve type hinting and robustness further.
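
For the port type correction, a small sketch of how an integer-typed command-line argument behaves with `argparse`. The parser, the `--port` flag name, and the default value are assumptions for illustration; the router's real argument definitions are not shown in this conversation.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Hypothetical router CLI")
    # type=int makes argparse convert and validate the value at parse time,
    # so downstream code receives an int rather than a string.
    parser.add_argument("--port", type=int, default=8080, help="Port to listen on")
    return parser


# Parse an explicit argument list instead of sys.argv for the example.
args = build_parser().parse_args(["--port", "9000"])
```

With `type=int`, a non-numeric value such as `--port abc` is rejected by the parser with a usage error instead of surfacing later as a type mismatch.
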

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
ST-XX merged commit aa7b584 into develop on Dec 2, 2025
10 of 14 checks passed