-
Notifications
You must be signed in to change notification settings - Fork 0
Feature/epd #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/epd #2
Conversation
debuglog f to %s
Summary of ChangesHello @ST-XX, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request primarily focuses on optimizing internal communication and data handling within the FastDeploy framework, with a strong emphasis on multimodal inputs and request scheduling. Key changes include migrating the serialization mechanism to Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces several changes, primarily focused on enabling and optimizing multimodal data handling with disaggregation support. Key modifications include switching from orjson to pickle for serialization to better handle complex data types, refactoring asynchronous task fetching logic, and adding specific handling for multimodal inputs in the GPU model runner. The changes are generally positive, especially the serialization improvements. However, I've identified two critical issues: one in the ZMQ message deserialization logic that could lead to failures, and another in the multimodal input handling that could cause a runtime error. Addressing these issues is crucial for the stability of the new features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces several changes, including refactoring for thread safety by adjusting the is_fetching flag logic, and switching the serialization method from orjson to pickle to better support complex data types like numpy arrays. It also includes a bug fix for a duplicated line of code and improves type correctness by changing a port's type hint from string to integer. Additionally, new logic for handling multimodal inputs has been added.
My review identifies two critical issues that could lead to runtime errors. One is in fastdeploy/engine/request.py where a None check is missing before accessing multimodal_inputs, which could cause a TypeError. The other is in fastdeploy/worker/gpu_model_runner.py, where there are multiple potential TypeError, KeyError, or AttributeError exceptions when handling position_ids for multimodal inputs. I have provided suggestions to fix these issues.
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces significant enhancements for multimodal disaggregated deployments, primarily by switching to pickle for more efficient serialization of numpy arrays. The changes also include important correctness and robustness fixes, such as preventing a race condition in request fetching, using safer dictionary access patterns, and correcting the data type for network ports. Overall, the changes are well-implemented and improve the performance and stability of the system. I have a few minor suggestions to improve type hinting and robustness further.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Motivation
Modifications
Usage or Command
Accuracy Tests
Checklist
[FDConfig],[APIServer],[Engine],[Scheduler],[PD Disaggregation],[Executor],[Graph Optimization],[Speculative Decoding],[RL],[Models],[Quantization],[Loader],[OP],[KVCache],[DataProcessor],[BugFix],[Docs],[CI],[Optimization],[Feature],[Benchmark],[Others],[XPU],[HPU],[GCU],[DCU],[Iluvatar],[Metax]]pre-commitbefore commit.releasebranch, make sure the PR has been submitted to thedevelopbranch, then cherry-pick it to thereleasebranch with the[Cherry-Pick]PR tag.