
feat: Support Disaggregated Prefilling (experimental) #7

Closed
gaocegege opened this issue Jan 23, 2025 · 3 comments

Comments

@gaocegege
Contributor

Thanks for the project.

The documentation at https://docs.vllm.ai/en/latest/features/disagg_prefill.html introduces a proxy server that sits in front of separate prefill and decode instances. I am uncertain whether that proxy server overlaps with the router in this project.

However, I am confident that this setup is not compatible with the current Helm chart. Ideally, a Kubernetes Custom Resource Definition (CRD) should be implemented instead of a Helm chart to accommodate more complex deployment configurations like this one.

Just raising this for discussion; it shouldn't be considered a high priority at this time.
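For context, a rough sketch of the request flow the vLLM docs describe: the proxy sends each request to the prefill instance first (capped at one token, so it mainly populates the KV cache), then forwards the same request to the decode instance, which reuses the transferred cache. The payload shapes below are illustrative assumptions, not the project's actual API.

```python
# Hypothetical sketch of the disaggregated-prefill proxy's request splitting,
# based on the flow described in the vLLM disagg_prefill docs. Field names
# are assumptions for illustration only.

def build_disagg_requests(prompt: str, max_tokens: int) -> tuple[dict, dict]:
    """Split one completion request into a prefill request and a decode request.

    The prefill instance is asked for a single token so that it only builds
    the KV cache; the decode instance then runs the full generation,
    reusing the KV cache transferred from the prefill instance.
    """
    prefill_request = {"prompt": prompt, "max_tokens": 1}
    decode_request = {"prompt": prompt, "max_tokens": max_tokens}
    return prefill_request, decode_request
```

The open question in this issue is whether this splitting/forwarding logic belongs in the standalone proxy or in the project's router.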

@ApostaC
Collaborator

ApostaC commented Jan 23, 2025

Thanks @gaocegege! We are currently discussing potential solutions with @KuntaiDu (the main contributor of vLLM's disaggregated prefill functionality).

One potential solution is to integrate the proxy server functionality into the router, so that no extra Kubernetes-level configuration is needed.

@ApostaC
Collaborator

ApostaC commented Jan 23, 2025

We will create an RFC issue once we have a more concrete design.

@gaocegege
Contributor Author

Thanks, I am closing this since there will be an RFC.
