
feat: Support Disaggregated Prefilling (experimental) #7

Closed
gaocegege opened this issue Jan 23, 2025 · 3 comments

Comments

@gaocegege
Contributor

Thanks for the project.

The documentation at https://docs.vllm.ai/en/latest/features/disagg_prefill.html introduces a proxy server that sits in front of separate prefill and decode instances. I am uncertain whether that proxy server overlaps with the router in this project.

However, I am confident that this setup is not compatible with the current Helm chart. Ideally, a Kubernetes Custom Resource Definition (CRD) should be implemented instead of a Helm chart to accommodate more complex deployment configurations like this one.

Just raising this for discussion; it shouldn't be considered a high priority at this time.
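For context, a rough sketch of the request flow the vLLM docs describe: the proxy sends each request to the prefill instance first (capped at one token, so it mainly populates the KV cache), then forwards the same request to the decode instance, which reuses the transferred cache. The payload shapes below are illustrative assumptions, not the project's actual API.

```python
# Hypothetical sketch of the disaggregated-prefill proxy's request splitting,
# based on the flow described in the vLLM disagg_prefill docs. Field names
# are assumptions for illustration only.

def build_disagg_requests(prompt: str, max_tokens: int) -> tuple[dict, dict]:
    """Split one completion request into a prefill request and a decode request.

    The prefill instance is asked for a single token so that it only builds
    the KV cache; the decode instance then runs the full generation,
    reusing the KV cache transferred from the prefill instance.
    """
    prefill_request = {"prompt": prompt, "max_tokens": 1}
    decode_request = {"prompt": prompt, "max_tokens": max_tokens}
    return prefill_request, decode_request
```

The open question in this issue is whether this splitting/forwarding logic belongs in the standalone proxy or in the project's router.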

@ApostaC
Collaborator

ApostaC commented Jan 23, 2025

Thanks @gaocegege! We are currently discussing potential solutions with @KuntaiDu (the main contributor of vLLM's disaggregated prefill functionality).

One potential solution is to integrate the proxy server functionality into the router, so that no extra Kubernetes-level configuration is needed.

@ApostaC
Collaborator

ApostaC commented Jan 23, 2025

We will create an RFC issue once we have a more concrete design.

@gaocegege
Contributor Author

Thanks, I am closing this since there will be an RFC.
