Support custom device mesh for tensor parallel workers #3757

Closed
wants to merge 236 commits into from

fzyzcjy (Collaborator) commented Feb 21, 2025

Motivation

This is an updated version of #2827.

To let users (e.g. Verl) pass in a custom device mesh, parallel_state.py is changed to use the provided device mesh instead of creating its own groups.
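
To illustrate the idea only, here is a minimal pure-Python sketch of "use the provided mesh, otherwise build the default groups". It is not the code in this PR: the function names `default_tp_groups` and `resolve_tp_groups`, and the convention that each row of the mesh is one tensor-parallel group, are assumptions made for the example.

```python
from typing import List, Optional


def default_tp_groups(world_size: int, tp_size: int) -> List[List[int]]:
    """Hypothetical default layout: consecutive ranks form one TP group."""
    if world_size % tp_size != 0:
        raise ValueError("world_size must be divisible by tp_size")
    return [
        list(range(start, start + tp_size))
        for start in range(0, world_size, tp_size)
    ]


def resolve_tp_groups(
    world_size: int,
    tp_size: int,
    custom_mesh: Optional[List[List[int]]] = None,
) -> List[List[int]]:
    """Return the caller's mesh rows as TP groups, or the default layout.

    Assumption for this sketch: each row of custom_mesh is one TP group.
    """
    if custom_mesh is None:
        return default_tp_groups(world_size, tp_size)

    # Every rank must appear exactly once across the mesh.
    flat = [rank for row in custom_mesh for rank in row]
    if sorted(flat) != list(range(world_size)):
        raise ValueError("custom mesh must cover every rank exactly once")
    # Every row must have the tensor-parallel width.
    if any(len(row) != tp_size for row in custom_mesh):
        raise ValueError("every mesh row must contain tp_size ranks")
    return [list(row) for row in custom_mesh]


# Without a mesh the default grouping is used; with one, the caller decides.
print(resolve_tp_groups(4, 2))
print(resolve_tp_groups(4, 2, custom_mesh=[[0, 2], [1, 3]]))
```

In an actual PyTorch setup the mesh would more likely be a `torch.distributed.device_mesh.DeviceMesh`, with the process group for a mesh dimension obtained via `DeviceMesh.get_group`, rather than nested lists of ranks.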

Modifications

Checklist

fzyzcjy and others added 28 commits February 22, 2025 12:00
# Conflicts:
#	python/sglang/srt/entrypoints/engine.py
#	python/sglang/srt/managers/generation_manager.py
#	python/sglang/srt/managers/tokenizer_manager.py
# Conflicts:
#	python/sglang/srt/entrypoints/engine.py
#	python/sglang/srt/entrypoints/http_server.py
#	python/sglang/srt/managers/scheduler.py
# Conflicts:
#	python/sglang/srt/entrypoints/engine_fragment.py
#	python/sglang/srt/orchestration/spmd/orchestrator.py
fzyzcjy (Collaborator, Author) commented Feb 26, 2025

After discussing with Lianmin, I now know there is no need to follow the implementation requirements in e.g. 2736. I therefore spent several hours writing a new PR, #3852, and this one is deprecated.

@fzyzcjy fzyzcjy closed this Feb 26, 2025