[Bug] sglang crashes when running two deepseek-r1 instances #3837

echozyr2001 · 2025-02-25T07:47:27Z

Checklist

1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.
3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
5. Please use English, otherwise it will be closed.

Describe the bug

[2025-02-25 06:55:06] INFO:     100.64.0.128:41863 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[2025-02-25 06:55:06 TP0] Prefill batch. #new-seq: 1, #new-token: 5395, #cached-token: 6, cache hit rate: 0.11%, token usage: 0.03, #running-req: 3, #queue-req: 0
[2025-02-25 06:55:09 TP0] Decode batch. #running-req: 4, #token: 23066, token usage: 0.04, gen throughput (token/s): 53.58, #queue-req: 0
[2025-02-25 06:55:11 TP0] Decode batch. #running-req: 4, #token: 23226, token usage: 0.05, gen throughput (token/s): 86.63, #queue-req: 0
[2025-02-25 06:55:13 TP0] Decode batch. #running-req: 4, #token: 23386, token usage: 0.05, gen throughput (token/s): 86.65, #queue-req: 0
[2025-02-25 06:55:15] INFO:     100.64.0.129:49365 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[2025-02-25 06:55:15 TP0] Prefill batch. #new-seq: 1, #new-token: 5397, #cached-token: 6, cache hit rate: 0.11%, token usage: 0.03, #running-req: 3, #queue-req: 0
[2025-02-25 06:55:16 TP0] Decode batch. #running-req: 4, #token: 22916, token usage: 0.04, gen throughput (token/s): 42.97, #queue-req: 0
[2025-02-25 06:55:18 TP0] Decode batch. #running-req: 2, #token: 11039, token usage: 0.02, gen throughput (token/s): 65.55, #queue-req: 0
[2025-02-25 06:55:19 TP0] Decode batch. #running-req: 2, #token: 11119, token usage: 0.02, gen throughput (token/s): 49.52, #queue-req: 0
[2025-02-25 06:55:21 TP0] Decode batch. #running-req: 1, #token: 5526, token usage: 0.01, gen throughput (token/s): 30.34, #queue-req: 0
[2025-02-25 06:55:22] INFO:     100.64.0.128:41863 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[2025-02-25 06:55:22 TP0] Prefill batch. #new-seq: 1, #new-token: 5391, #cached-token: 6, cache hit rate: 0.11%, token usage: 0.01, #running-req: 1, #queue-req: 0
[ubuntu:191  :0:25645] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
0b44010 parent 0x55b6aee7dd50 rank 5 nranks 16 color 1197013201 key 5 prev 4 next 6 - DONE
ubuntu:191:25036 [5] NCCL INFO ncclCommSplit comm 0x55b6d0b44010 rank 5 nranks 16 cudaDev 5 nvmlDev 5 busId ae000 parent 0x55b6aee7dd50 color 1197013201 key 5 commId 0x893042912752bf22 - Init START
ubuntu:191:25036 [5] NCCL INFO MNNVL busId 0xae000 fabric UUID 0.0 cliqueId 0x0 state 3 healthMask 0x0
ubuntu:191:25036 [5] NCCL INFO Setting affinity for GPU 5 to ffffffff,ffffff00,00000000,0000ffff,ffffffff,ff000000,00000000
ubuntu:191:25036 [5] NCCL INFO comm 0x55b6d0b44010 rank 5 nRanks 16 nNodes 2 localRanks 8 localRank 5 MNNVL 0
ubuntu:191:25036 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] 6/13/-1->5->-1 [4] 6/-1/-1->5->4 [5] 6/-1/-1->5->4 [6] 6/-1/-1->5->4 [7] 6/-1/-1->5->4 [8] 6/-1/-1->5->13 [9] 6/-1/-1->5->4
ubuntu:191:25036 [5] NCCL INFO P2P Chunksize set to 131072
ubuntu:191:25036 [5] NCCL INFO Channel 03/0 : 14[6] -> 5[5] [receive] via NET/IBext_v8/3
ubuntu:191:25036 [5] NCCL INFO Channel 08/0 : 14[6] -> 5[5] [receive] via NET/IBext_v8/3
ubuntu:191:25036 [5] NCCL INFO Channel 02/0 : 5[5] -> 12[4] [send] via NET/IBext_v8/3
ubuntu:191:25036 [5] NCCL INFO Channel 07/0 : 5[5] -> 12[4] [send] via NET/IBext_v8/3
ubuntu:191:25036 [5] NCCL INFO Channel 00/0 : 5[5] -> 4[4] via P2P/IPC
ubuntu:191:25036 [5] NCCL INFO Channel 01/0 : 5[5] -> 4[4] via P2P/IPC
ubuntu:191:25036 [5] NCCL INFO Channel 03/0 : 5[5] -> 4[4] via P2P/IPC
ubuntu:191:25036 [5] NCCL INFO Channel 04/0 : 5[5] -> 4[4] via P2P/IPC
ubuntu:191:25036 [5] NCCL INFO Channel 05/0 : 5[5] -> 4[4] via P2P/IPC
ubuntu:191:25036 [5] NCCL INFO Channel 06/0 : 5[5] -> 4[4] via P2P/IPC
ubuntu:191:25036 [5] NCCL INFO Channel 08/0 : 5[5] -> 4[4] via P2P/IPC
ubuntu:191:25036 [5] NCCL INFO Channel 09/0 : 5[5] -> 4[4] via P2P/IPC
ubuntu:191:25036 [5] NCCL INFO Connected all rings
ubuntu:191:25036 [5] NCCL INFO Channel 00/0 : 5[5] -> 6[6] via P2P/IPC
ubuntu:191:25036 [5] NCCL INFO Channel 01/0 : 5[5] -> 6[6] via P2P/IPC
ubuntu:191:25036 [5] NCCL INFO Channel 02/0 : 5[5] -> 6[6] via P2P/IPC
ubuntu:191:25036 [5] NCCL INFO Channel 03/0 : 5[5] -> 6[6] via P2P/IPC
ubuntu:191:25036 [5] NCCL INFO Channel 04/0 : 5[5] -> 6[6] via P2P/IPC
ubuntu:191:25036 [5] NCCL INFO Channel 05/0 : 5[5] -> 6[6] via P2P/IPC
ubuntu:191:25036 [5] NCCL INFO Channel 06/0 : 5[5] -> 6[6] via P2P/IPC
ubuntu:191:25036 [5] NCCL INFO Channel 07/0 : 5[5] -> 6[6] via P2P/IPC
ubuntu:191:25036 [5] NCCL INFO Channel 08/0 : 5[5] -> 6[6] via P2P/IPC
ubuntu:191:25036 [5] NCCL INFO Channel 09/0 : 5[5] -> 6[6] via P2P/IPC
ubuntu:191:25036 [5] NCCL INFO Channel 03/0 : 13[5] -> 5[5] [receive] via NET/IBext_v8/3
ubuntu:191:25036 [5] NCCL INFO Channel 08/0 : 13[5] -> 5[5] [receive] via NET/IBext_v8/3
ubuntu:191:25036 [5] NCCL INFO Channel 03/0 : 5[5] -> 13[5] [send] via NET/IBext_v8/3
ubuntu:191:25036 [5] NCCL INFO Channel 08/0 : 5[5] -> 13[5] [send] via NET/IBext_v8/3
ubuntu:191:25036 [5] NCCL INFO Channel 02/0 : 5[5] -> 4[4] via P2P/IPC
ubuntu:191:25036 [5] NCCL INFO Channel 07/0 : 5[5] -> 4[4] via P2P/IPC
ubuntu:191:25036 [5] NCCL INFO Connected all trees
ubuntu:191:25036 [5] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512
ubuntu:191:25036 [5] NCCL INFO 10 coll channels, 10 collnet channels, 0 nvls channels, 16 p2p channels, 2 p2p channels per peer
ubuntu:191:25036 [5] NCCL INFO ncclCommSplit comm 0x55b6d0b44010 rank 5 nranks 16 cudaDev 5 nvmlDev 5 busId ae000 parent 0x55b6aee7dd50 color 1197013201 key 5 commId 0x893042912752bf22 - Init COMPLETE
[2025-02-25 06:55:22] INFO:     100.64.0.127:18498 - "POST /v1/chat/completions HTTP/1.1" 200 OK
==== backtrace (tid:  25645) ====
 0 0x0000000000042520 __sigaction()  ???:0
 1 0x000000000006a920 SaveProxy()  /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/proxy.cc:518
 2 0x000000000006ca3a ncclProxySaveOp()  /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/proxy.cc:546
 3 0x0000000000049543 uploadProxyOps()  /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/enqueue.cc:1145
 4 0x0000000000051a7f hostStreamPlanTask()  /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/enqueue.cc:1163
 5 0x0000000000051bd9 hostStreamPlanCallback()  /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/enqueue.cc:1175
 6 0x00000000002af853 cuEGLApiInit()  ???:0
 7 0x00000000002bdae3 cuEGLApiInit()  ???:0
 8 0x0000000000094ac3 pthread_condattr_setpshared()  ???:0
 9 0x0000000000126850 __xmknodat()  ???:0
=================================
Fatal Python error: Segmentation fault

Thread 0x00007fbaf7ff7640 (most recent call first):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 462 in watchdog_thread
  File "/usr/lib/python3.10/threading.py", line 953 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x00007fbaf87f8640 (most recent call first):
  File "/sgl-workspace/sglang/python/sglang/srt/models/deepseek_v2.py", line 527 in forward
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747 in _call_impl
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736 in _wrapped_call_impl
  File "/sgl-workspace/sglang/python/sglang/srt/models/deepseek_v2.py", line 773 in forward
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747 in _call_impl
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736 in _wrapped_call_impl
  File "/sgl-workspace/sglang/python/sglang/srt/models/deepseek_v2.py", line 835 in forward
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747 in _call_impl
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736 in _wrapped_call_impl
  File "/sgl-workspace/sglang/python/sglang/srt/models/deepseek_v2.py", line 874 in forward
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116 in decorate_context
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 761 in forward_extend
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 796 in forward
  File "/sgl-workspace/sglang/python/sglang/srt/managers/tp_worker.py", line 164 in forward_batch_generation
  File "/sgl-workspace/sglang/python/sglang/srt/managers/tp_worker_overlap_thread.py", line 140 in forward_thread_func_
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116 in decorate_context
  File "/sgl-workspace/sglang/python/sglang/srt/managers/tp_worker_overlap_thread.py", line 109 in forward_thread_func
  File "/usr/lib/python3.10/threading.py", line 953 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x00007fd0c2fc5640 (most recent call first):
  File "/usr/lib/python3.10/threading.py", line 324 in wait
  File "/usr/lib/python3.10/threading.py", line 607 in wait
  File "/usr/local/lib/python3.10/dist-packages/tqdm/_monitor.py", line 60 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x00007fd0c1fc3640 (most recent call first):
  File "/usr/lib/python3.10/threading.py", line 324 in wait
  File "/usr/lib/python3.10/threading.py", line 607 in wait
  File "/usr/local/lib/python3.10/dist-packages/tqdm/_monitor.py", line 60 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x00007fd5bcf5e480 (most recent call first):
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/distributed_c10d.py", line 2425 in broadcast
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/c10d_logger.py", line 83 in wrapper
  File "/sgl-workspace/sglang/python/sglang/srt/utils.py", line 693 in broadcast_pyobj
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 575 in recv_requests
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 493 in event_loop_overlap
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116 in decorate_context
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1825 in run_scheduler_process
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108 in run
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314 in _bootstrap
  File "/usr/lib/python3.10/multiprocessing/spawn.py", line 129 in _main
  File "/usr/lib/python3.10/multiprocessing/spawn.py", line 116 in spawn_main
  File "<string>", line 1 in <module>

Extension modules: charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, uvloop.loop, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, psutil._psutil_linux, psutil._psutil_posix, setproctitle, zmq.backend.cython._zmq, yaml._yaml, markupsafe._speedups, PIL._imaging, PIL._imagingft, msgspec._core, multidict._multidict, yarl._quoting_c, propcache._helpers_c, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket.mask, aiohttp._websocket.reader_c, frozenlist._frozenlist, msgpack._cmsgpack, google._upb._message, ray._raylet, sentencepiece._sentencepiece, regex._regex, cuda_utils, __triton_launcher (total: 52)
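
For triage, the native backtrace in the log above can be pulled out programmatically to see which library the faulting frame belongs to. The following is a minimal sketch, not part of sglang or NCCL: the helper names `parse_backtrace` and `first_resolved_frame` are hypothetical, and the frame format is assumed to match the lines printed between the `==== backtrace` header and the `====` footer in this report.

```python
import re

# Assumed frame format, taken from the log in this report, e.g.:
#  1 0x000000000006a920 SaveProxy()  /dvs/.../src/proxy.cc:518
FRAME_RE = re.compile(
    r"^\s*(?P<idx>\d+)\s+(?P<addr>0x[0-9a-fA-F]+)\s+(?P<func>\S+)\(\)\s+(?P<loc>\S+)\s*$"
)


def parse_backtrace(log_text):
    """Collect the frames printed between the backtrace header and footer."""
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        if line.startswith("==== backtrace"):
            in_trace = True
            continue
        if in_trace and line.startswith("====="):
            break  # footer line made only of '=' characters
        if in_trace:
            match = FRAME_RE.match(line)
            if match:
                # Split "path:line" on the last colon.
                path, _, lineno = match.group("loc").rpartition(":")
                frames.append({
                    "index": int(match.group("idx")),
                    "addr": match.group("addr"),
                    "func": match.group("func"),
                    "file": path,
                    "line": int(lineno),
                })
    return frames


def first_resolved_frame(frames):
    """Return the innermost frame with a known source file, or None."""
    for frame in frames:
        if frame["file"] != "???":
            return frame
    return None


# Sample copied from the crash log in this report (first four frames only).
SAMPLE = """\
==== backtrace (tid:  25645) ====
 0 0x0000000000042520 __sigaction()  ???:0
 1 0x000000000006a920 SaveProxy()  /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/proxy.cc:518
 2 0x000000000006ca3a ncclProxySaveOp()  /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/proxy.cc:546
 3 0x0000000000049543 uploadProxyOps()  /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/enqueue.cc:1145
=================================
"""

frames = parse_backtrace(SAMPLE)
top = first_resolved_frame(frames)
print(f"{top['func']} at {top['file']}:{top['line']}")
```

On the sample, the innermost frame with source information is `SaveProxy()` in NCCL's `proxy.cc`, which shows the segfault is raised inside NCCL's proxy path rather than in sglang's Python code.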

Reproduction

DeepSeek-R1 & DeepSeek-V3

Environment

Python: 3.10.12 (main, Jan 17 2025, 14:35:34) [GCC 11.4.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA H20
GPU 0,1,2,3,4,5,6,7 Compute Capability: 9.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.4, V12.4.131
CUDA Driver Version: 565.57.01
PyTorch: 2.5.1+cu124
sgl_kernel: 0.0.3.post6
flashinfer: 0.2.1.post2+cu124torch2.5
triton: 3.1.0
transformers: 4.48.3
torchao: 0.8.0
numpy: 1.26.4
aiohttp: 3.11.12
fastapi: 0.115.8
hf_transfer: 0.1.9
huggingface_hub: 0.28.1
interegular: 0.3.3
modelscope: 1.23.0
orjson: 3.10.15
packaging: 24.2
psutil: 7.0.0
pydantic: 2.10.6
multipart: 0.0.20
zmq: 26.2.1
uvicorn: 0.34.0
uvloop: 0.21.0
vllm: 0.7.2
openai: 1.63.2
tiktoken: 0.9.0
anthropic: 0.45.2
decord: 0.6.0
NVIDIA Topology:
      GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7  NIC0  NIC1  NIC2  NIC3  NIC4  NIC5  NIC6  NIC7  CPU Affinity    NUMA Affinity  GPU NUMA ID
GPU0  X     NV18  NV18  NV18  NV18  NV18  NV18  NV18  PIX   NODE  SYS   SYS   NODE  NODE  SYS   SYS   0-55,112-167    0              N/A
GPU1  NV18  X     NV18  NV18  NV18  NV18  NV18  NV18  NODE  NODE  SYS   SYS   NODE  NODE  SYS   SYS   0-55,112-167    0              N/A
GPU2  NV18  NV18  X     NV18  NV18  NV18  NV18  NV18  NODE  PIX   SYS   SYS   NODE  NODE  SYS   SYS   0-55,112-167    0              N/A
GPU3  NV18  NV18  NV18  X     NV18  NV18  NV18  NV18  NODE  NODE  SYS   SYS   NODE  NODE  SYS   SYS   0-55,112-167    0              N/A
GPU4  NV18  NV18  NV18  NV18  X     NV18  NV18  NV18  SYS   SYS   PIX   NODE  SYS   SYS   NODE  NODE  56-111,168-223  1              N/A
GPU5  NV18  NV18  NV18  NV18  NV18  X     NV18  NV18  SYS   SYS   NODE  NODE  SYS   SYS   NODE  NODE  56-111,168-223  1              N/A
GPU6  NV18  NV18  NV18  NV18  NV18  NV18  X     NV18  SYS   SYS   NODE  PIX   SYS   SYS   NODE  NODE  56-111,168-223  1              N/A
GPU7  NV18  NV18  NV18  NV18  NV18  NV18  NV18  X     SYS   SYS   NODE  NODE  SYS   SYS   NODE  NODE  56-111,168-223  1              N/A
NIC0  PIX   NODE  NODE  NODE  SYS   SYS   SYS   SYS   X     NODE  SYS   SYS   NODE  NODE  SYS   SYS
NIC1  NODE  NODE  PIX   NODE  SYS   SYS   SYS   SYS   NODE  X     SYS   SYS   NODE  NODE  SYS   SYS
NIC2  SYS   SYS   SYS   SYS   PIX   NODE  NODE  NODE  SYS   SYS   X     NODE  SYS   SYS   NODE  NODE
NIC3  SYS   SYS   SYS   SYS   NODE  NODE  PIX   NODE  SYS   SYS   NODE  X     SYS   SYS   NODE  NODE
NIC4  NODE  NODE  NODE  NODE  SYS   SYS   SYS   SYS   NODE  NODE  SYS   SYS   X     PIX   SYS   SYS
NIC5  NODE  NODE  NODE  NODE  SYS   SYS   SYS   SYS   NODE  NODE  SYS   SYS   PIX   X     SYS   SYS
NIC6  SYS   SYS   SYS   SYS   NODE  NODE  NODE  NODE  SYS   SYS   NODE  NODE  SYS   SYS   X     PIX
NIC7  SYS   SYS   SYS   SYS   NODE  NODE  NODE  NODE  SYS   SYS   NODE  NODE  SYS   SYS   PIX   X

Legend:

X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks

NIC Legend:

NIC0: ibp14s0
NIC1: ibp71s0
NIC2: ibp134s0
NIC3: ibp195s0
NIC4: rocep31s0f0
NIC5: rocep31s0f1
NIC6: rocep153s0f0
NIC7: rocep153s0f1

ulimit soft: 1048576

minleminzui self-assigned this Feb 25, 2025
