File: synthetic-multi-round-qa/long_input_short_output_run.sh
Summary
The warmup and benchmark phases use the same --init-user-id, causing user ID overlap. When vLLM's prefix caching is enabled, the first benchmark request gets a cache hit from warmup, showing ~60ms TTFT instead of the expected ~4-16 seconds for 21,000-token prompts.
Bug Location
# Lines 48 and 69 both use the same INIT_USER_ID
warmup() {
    python3 ... --init-user-id "$INIT_USER_ID" ...  # Line 48
}
run_benchmark() {
    warmup
    python3 ... --init-user-id "$INIT_USER_ID" ...  # Line 69 - same value
    INIT_USER_ID=$(( INIT_USER_ID + NUM_USERS_WARMUP ))  # Incremented after, not between
}
Why User 97?
With INIT_USER_ID=81:
- Warmup (10s, gap_between_users=0.5s): spawns users 83, 84, ... ~102
- Benchmark _ramp_up(): creates users 82-96 with virtual history
  - Users 83-96 are mid-conversation (no cache hit possible)
  - User 82 has largest offset → question_id=20=num_rounds → already "done" at t=0
- Benchmark replaces "done" user 82 with fresh spawn: user 97
User 97 exists in both warmup and benchmark with identical 21k-token prompts → vLLM prefix cache hit.
Evidence
Every benchmark run has exactly one request with an implausibly low TTFT of ~60ms for a ~21,000-token prompt:
| QPS | 1st Request TTFT | 2nd Request TTFT | Prompt Tokens |
|---|---|---|---|
| 0.25 | 0.061s | 4.84s | 21,045 |
| 0.5 | 0.058s | 11.24s | 21,045 |
| 1.0 | 0.060s | 14.60s | 21,045 |
| 2.0 | 0.058s | 16.12s | 21,046 |
60ms for 21,000 tokens implies a prefill rate of about 350,000 tokens/sec, which is only plausible if the prompt was served from the prefix cache.
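The arithmetic can be checked with a one-line helper. This is a minimal sketch: `prefill_rate` is a hypothetical name, not part of the benchmark repository, and the inputs are the values from the table above.

```shell
#!/usr/bin/env bash
# Implied prefill throughput = prompt tokens / TTFT in seconds.
# prefill_rate is a hypothetical helper, not part of the benchmark repo.
prefill_rate() {
    local tokens="$1" ttft="$2"
    awk -v t="$tokens" -v s="$ttft" 'BEGIN { printf "%.0f\n", t / s }'
}

prefill_rate 21000 0.060   # the suspicious first request
prefill_rate 21045 4.84    # the second request at QPS 0.25
```

The first call gives roughly 350,000 tokens/sec and the second roughly 4,300 tokens/sec, a gap of about 80x between two requests with the same prompt length.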
Reproduction
- Start vLLM:
  vllm serve --port 8000 --gpu-memory-utilization 0.80 --max-model-len 32000 Qwen/Qwen3-8B
- Run:
  ./long_input_short_output_run.sh Qwen/Qwen3-8B http://localhost:8000 ./test_output 0.25 0.5 0.75 1.0 1.25 1.5 1.75 2.0
- Check each test_output_output_*.csv: first request has ~60ms TTFT, second has ~4000ms+
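The last step can be automated with a small scan over the output files. This is a sketch, not part of the benchmark: `find_cache_hits` is a hypothetical helper, the column positions are taken from the CSV header shown in this issue, and the thresholds (0.5 s TTFT, 10,000 prompt tokens) are arbitrary assumptions.

```shell
#!/usr/bin/env bash
# Flag rows whose TTFT is implausibly low for a long prompt.
# Columns (from the CSV header in this issue):
#   1=prompt_tokens, 3=ttft, 5=user_id, 6=question_id
# find_cache_hits and its default thresholds are assumptions for illustration.
find_cache_hits() {
    local csv="$1" max_ttft="${2:-0.5}" min_tokens="${3:-10000}"
    awk -F, -v max="$max_ttft" -v min="$min_tokens" '
        NR > 1 && $1 + 0 >= min && $3 + 0 < max {
            printf "user=%s question=%s ttft=%.3fs\n", $5, $6, $3
        }' "$csv"
}

# Scan every benchmark output produced by the reproduction run.
for f in test_output_output_*.csv; do
    [ -e "$f" ] || continue   # the glob matched nothing
    echo "== $f"
    find_cache_hits "$f"
done
```

Any row printed here is a long prompt answered in well under a second, which is the signature of a prefix cache hit rather than a real prefill.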
Potential Fix
Add increment between warmup and benchmark:
run_benchmark() {
    warmup
    INIT_USER_ID=$(( INIT_USER_ID + NUM_USERS_WARMUP ))  # ADD: skip past warmup users
    python3 ... --init-user-id "$INIT_USER_ID" ...
    sleep 10
    INIT_USER_ID=$(( INIT_USER_ID + NUM_USERS_WARMUP ))  # KEEP: skip past benchmark users
}
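One way to verify the fix is to confirm that no user ID appears in both the warmup CSV and the benchmark CSV. The sketch below assumes user_id is column 5, as in the CSV header shown in this issue; `overlapping_users` is a hypothetical helper and the file paths are whatever the run produced.

```shell
#!/usr/bin/env bash
# Print each user_id (column 5) that occurs in both CSV files.
# overlapping_users is a hypothetical helper, not part of the benchmark repo.
# Usage: overlapping_users <warmup_csv> <benchmark_csv>
overlapping_users() {
    local warmup_csv="$1" bench_csv="$2"
    awk -F, '
        FNR == 1  { next }                      # skip the header of each file
        NR == FNR { seen[$5] = 1; next }        # first file: remember warmup IDs
        ($5 in seen) && !printed[$5]++ { print $5 }   # second file: report once
    ' "$warmup_csv" "$bench_csv"
}
```

An empty result would indicate that the warmup and benchmark ID ranges no longer overlap. A shared ID alone does not prove a cache hit, so this check complements the TTFT values rather than replacing them.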
warmup.csv (User 97)
prompt_tokens,generation_tokens,ttft,generation_time,user_id,question_id,launch_time,finish_time
21045.0,100.0,66.03855228424072,26.03457522392273,97,1,1768228281.4217596,1768228373.4948874
output.csv (User 97)
prompt_tokens,generation_tokens,ttft,generation_time,user_id,question_id,launch_time,finish_time
21045.0,100.0,0.06429886817932129,29.62334132194519,97,1,1768228377.583655,1768228407.2712955
Full Warmup
prompt_tokens,generation_tokens,ttft,generation_time,user_id,question_id,launch_time,finish_time
21045.0,100.0,4.0325767993927,30.620036840438843,83,1,1768228274.410371,1768228309.062985
21045.0,100.0,7.745214462280273,31.387547969818115,84,1,1768228274.9116695,1768228314.0444322
21045.0,100.0,11.579217433929443,32.04589629173279,85,1,1768228275.4124584,1768228319.0375724
21045.0,100.0,15.546088457107544,32.542940616607666,86,1,1768228275.9133482,1768228324.0023775
21045.0,100.0,19.24747633934021,32.79804229736328,87,1,1768228276.4143176,1768228328.4598362
21045.0,100.0,23.450685024261475,33.05583620071411,88,1,1768228276.9149776,1768228333.4214993
21045.0,100.0,27.83380889892578,33.13601541519165,89,1,1768228277.415833,1768228338.3856573
21045.0,100.0,33.816718101501465,33.097196102142334,90,1,1768228277.9167144,1768228344.8306289
21045.0,100.0,38.28431248664856,33.11025857925415,91,1,1768228278.4174414,1768228349.812013
21045.0,100.0,42.73971652984619,32.667768478393555,92,1,1768228278.9180489,1768228354.325534
21045.0,100.0,46.781073331832886,32.58684802055359,93,1,1768228279.418835,1768228358.7867563
21045.0,100.0,51.220503091812134,32.61380910873413,94,1,1768228279.9195848,1768228363.7538974
21045.0,100.0,55.66105651855469,32.631749391555786,95,1,1768228280.4203286,1768228368.7131348
21045.0,100.0,60.10296177864075,30.73545265197754,96,1,1768228280.9210896,1768228371.7595043
21045.0,100.0,66.03855228424072,26.03457522392273,97,1,1768228281.4217596,1768228373.4948874
21045.0,100.0,70.10655903816223,21.79459857940674,98,1,1768228281.9224482,1768228373.8236058
21045.0,100.0,74.58828473091125,17.163147926330566,99,1,1768228282.4232018,1768228374.1746347
21046.0,100.0,79.03585720062256,12.499950408935547,100,1,1768228282.923967,1768228374.4597747
21046.0,100.0,83.47929239273071,7.807913780212402,101,1,1768228283.424782,1768228374.7119882
21046.0,100.0,87.65989756584167,3.3384878635406494,102,1,1768228283.9255624,1768228374.9239478
Full Benchmark
prompt_tokens,generation_tokens,ttft,generation_time,user_id,question_id,launch_time,finish_time
21046.0,100.0,15.451107740402222,32.98561429977417,83,1,1768228380.7874646,1768228429.224187
21173.0,100.0,59.79264044761658,33.4884295463562,83,2,1768228429.2511828,1768228522.532253
21046.0,100.0,32.88628697395325,33.44575047492981,84,1,1768228383.9910362,1768228450.3230739
21173.0,100.0,63.26081323623657,33.09952783584595,84,2,1768228450.3874269,1768228546.747768
21046.0,100.0,49.24628400802612,32.93089699745178,85,1,1768228387.1946235,1768228469.3718047
21173.0,100.0,75.51643824577332,8.048285722732544,85,2,1768228469.423379,1768228552.9881034
21046.0,100.0,4.300148248672485,31.531845569610596,86,1,1768228378.3848696,1768228414.216864
21172.0,100.0,53.35515594482422,33.377912521362305,86,2,1768228414.2351642,1768228500.9682329
21046.0,100.0,19.02207589149475,33.119444847106934,87,1,1768228381.5882692,1768228433.72979
21172.0,100.0,60.35233736038208,33.05792427062988,87,2,1768228433.7558215,1768228527.1660833
21046.0,100.0,37.06291079521179,33.533162355422974,88,1,1768228384.7920196,1768228455.388093
21173.0,100.0,69.83038926124573,26.496456146240234,88,2,1768228455.3932168,1768228551.7200623
21046.0,100.0,53.433995723724365,32.9296555519104,89,1,1768228387.9958158,1768228474.359467
21172.0,100.0,75.38190579414368,3.390859842300415,89,2,1768228474.4290762,1768228553.201842
21046.0,100.0,7.909069776535034,32.09600353240967,90,1,1768228379.1858044,1768228419.190878
21173.0,100.0,53.3221390247345,33.405515909194946,90,2,1768228419.2401805,1768228505.9678357
21045.0,100.0,23.115331172943115,33.24005389213562,91,1,1768228382.3890383,1768228438.7444239
21171.0,100.0,64.90626335144043,33.02594494819641,91,2,1768228438.772832,1768228536.7050407
21045.0,100.0,41.262274980545044,33.08293962478638,92,1,1768228385.592925,1768228459.93814
21170.0,100.0,70.32939887046814,21.833589553833008,92,2,1768228459.998516,1768228552.1615045
21045.0,100.0,57.610445976257324,32.932292222976685,93,1,1768228388.7965636,1768228479.3393018
21045.0,100.0,11.610830545425415,32.60366344451904,94,1,1768228379.9866097,1768228424.201104
21171.0,100.0,58.282360553741455,33.435781955718994,94,2,1768228424.245589,1768228515.9637315
21045.0,100.0,27.26021122932434,33.28636121749878,95,1,1768228383.1898859,1768228443.7364585
21171.0,100.0,64.88479328155518,33.054969787597656,95,2,1768228443.7790673,1768228541.7188306
21045.0,100.0,45.05794095993042,32.94720673561096,96,1,1768228386.3938456,1768228464.3989935
21170.0,100.0,70.52046370506287,17.526437044143677,96,2,1768228464.4031951,1768228552.4500961
21045.0,100.0,0.06429886817932129,29.62334132194519,97,1,1768228377.583655,1768228407.2712955
21170.0,100.0,50.73230814933777,32.89437675476074,97,2,1768228407.317312,1768228490.9439971
21045.0,100.0,60.204896211624146,33.36027669906616,98,1,1768228392.8007195,1768228486.3658926
21045.0,100.0,54.6002357006073,33.348283529281616,99,1,1768228408.0283864,1768228495.9769058
21046.0,100.0,54.27728247642517,33.447853088378906,100,1,1768228423.2444117,1768228510.9695475
21046.0,100.0,60.22674560546875,32.98647689819336,101,1,1768228438.4724448,1768228531.6856675
21046.0,100.0,64.96357321739197,31.338298559188843,102,1,1768228453.6917243,1768228549.9935963
21046.0,100.0,71.00475406646729,12.809065818786621,103,1,1768228468.922831,1768228552.7366512