
Warmup and Benchmark User ID Overlap Causes Prefix Cache Hits #35

@ajcasagrande

File: synthetic-multi-round-qa/long_input_short_output_run.sh

Summary

The warmup and benchmark phases are both launched with the same --init-user-id, so their user IDs overlap. When vLLM's prefix caching is enabled, the first benchmark request hits the cache populated during warmup and reports ~60ms TTFT instead of the ~4-16 seconds expected for a 21,000-token prompt.

Bug Location

# Lines 48 and 69 both use the same INIT_USER_ID
warmup() {
    python3 ... --init-user-id "$INIT_USER_ID" ...  # Line 48
}

run_benchmark() {
    warmup
    python3 ... --init-user-id "$INIT_USER_ID" ...  # Line 69 - same value
    INIT_USER_ID=$(( INIT_USER_ID + NUM_USERS_WARMUP ))  # Incremented after, not between
}

Why User 97?

With INIT_USER_ID=81:

  • Warmup (10s, gap_between_users=0.5s): spawns users 83, 84, ... ~102
  • Benchmark _ramp_up(): creates users 82-96 with virtual history
    • Users 83-96 are mid-conversation (no cache hit possible)
    • User 82 has largest offset → question_id=20=num_rounds → already "done" at t=0
  • Benchmark replaces "done" user 82 with fresh spawn: user 97

User 97 exists in both warmup and benchmark with identical 21k-token prompts → vLLM prefix cache hit.
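
The ID arithmetic above can be replayed in a few lines. This is only a sketch of the reasoning, not the benchmark's code: the ranges are copied from the bullets, and it assumes a fresh spawn takes the next sequential ID after the ramp-up range.

```python
# Ranges taken from the description above (INIT_USER_ID=81).
warmup_users = set(range(83, 103))   # warmup spawned users 83..102
ramp_up_users = set(range(82, 97))   # _ramp_up() created users 82..96

# Assumption: a fresh spawn gets the next ID after the ramp-up range.
first_fresh_user = max(ramp_up_users) + 1

print(first_fresh_user)                  # 97
print(first_fresh_user in warmup_users)  # True: same ID, same prompt as warmup
```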

Evidence

Every benchmark run has exactly one request with an implausibly low ~60ms TTFT for 21,000 tokens:

| QPS | 1st Request TTFT | 2nd Request TTFT | Prompt Tokens |
|------|--------|--------|--------|
| 0.25 | 0.061s | 4.84s | 21,045 |
| 0.5 | 0.058s | 11.24s | 21,045 |
| 1.0 | 0.060s | 14.60s | 21,045 |
| 2.0 | 0.058s | 16.12s | 21,046 |

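60ms for 21,000 tokens implies 21,000 / 0.060s = 350,000 tokens/sec of prefill. By comparison, the 4.84s second request at QPS 0.25 works out to roughly 4,300 tokens/sec, so the first request was almost certainly served from cache rather than computed.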
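
The same arithmetic for every row of the table, as a rough sketch. It treats TTFT as pure prefill time, which ignores queueing, so the figures for the second request understate the real prefill speed. The helper name `implied_prefill_rate` is made up for illustration.

```python
# (qps, first_ttft_s, second_ttft_s, prompt_tokens), copied from the table above
rows = [
    (0.25, 0.061, 4.84, 21045),
    (0.5, 0.058, 11.24, 21045),
    (1.0, 0.060, 14.60, 21045),
    (2.0, 0.058, 16.12, 21046),
]


def implied_prefill_rate(prompt_tokens, ttft_s):
    """Tokens per second if the whole TTFT were spent on prefill."""
    return prompt_tokens / ttft_s


for qps, first, second, tokens in rows:
    fast = implied_prefill_rate(tokens, first)
    slow = implied_prefill_rate(tokens, second)
    print(f"QPS {qps}: first {fast:,.0f} tok/s, second {slow:,.0f} tok/s")
```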

Reproduction

  1. Start vLLM:
    vllm serve --port 8000 --gpu-memory-utilization 0.80 --max-model-len 32000 Qwen/Qwen3-8B
  2. Run:
    ./long_input_short_output_run.sh Qwen/Qwen3-8B http://localhost:8000 ./test_output 0.25 0.5 0.75 1.0 1.25 1.5 1.75 2.0
  3. Check each test_output_output_*.csv: first request has ~60ms TTFT, second has ~4000ms+
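
Step 3 can be automated with a small scan over the CSV contents. A minimal sketch, assuming the column layout shown in the CSVs in this issue; the function name and both thresholds (20,000 tokens, 0.5s) are arbitrary illustrative choices, and the two sample rows are copied from the benchmark output.

```python
import csv
import io


def find_suspicious_rows(csv_text, min_prompt_tokens=20000, max_ttft_s=0.5):
    """Return (user_id, question_id, ttft) for long prompts with a very low TTFT."""
    hits = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        ttft = float(row["ttft"])
        if float(row["prompt_tokens"]) >= min_prompt_tokens and ttft <= max_ttft_s:
            hits.append((int(row["user_id"]), int(row["question_id"]), ttft))
    return hits


sample = """prompt_tokens,generation_tokens,ttft,generation_time,user_id,question_id,launch_time,finish_time
21046.0,100.0,4.300148248672485,31.531845569610596,86,1,1768228378.3848696,1768228414.216864
21045.0,100.0,0.06429886817932129,29.62334132194519,97,1,1768228377.583655,1768228407.2712955
"""

print(find_suspicious_rows(sample))  # only user 97's first question is flagged
```

To scan real output, read each CSV file's contents into a string and pass it to the function.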

Potential Fix

Add an increment between warmup and benchmark:

run_benchmark() {
    warmup
    INIT_USER_ID=$(( INIT_USER_ID + NUM_USERS_WARMUP ))  # ADD: skip past warmup users
    python3 ... --init-user-id "$INIT_USER_ID" ...
    sleep 10
    INIT_USER_ID=$(( INIT_USER_ID + NUM_USERS_WARMUP ))  # KEEP: skip past benchmark users
}
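
The effect of the extra increment can be illustrated with a simplified model. This is a sketch under loud assumptions, not the benchmark's actual ID allocation: it pretends each phase hands out consecutive IDs starting at its --init-user-id, and NUM_USERS_WARMUP = 20 is a hypothetical value (the real one is set in the script).

```python
def phase_ids(init_user_id, num_users):
    """IDs used by one phase, assuming consecutive allocation from init_user_id."""
    return set(range(init_user_id, init_user_id + num_users))


INIT_USER_ID = 81       # value from the example in this issue
NUM_USERS_WARMUP = 20   # hypothetical, for illustration only

warmup = phase_ids(INIT_USER_ID, NUM_USERS_WARMUP)

# Current behavior: benchmark starts from the same ID as warmup.
buggy_benchmark = phase_ids(INIT_USER_ID, NUM_USERS_WARMUP)

# Proposed fix: benchmark starts past the warmup range.
fixed_benchmark = phase_ids(INIT_USER_ID + NUM_USERS_WARMUP, NUM_USERS_WARMUP)

print(len(warmup & buggy_benchmark))       # 20 shared IDs
print(warmup.isdisjoint(fixed_benchmark))  # True
```

Under this model the fix only holds if warmup never hands out more than NUM_USERS_WARMUP IDs, which is worth verifying against the real script.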

warmup.csv (User 97)

prompt_tokens,generation_tokens,ttft,generation_time,user_id,question_id,launch_time,finish_time
21045.0,100.0,66.03855228424072,26.03457522392273,97,1,1768228281.4217596,1768228373.4948874

output.csv (User 97)

prompt_tokens,generation_tokens,ttft,generation_time,user_id,question_id,launch_time,finish_time
21045.0,100.0,0.06429886817932129,29.62334132194519,97,1,1768228377.583655,1768228407.2712955
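
The two rows for user 97 can be compared directly. A small sketch using only the values above; the variable and helper names are illustrative. It shows that the benchmark request had the same prompt length and was launched about 4 seconds after the warmup request finished.

```python
import csv
import io

HEADER = ("prompt_tokens,generation_tokens,ttft,generation_time,"
          "user_id,question_id,launch_time,finish_time")
WARMUP_ROW = ("21045.0,100.0,66.03855228424072,26.03457522392273,"
              "97,1,1768228281.4217596,1768228373.4948874")
BENCH_ROW = ("21045.0,100.0,0.06429886817932129,29.62334132194519,"
             "97,1,1768228377.583655,1768228407.2712955")


def parse(row):
    """Parse one CSV data row into a dict keyed by the header columns."""
    return next(csv.DictReader(io.StringIO(HEADER + "\n" + row)))


warmup, bench = parse(WARMUP_ROW), parse(BENCH_ROW)

same_prompt_length = warmup["prompt_tokens"] == bench["prompt_tokens"]
gap_s = float(bench["launch_time"]) - float(warmup["finish_time"])
ttft_ratio = float(warmup["ttft"]) / float(bench["ttft"])

print(f"same prompt length: {same_prompt_length}")
print(f"benchmark launched {gap_s:.2f}s after warmup finished")
print(f"warmup TTFT is {ttft_ratio:,.0f}x the benchmark TTFT")
```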

Full Warmup

prompt_tokens,generation_tokens,ttft,generation_time,user_id,question_id,launch_time,finish_time
21045.0,100.0,4.0325767993927,30.620036840438843,83,1,1768228274.410371,1768228309.062985
21045.0,100.0,7.745214462280273,31.387547969818115,84,1,1768228274.9116695,1768228314.0444322
21045.0,100.0,11.579217433929443,32.04589629173279,85,1,1768228275.4124584,1768228319.0375724
21045.0,100.0,15.546088457107544,32.542940616607666,86,1,1768228275.9133482,1768228324.0023775
21045.0,100.0,19.24747633934021,32.79804229736328,87,1,1768228276.4143176,1768228328.4598362
21045.0,100.0,23.450685024261475,33.05583620071411,88,1,1768228276.9149776,1768228333.4214993
21045.0,100.0,27.83380889892578,33.13601541519165,89,1,1768228277.415833,1768228338.3856573
21045.0,100.0,33.816718101501465,33.097196102142334,90,1,1768228277.9167144,1768228344.8306289
21045.0,100.0,38.28431248664856,33.11025857925415,91,1,1768228278.4174414,1768228349.812013
21045.0,100.0,42.73971652984619,32.667768478393555,92,1,1768228278.9180489,1768228354.325534
21045.0,100.0,46.781073331832886,32.58684802055359,93,1,1768228279.418835,1768228358.7867563
21045.0,100.0,51.220503091812134,32.61380910873413,94,1,1768228279.9195848,1768228363.7538974
21045.0,100.0,55.66105651855469,32.631749391555786,95,1,1768228280.4203286,1768228368.7131348
21045.0,100.0,60.10296177864075,30.73545265197754,96,1,1768228280.9210896,1768228371.7595043
21045.0,100.0,66.03855228424072,26.03457522392273,97,1,1768228281.4217596,1768228373.4948874
21045.0,100.0,70.10655903816223,21.79459857940674,98,1,1768228281.9224482,1768228373.8236058
21045.0,100.0,74.58828473091125,17.163147926330566,99,1,1768228282.4232018,1768228374.1746347
21046.0,100.0,79.03585720062256,12.499950408935547,100,1,1768228282.923967,1768228374.4597747
21046.0,100.0,83.47929239273071,7.807913780212402,101,1,1768228283.424782,1768228374.7119882
21046.0,100.0,87.65989756584167,3.3384878635406494,102,1,1768228283.9255624,1768228374.9239478

Full Benchmark

prompt_tokens,generation_tokens,ttft,generation_time,user_id,question_id,launch_time,finish_time
21046.0,100.0,15.451107740402222,32.98561429977417,83,1,1768228380.7874646,1768228429.224187
21173.0,100.0,59.79264044761658,33.4884295463562,83,2,1768228429.2511828,1768228522.532253
21046.0,100.0,32.88628697395325,33.44575047492981,84,1,1768228383.9910362,1768228450.3230739
21173.0,100.0,63.26081323623657,33.09952783584595,84,2,1768228450.3874269,1768228546.747768
21046.0,100.0,49.24628400802612,32.93089699745178,85,1,1768228387.1946235,1768228469.3718047
21173.0,100.0,75.51643824577332,8.048285722732544,85,2,1768228469.423379,1768228552.9881034
21046.0,100.0,4.300148248672485,31.531845569610596,86,1,1768228378.3848696,1768228414.216864
21172.0,100.0,53.35515594482422,33.377912521362305,86,2,1768228414.2351642,1768228500.9682329
21046.0,100.0,19.02207589149475,33.119444847106934,87,1,1768228381.5882692,1768228433.72979
21172.0,100.0,60.35233736038208,33.05792427062988,87,2,1768228433.7558215,1768228527.1660833
21046.0,100.0,37.06291079521179,33.533162355422974,88,1,1768228384.7920196,1768228455.388093
21173.0,100.0,69.83038926124573,26.496456146240234,88,2,1768228455.3932168,1768228551.7200623
21046.0,100.0,53.433995723724365,32.9296555519104,89,1,1768228387.9958158,1768228474.359467
21172.0,100.0,75.38190579414368,3.390859842300415,89,2,1768228474.4290762,1768228553.201842
21046.0,100.0,7.909069776535034,32.09600353240967,90,1,1768228379.1858044,1768228419.190878
21173.0,100.0,53.3221390247345,33.405515909194946,90,2,1768228419.2401805,1768228505.9678357
21045.0,100.0,23.115331172943115,33.24005389213562,91,1,1768228382.3890383,1768228438.7444239
21171.0,100.0,64.90626335144043,33.02594494819641,91,2,1768228438.772832,1768228536.7050407
21045.0,100.0,41.262274980545044,33.08293962478638,92,1,1768228385.592925,1768228459.93814
21170.0,100.0,70.32939887046814,21.833589553833008,92,2,1768228459.998516,1768228552.1615045
21045.0,100.0,57.610445976257324,32.932292222976685,93,1,1768228388.7965636,1768228479.3393018
21045.0,100.0,11.610830545425415,32.60366344451904,94,1,1768228379.9866097,1768228424.201104
21171.0,100.0,58.282360553741455,33.435781955718994,94,2,1768228424.245589,1768228515.9637315
21045.0,100.0,27.26021122932434,33.28636121749878,95,1,1768228383.1898859,1768228443.7364585
21171.0,100.0,64.88479328155518,33.054969787597656,95,2,1768228443.7790673,1768228541.7188306
21045.0,100.0,45.05794095993042,32.94720673561096,96,1,1768228386.3938456,1768228464.3989935
21170.0,100.0,70.52046370506287,17.526437044143677,96,2,1768228464.4031951,1768228552.4500961
21045.0,100.0,0.06429886817932129,29.62334132194519,97,1,1768228377.583655,1768228407.2712955
21170.0,100.0,50.73230814933777,32.89437675476074,97,2,1768228407.317312,1768228490.9439971
21045.0,100.0,60.204896211624146,33.36027669906616,98,1,1768228392.8007195,1768228486.3658926
21045.0,100.0,54.6002357006073,33.348283529281616,99,1,1768228408.0283864,1768228495.9769058
21046.0,100.0,54.27728247642517,33.447853088378906,100,1,1768228423.2444117,1768228510.9695475
21046.0,100.0,60.22674560546875,32.98647689819336,101,1,1768228438.4724448,1768228531.6856675
21046.0,100.0,64.96357321739197,31.338298559188843,102,1,1768228453.6917243,1768228549.9935963
21046.0,100.0,71.00475406646729,12.809065818786621,103,1,1768228468.922831,1768228552.7366512
