add deepgemm and sglang fp8 block-wise gemm benchmark #3893

Open: wants to merge 2 commits into base: main

BBuf (Collaborator) commented Feb 26, 2025

This script benchmarks all the FP8 W8A8 block-wise matrix multiplications (GEMMs) involved in DeepSeek V3/R1 for a given tensor parallelism (TP) size, comparing DeepGEMM against SGLang.
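
The shapes in the tables follow a fixed pattern: five (N, K) weight shapes are benchmarked unchanged, five have N divided by the TP size, and three have K divided by the TP size, and each shape is swept over nine values of m. The sketch below reproduces that list. The grouping is inferred from the tp-size=1 and tp-size=8 results rather than taken from the PR's script, and the names `weight_shapes` and `benchmark_configs` are hypothetical.

```python
# Hypothetical reconstruction of the (m, n, k) problems in the result tables.
# The grouping below is inferred from the published results, not from the PR.

# (N, K) shapes that are not split across TP ranks.
REPLICATED = [(576, 7168), (24576, 7168), (32768, 512), (7168, 16384), (7168, 18432)]
# (N, K) shapes whose output dimension N is divided by the TP size.
N_SHARDED = [(36864, 7168), (24576, 7168), (32768, 512), (24576, 1536), (4096, 7168)]
# (N, K) shapes whose reduction dimension K is divided by the TP size.
K_SHARDED = [(7168, 18432), (7168, 16384), (7168, 2048)]

# Token counts (m) swept for every weight shape.
M_VALUES = [8, 16, 32, 64, 128, 256, 1024, 2048, 4096]


def weight_shapes(tp_size: int) -> list[tuple[int, int]]:
    """Return the (N, K) weight shapes for one TP rank, in table order."""
    shapes = list(REPLICATED)
    shapes += [(n // tp_size, k) for n, k in N_SHARDED]
    shapes += [(n, k // tp_size) for n, k in K_SHARDED]
    return shapes


def benchmark_configs(tp_size: int) -> list[tuple[int, int, int]]:
    """Return every (m, n, k) problem in the same order as the result rows."""
    return [(m, n, k) for n, k in weight_shapes(tp_size) for m in M_VALUES]
```

With 13 shapes and 9 values of m, each table has 117 rows (indices 0 to 116); for example, `benchmark_configs(8)[45]` is the first row of the 4608 × 7168 block.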

Results on H100:

Run the correctness test:

[image: correctness test output]
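
For context on what "block-wise" means here: in the DeepSeek V3 scheme, activations (M × K) carry one scale per token per 128 channels, and weights (N × K) carry one scale per 128 × 128 tile. One common way to check such a kernel is to compare it against a dequantize-then-multiply reference. The pure-Python sketch below is an illustrative assumption, not the PR's test: it scales each tile into the FP8 E4M3 range (±448) but does not emulate FP8 rounding, and `quantize_blockwise` and `blockwise_gemm_reference` are hypothetical names.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value of the float8 E4M3 format


def quantize_blockwise(x, block_rows, block_cols):
    """Scale each (block_rows x block_cols) tile of x into [-448, 448].

    Returns the scaled values and one scale per tile. FP8 rounding is NOT
    emulated: the scaled values remain Python floats.
    """
    rows, cols = len(x), len(x[0])
    n_br, n_bc = math.ceil(rows / block_rows), math.ceil(cols / block_cols)
    q = [[0.0] * cols for _ in range(rows)]
    scales = [[1.0] * n_bc for _ in range(n_br)]
    for bi in range(n_br):
        for bj in range(n_bc):
            r = range(bi * block_rows, min((bi + 1) * block_rows, rows))
            c = range(bj * block_cols, min((bj + 1) * block_cols, cols))
            amax = max(abs(x[i][j]) for i in r for j in c)
            s = amax / FP8_E4M3_MAX if amax > 0 else 1.0  # avoid dividing by 0
            scales[bi][bj] = s
            for i in r:
                for j in c:
                    q[i][j] = x[i][j] / s
    return q, scales


def blockwise_gemm_reference(a_q, a_s, b_q, b_s, block=128):
    """Reference C = dequant(A) @ dequant(B)^T.

    a_q: M x K activations, a_s: M x ceil(K/block) per-token group scales.
    b_q: N x K weights,     b_s: ceil(N/block) x ceil(K/block) tile scales.
    """
    m_dim, k_dim, n_dim = len(a_q), len(a_q[0]), len(b_q)
    c = [[0.0] * n_dim for _ in range(m_dim)]
    for i in range(m_dim):
        for j in range(n_dim):
            acc = 0.0
            for kk in range(k_dim):
                g = kk // block  # K-group shared by both scale tensors
                acc += (a_q[i][kk] * a_s[i][g]) * (b_q[j][kk] * b_s[j // block][g])
            c[i][j] = acc
    return c
```

For example, `quantize_blockwise(a, 1, 128)` yields activation scales of shape M × ⌈K/128⌉, and `quantize_blockwise(b, 128, 128)` yields weight scales of shape ⌈N/128⌉ × ⌈K/128⌉; since rounding is skipped, the reference reproduces the unquantized product up to floating-point error.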

Run the benchmark:

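The tables below report the latency of each GEMM for both implementations (lower is better). The unit is given as milliseconds (ms), but the magnitudes suggest microseconds (µs): read as milliseconds, a value of 4589.7 for m = 4096, n = 24576, k = 7168 would correspond to only about 0.3 TFLOPS on an H100.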
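
To compare rows with different shapes, a latency can be converted into achieved throughput using the standard count of 2·m·n·k floating-point operations for an (m × k) by (k × n) GEMM. The helper below is a hypothetical sketch; its default treats the reported numbers as microseconds, which is an assumption (pass `seconds_per_unit=1e-3` to treat them as milliseconds instead).

```python
def achieved_tflops(m: int, n: int, k: int, latency: float,
                    seconds_per_unit: float = 1e-6) -> float:
    """Return TFLOPS for one GEMM, counting 2*m*n*k FLOPs.

    seconds_per_unit=1e-6 assumes `latency` is in microseconds.
    """
    flops = 2 * m * n * k
    return flops / (latency * seconds_per_unit) / 1e12


# DeepGEMM, tp-size=1, m=4096, n=24576, k=7168 (value copied from the table).
print(f"{achieved_tflops(4096, 24576, 7168, 4589.727879):.1f} TFLOPS")
```

Under the microsecond assumption this row works out to roughly 314 TFLOPS.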

  • tp-size=1
fp8-gemm-performance-comparison-tp1:
          m        n        k  tp_size     DeepGEMM        SGLang
0       8.0    576.0   7168.0      1.0   456.111997    350.879997
1      16.0    576.0   7168.0      1.0   451.023996    349.312007
2      32.0    576.0   7168.0      1.0   450.783998    349.151999
3      64.0    576.0   7168.0      1.0   446.496010    353.760004
4     128.0    576.0   7168.0      1.0   451.552004    353.615999
5     256.0    576.0   7168.0      1.0   450.223982    351.583987
6    1024.0    576.0   7168.0      1.0   447.039992    356.864005
7    2048.0    576.0   7168.0      1.0   452.672005    435.647994
8    4096.0    576.0   7168.0      1.0   634.368002    682.816029
9       8.0  24576.0   7168.0      1.0  3139.024019   3153.520107
10     16.0  24576.0   7168.0      1.0  3143.039942   3147.232056
11     32.0  24576.0   7168.0      1.0  3142.112017   3148.000002
12     64.0  24576.0   7168.0      1.0  3145.215988   3151.103973
13    128.0  24576.0   7168.0      1.0  3153.679848   3223.135948
14    256.0  24576.0   7168.0      1.0  3181.456089   3339.231968
15   1024.0  24576.0   7168.0      1.0  3454.911947   4134.687901
16   2048.0  24576.0   7168.0      1.0  3842.592001   5206.831932
17   4096.0  24576.0   7168.0      1.0  4589.727879   7277.344227
18      8.0  32768.0    512.0      1.0   447.039992    390.368015
19     16.0  32768.0    512.0      1.0   443.040013    388.767987
20     32.0  32768.0    512.0      1.0   444.415987    388.000011
21     64.0  32768.0    512.0      1.0   443.648010    388.655990
22    128.0  32768.0    512.0      1.0   447.167993    397.152007
23    256.0  32768.0    512.0      1.0   455.071986    417.376012
24   1024.0  32768.0    512.0      1.0   453.871995    512.639999
25   2048.0  32768.0    512.0      1.0   516.304016    637.920022
26   4096.0  32768.0    512.0      1.0   634.624004    886.879981
27      8.0   7168.0  16384.0      1.0  2133.183956   2159.199953
28     16.0   7168.0  16384.0      1.0  2144.559860   2157.871962
29     32.0   7168.0  16384.0      1.0  2146.496058   2170.192003
30     64.0   7168.0  16384.0      1.0  2149.104118   2174.416065
31    128.0   7168.0  16384.0      1.0  2162.640095   2220.927954
32    256.0   7168.0  16384.0      1.0  2190.464020   2315.135956
33   1024.0   7168.0  16384.0      1.0  2539.055824   3038.239956
34   2048.0   7168.0  16384.0      1.0  2909.087896   3834.944010
35   4096.0   7168.0  16384.0      1.0  3749.839783   5541.088104
36      8.0   7168.0  18432.0      1.0  2406.208038   2424.223900
37     16.0   7168.0  18432.0      1.0  2404.464006   2426.784039
38     32.0   7168.0  18432.0      1.0  2401.391983   2429.663897
39     64.0   7168.0  18432.0      1.0  2407.696009   2441.087961
40    128.0   7168.0  18432.0      1.0  2420.608044   2488.271952
41    256.0   7168.0  18432.0      1.0  2460.671902   2600.352049
42   1024.0   7168.0  18432.0      1.0  2849.679947   3413.248062
43   2048.0   7168.0  18432.0      1.0  3260.799885   4283.184052
44   4096.0   7168.0  18432.0      1.0  4173.664093   6296.703815
45      8.0  36864.0   7168.0      1.0  4667.871952   4707.104206
46     16.0  36864.0   7168.0      1.0  4668.704033   4694.895744
47     32.0  36864.0   7168.0      1.0  4669.023991   4697.263718
48     64.0  36864.0   7168.0      1.0  4672.255993   4699.903965
49    128.0  36864.0   7168.0      1.0  4682.335854   4792.384148
50    256.0  36864.0   7168.0      1.0  4725.423813   5004.576206
51   1024.0  36864.0   7168.0      1.0  5087.584019   6096.303940
52   2048.0  36864.0   7168.0      1.0  5626.912117   7612.015724
53   4096.0  36864.0   7168.0      1.0  6670.896053  10620.832443
54      8.0  24576.0   7168.0      1.0  3098.880053   3154.239893
55     16.0  24576.0   7168.0      1.0  3142.528057   3147.423983
56     32.0  24576.0   7168.0      1.0  3142.751932   3146.944046
57     64.0  24576.0   7168.0      1.0  3145.600080   3148.927927
58    128.0  24576.0   7168.0      1.0  3154.799938   3222.768068
59    256.0  24576.0   7168.0      1.0  3182.496071   3337.311983
60   1024.0  24576.0   7168.0      1.0  3458.608150   4132.639885
61   2048.0  24576.0   7168.0      1.0  3851.488113   5256.272316
62   4096.0  24576.0   7168.0      1.0  4620.384216   7347.136021
63      8.0  32768.0    512.0      1.0   460.608006    391.167998
64     16.0  32768.0    512.0      1.0   458.352000    389.376014
65     32.0  32768.0    512.0      1.0   460.480005    389.472008
66     64.0  32768.0    512.0      1.0   459.807992    387.935996
67    128.0  32768.0    512.0      1.0   467.199981    395.711988
68    256.0  32768.0    512.0      1.0   468.991995    415.071994
69   1024.0  32768.0    512.0      1.0   462.015986    507.327974
70   2048.0  32768.0    512.0      1.0   514.144003    632.655978
71   4096.0  32768.0    512.0      1.0   638.895988    886.336029
72      8.0  24576.0   1536.0      1.0   754.688025    754.944026
73     16.0  24576.0   1536.0      1.0   758.463979    757.152021
74     32.0  24576.0   1536.0      1.0   758.239985    756.447971
75     64.0  24576.0   1536.0      1.0   760.447979    759.984016
76    128.0  24576.0   1536.0      1.0   763.231993    777.552009
77    256.0  24576.0   1536.0      1.0   774.016023    803.135991
78   1024.0  24576.0   1536.0      1.0   844.864011    981.664002
79   2048.0  24576.0   1536.0      1.0   942.496002   1222.032070
80   4096.0  24576.0   1536.0      1.0  1143.007994   1716.384053
81      8.0   4096.0   7168.0      1.0   618.143976    648.959994
82     16.0   4096.0   7168.0      1.0   620.127976    652.639985
83     32.0   4096.0   7168.0      1.0   619.887948    654.688001
84     64.0   4096.0   7168.0      1.0   617.824018    654.240012
85    128.0   4096.0   7168.0      1.0   624.127984    659.936011
86    256.0   4096.0   7168.0      1.0   640.703976    672.415972
87   1024.0   4096.0   7168.0      1.0   744.512022    854.816020
88   2048.0   4096.0   7168.0      1.0   904.767990   1135.888100
89   4096.0   4096.0   7168.0      1.0  1200.255990   1649.760008
90      8.0   7168.0  18432.0      1.0  2388.720036   2419.951916
91     16.0   7168.0  18432.0      1.0  2399.856091   2417.168140
92     32.0   7168.0  18432.0      1.0  2395.632029   2424.288034
93     64.0   7168.0  18432.0      1.0  2401.440144   2435.199976
94    128.0   7168.0  18432.0      1.0  2414.911985   2480.704069
95    256.0   7168.0  18432.0      1.0  2454.848051   2589.695930
96   1024.0   7168.0  18432.0      1.0  2842.544079   3396.255970
97   2048.0   7168.0  18432.0      1.0  3269.759893   4313.152313
98   4096.0   7168.0  18432.0      1.0  4179.808140   6296.480179
99      8.0   7168.0  16384.0      1.0  2135.071993   2157.567978
100    16.0   7168.0  16384.0      1.0  2145.408154   2161.104202
101    32.0   7168.0  16384.0      1.0  2141.632080   2164.703846
102    64.0   7168.0  16384.0      1.0  2144.351959   2169.727802
103   128.0   7168.0  16384.0      1.0  2159.183979   2215.359926
104   256.0   7168.0  16384.0      1.0  2191.440105   2315.040112
105  1024.0   7168.0  16384.0      1.0  2537.807941   3023.583889
106  2048.0   7168.0  16384.0      1.0  2899.807930   3846.208096
107  4096.0   7168.0  16384.0      1.0  3752.079964   5596.320152
108     8.0   7168.0   2048.0      1.0   453.552008    357.968003
109    16.0   7168.0   2048.0      1.0   465.376019    369.279981
110    32.0   7168.0   2048.0      1.0   450.751990    356.256008
111    64.0   7168.0   2048.0      1.0   449.519992    358.736008
112   128.0   7168.0   2048.0      1.0   454.463989    358.112007
113   256.0   7168.0   2048.0      1.0   454.320014    376.352012
114  1024.0   7168.0   2048.0      1.0   468.896002    461.919993
115  2048.0   7168.0   2048.0      1.0   455.871999    569.200039
116  4096.0   7168.0   2048.0      1.0   585.695982    808.319986
  • tp-size=8
fp8-gemm-performance-comparison-tp8:
          m        n        k  tp_size     DeepGEMM       SGLang
0       8.0    576.0   7168.0      8.0   472.127974   370.656013
1      16.0    576.0   7168.0      8.0   466.751993   371.616006
2      32.0    576.0   7168.0      8.0   467.680007   372.224003
3      64.0    576.0   7168.0      8.0   466.304004   375.295997
4     128.0    576.0   7168.0      8.0   466.719985   372.336000
5     256.0    576.0   7168.0      8.0   464.767992   374.783993
6    1024.0    576.0   7168.0      8.0   464.479983   375.248015
7    2048.0    576.0   7168.0      8.0   471.632004   428.624004
8    4096.0    576.0   7168.0      8.0   627.071977   677.680016
9       8.0  24576.0   7168.0      8.0  3137.408018  3150.160074
10     16.0  24576.0   7168.0      8.0  3137.408018  3143.167973
11     32.0  24576.0   7168.0      8.0  3139.391899  3144.736052
12     64.0  24576.0   7168.0      8.0  3141.439915  3146.463871
13    128.0  24576.0   7168.0      8.0  3148.752213  3219.023943
14    256.0  24576.0   7168.0      8.0  3180.079937  3335.936069
15   1024.0  24576.0   7168.0      8.0  3453.808069  4130.239964
16   2048.0  24576.0   7168.0      8.0  3835.488081  5207.119942
17   4096.0  24576.0   7168.0      8.0  4618.495941  7261.504173
18      8.0  32768.0    512.0      8.0   501.199961   387.775987
19     16.0  32768.0    512.0      8.0   450.224012   384.368002
20     32.0  32768.0    512.0      8.0   448.751986   385.856003
21     64.0  32768.0    512.0      8.0   452.672005   386.416018
22    128.0  32768.0    512.0      8.0   492.720008   394.751996
23    256.0  32768.0    512.0      8.0   491.935998   410.239995
24   1024.0  32768.0    512.0      8.0   465.056002   508.512020
25   2048.0  32768.0    512.0      8.0   511.135995   635.168016
26   4096.0  32768.0    512.0      8.0   630.432010   883.135974
27      8.0   7168.0  16384.0      8.0  2132.031918  2152.704000
28     16.0   7168.0  16384.0      8.0  2138.240099  2153.887987
29     32.0   7168.0  16384.0      8.0  2135.904074  2159.711838
30     64.0   7168.0  16384.0      8.0  2138.623953  2165.231943
31    128.0   7168.0  16384.0      8.0  2152.287960  2211.904049
32    256.0   7168.0  16384.0      8.0  2186.880112  2308.448076
33   1024.0   7168.0  16384.0      8.0  2532.016039  3019.999981
34   2048.0   7168.0  16384.0      8.0  2902.400017  3855.360031
35   4096.0   7168.0  16384.0      8.0  3745.968103  5611.968040
36      8.0   7168.0  18432.0      8.0  2388.800144  2411.456108
37     16.0   7168.0  18432.0      8.0  2395.648003  2415.807962
38     32.0   7168.0  18432.0      8.0  2390.511990  2417.423964
39     64.0   7168.0  18432.0      8.0  2395.088196  2429.343939
40    128.0   7168.0  18432.0      8.0  2409.568071  2477.600098
41    256.0   7168.0  18432.0      8.0  2451.807976  2589.247942
42   1024.0   7168.0  18432.0      8.0  2833.791971  3388.495922
43   2048.0   7168.0  18432.0      8.0  3250.192165  4262.176037
44   4096.0   7168.0  18432.0      8.0  4164.288044  6305.696011
45      8.0   4608.0   7168.0      8.0   684.895992   690.335989
46     16.0   4608.0   7168.0      8.0   683.456004   689.823985
47     32.0   4608.0   7168.0      8.0   681.439996   694.303989
48     64.0   4608.0   7168.0      8.0   680.064023   695.999980
49    128.0   4608.0   7168.0      8.0   687.792003   712.911963
50    256.0   4608.0   7168.0      8.0   702.207983   746.720016
51   1024.0   4608.0   7168.0      8.0   820.447981   953.904033
52   2048.0   4608.0   7168.0      8.0  1001.024008  1280.447960
53   4096.0   4608.0   7168.0      8.0  1282.816052  1809.728026
54      8.0   3072.0   7168.0      8.0   500.239968   497.024000
55     16.0   3072.0   7168.0      8.0   496.416003   498.079985
56     32.0   3072.0   7168.0      8.0   493.791997   503.103971
57     64.0   3072.0   7168.0      8.0   492.000014   504.096031
58    128.0   3072.0   7168.0      8.0   498.335987   514.496028
59    256.0   3072.0   7168.0      8.0   512.031972   539.456010
60   1024.0   3072.0   7168.0      8.0   613.551974   701.919973
61   2048.0   3072.0   7168.0      8.0   758.224010   920.671999
62   4096.0   3072.0   7168.0      8.0  1022.576094  1366.368055
63      8.0   4096.0    512.0      8.0   470.687985   368.831992
64     16.0   4096.0    512.0      8.0   465.631992   363.824010
65     32.0   4096.0    512.0      8.0   469.792008   365.872025
66     64.0   4096.0    512.0      8.0   464.367986   368.927985
67    128.0   4096.0    512.0      8.0   483.072013   384.703994
68    256.0   4096.0    512.0      8.0   480.672002   379.360020
69   1024.0   4096.0    512.0      8.0   492.736012   400.128007
70   2048.0   4096.0    512.0      8.0   485.599995   381.184012
71   4096.0   4096.0    512.0      8.0   480.192006   378.831983
72      8.0   3072.0   1536.0      8.0   477.791995   369.343996
73     16.0   3072.0   1536.0      8.0   473.360002   365.440011
74     32.0   3072.0   1536.0      8.0   471.632004   367.199987
75     64.0   3072.0   1536.0      8.0   472.768009   358.384013
76    128.0   3072.0   1536.0      8.0   481.920004   375.088006
77    256.0   3072.0   1536.0      8.0   501.312017   390.560001
78   1024.0   3072.0   1536.0      8.0   498.943985   396.575987
79   2048.0   3072.0   1536.0      8.0   499.231994   399.040014
80   4096.0   3072.0   1536.0      8.0   497.312009   391.152024
81      8.0    512.0   7168.0      8.0   491.439998   390.751988
82     16.0    512.0   7168.0      8.0   489.279985   393.519998
83     32.0    512.0   7168.0      8.0   487.744004   380.127996
84     64.0    512.0   7168.0      8.0   482.879996   380.448014
85    128.0    512.0   7168.0      8.0   497.247994   387.903988
86    256.0    512.0   7168.0      8.0   489.600003   390.143991
87   1024.0    512.0   7168.0      8.0   500.000000   395.776004
88   2048.0    512.0   7168.0      8.0   496.704012   396.319985
89   4096.0    512.0   7168.0      8.0   581.727982   628.256023
90      8.0   7168.0   2304.0      8.0   492.608011   382.367998
91     16.0   7168.0   2304.0      8.0   480.352014   382.079989
92     32.0   7168.0   2304.0      8.0   483.328015   380.607992
93     64.0   7168.0   2304.0      8.0   483.072013   414.303988
94    128.0   7168.0   2304.0      8.0   484.672010   392.672002
95    256.0   7168.0   2304.0      8.0   480.704010   407.040000
96   1024.0   7168.0   2304.0      8.0   491.999984   503.903985
97   2048.0   7168.0   2304.0      8.0   502.816021   626.879990
98   4096.0   7168.0   2304.0      8.0   636.384010   897.104025
99      8.0   7168.0   2048.0      8.0   483.904004   390.432000
100    16.0   7168.0   2048.0      8.0   484.672010   386.319995
101    32.0   7168.0   2048.0      8.0   484.607995   391.023993
102    64.0   7168.0   2048.0      8.0   478.639990   388.319999
103   128.0   7168.0   2048.0      8.0   481.391996   385.728002
104   256.0   7168.0   2048.0      8.0   482.015997   381.119996
105  1024.0   7168.0   2048.0      8.0   487.744004   462.671995
106  2048.0   7168.0   2048.0      8.0   488.079995   569.424033
107  4096.0   7168.0   2048.0      8.0   577.023983   803.807974
108     8.0   7168.0    256.0      8.0   484.560013   376.864016
109    16.0   7168.0    256.0      8.0   481.503993   386.687994
110    32.0   7168.0    256.0      8.0   473.567992   362.367988
111    64.0   7168.0    256.0      8.0   475.136012   365.312010
112   128.0   7168.0    256.0      8.0   498.896003   383.167982
113   256.0   7168.0    256.0      8.0   489.567995   375.375986
114  1024.0   7168.0    256.0      8.0   481.631994   370.352000
115  2048.0   7168.0    256.0      8.0   485.119998   382.016003
116  4096.0   7168.0    256.0      8.0   489.728004   392.383993
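
The two latency columns are easiest to read as a ratio. The sketch below computes DeepGEMM's speedup as SGLang latency divided by DeepGEMM latency for a handful of rows copied verbatim from the tp-size=1 table; values above 1 mean DeepGEMM is faster. The row selection and the `deepgemm_speedup` name are illustrative choices, not part of the PR.

```python
# A few rows copied from the tp-size=1 table: (m, n, k, DeepGEMM, SGLang).
ROWS_TP1 = [
    (8, 576, 7168, 456.111997, 350.879997),
    (4096, 576, 7168, 634.368002, 682.816029),
    (8, 36864, 7168, 4667.871952, 4707.104206),
    (4096, 36864, 7168, 6670.896053, 10620.832443),
]


def deepgemm_speedup(deepgemm: float, sglang: float) -> float:
    """Ratio of latencies; > 1 means DeepGEMM is faster for this shape."""
    return sglang / deepgemm


for m, n, k, dg, sg in ROWS_TP1:
    print(f"m={m:5d} n={n:6d} k={k:6d} speedup={deepgemm_speedup(dg, sg):.2f}x")
```

In these rows, SGLang is faster at m = 8 for the 576 × 7168 shape (a ratio of about 0.77), while DeepGEMM is roughly 1.6× faster at m = 4096 for the 36864 × 7168 shape.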
