triton-benchmark

Examples usage

See the makefile in each subfolder of the examples directory.

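Benchmarks usage

The AI operators listed below were each run on 1, 4, and 8 cores, built with gcc, clang, and triton, on a RISC-V CPU (SpacemiT Muse Pi). The results follow; columns suffixed T1, T4, and T8 correspond to 1, 4, and 8 cores. For the steps to run the benchmarks, see the README in the benchmarks directory.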

##### correlation kernel performance #####
shape (OUT_CHANNELxIN_CHANNELxHEIGHTxWIDTHxRUN_COUNT)	gcc_T1	clang_T1	triton_T1	gcc_T4	clang_T4	triton_T4	gcc_T8	clang_T8	triton_T8
5x58x112x88x10	0.406807	0.550575	0.185582	0.0896354	0.131778	0.0554982	0.049427	0.0692719	0.0323904
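One way to read a row like this is to divide each toolchain's figure by the gcc figure at the same core count. The sketch below does that for the correlation row. It assumes the numbers are elapsed times (lower is better), which the output does not state explicitly, and the helper name `relative_to_gcc` is hypothetical rather than part of this repository.

```python
# Values copied from the correlation row above.
# Assumption: they are elapsed times, so a ratio below 1.0 means faster than gcc.
correlation = {
    "gcc_T1": 0.406807, "clang_T1": 0.550575, "triton_T1": 0.185582,
    "gcc_T4": 0.0896354, "clang_T4": 0.131778, "triton_T4": 0.0554982,
    "gcc_T8": 0.049427, "clang_T8": 0.0692719, "triton_T8": 0.0323904,
}


def relative_to_gcc(row, cores):
    """Return each toolchain's value divided by the gcc value at `cores`."""
    base = row[f"gcc_T{cores}"]
    return {
        tool: row[f"{tool}_T{cores}"] / base
        for tool in ("gcc", "clang", "triton")
    }


for cores in (1, 4, 8):
    ratios = relative_to_gcc(correlation, cores)
    print(cores, {tool: round(value, 2) for tool, value in ratios.items()})
```

The same helper works on any other row once its nine values are placed in a dictionary with the same column names.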



##### dropout kernel performance #####
shape (NxRUN_COUNT)	gcc_T1	clang_T1	triton_T1	gcc_T4	clang_T4	triton_T4	gcc_T8	clang_T8	triton_T8
1048576x10	0.437414	44.4074	16.5371	0.116678	11.179	4.19115	0.0652238	5.60955	2.13063



##### layernorm kernel performance #####
shape (NxDxRUN_COUNT)	gcc_T1	clang_T1	triton_T1	gcc_T4	clang_T4	triton_T4	gcc_T8	clang_T8	triton_T8
1151x8192x10	3.99499	5.10841	4.87809	1.1298	1.37735	1.63461	1.03388	0.985506	1.00223
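The 1, 4, and 8 core columns also allow a rough look at how well a kernel scales. The sketch below computes speedup over the single-core run and parallel efficiency (speedup divided by core count) for the layernorm row. It assumes the values are elapsed times; the function name `scaling` is hypothetical.

```python
# Values copied from the layernorm row above, grouped by toolchain and core count.
# Assumption: they are elapsed times (lower is better).
layernorm = {
    "gcc": {1: 3.99499, 4: 1.1298, 8: 1.03388},
    "clang": {1: 5.10841, 4: 1.37735, 8: 0.985506},
    "triton": {1: 4.87809, 4: 1.63461, 8: 1.00223},
}


def scaling(times):
    """Map core count -> (speedup over 1 core, parallel efficiency)."""
    base = times[1]
    return {cores: (base / t, base / t / cores) for cores, t in times.items()}


for tool, times in layernorm.items():
    for cores, (speedup, efficiency) in scaling(times).items():
        print(f"{tool:6s} cores={cores} speedup={speedup:.2f} efficiency={efficiency:.2f}")
```

An efficiency well below 1.0 at 8 cores suggests the kernel is limited by something other than compute at that core count, though the source does not say what.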



##### matmul kernel performance #####
shape (MxNxKxRUN_COUNT)	gcc_T1	clang_T1	triton_T1	gcc_T4	clang_T4	triton_T4	gcc_T8	clang_T8	triton_T8
128x128x64x10	0.0529591	0.063293	2.19012	0.017613	0.0199949	0.564048	0.0123523	0.0135753	0.286434



##### resize kernel performance #####
shape (HxWxCxRUN_COUNT)	gcc_T1	clang_T1	triton_T1	gcc_T4	clang_T4	triton_T4	gcc_T8	clang_T8	triton_T8
512x512x3x10	0.802106	1.19739	1.42896	0.205363	0.304806	0.362561	0.107914	0.157187	0.186983



##### rope kernel performance #####
shape (SEQ_LENxBATCH_NUMxHEAD_NUMxHEAD_DIMxRUN_COUNT)	gcc_T1	clang_T1	triton_T1	gcc_T4	clang_T4	triton_T4	gcc_T8	clang_T8	triton_T8
512x16x8x1024x10	6.71872	7.74758	11.4048	2.25791	2.43972	3.19468	1.86013	1.97924	2.25463



##### softmax_kernel kernel performance #####
shape (RxCxRUN_COUNT)	gcc_T1	clang_T1	triton_T1	gcc_T4	clang_T4	triton_T4	gcc_T8	clang_T8	triton_T8
1823x781x10	0.841515	0.820086	0.904563	0.221429	0.212233	0.232222	0.116075	0.112719	0.122978



##### warp kernel performance #####
shape (HxWxCxRUN_COUNT)	gcc_T1	clang_T1	triton_T1	gcc_T4	clang_T4	triton_T4	gcc_T8	clang_T8	triton_T8
1024x1024x3x10	0.620494	1.33427	1.22298	0.161322	0.339647	0.311901	0.0858445	0.174764	0.161152
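The report format above (a banner line, a tab-separated header, then tab-separated data rows) is simple enough to parse for further analysis. The following is a minimal sketch, not a tool shipped with this repository; `parse_report` and the `SECTION` pattern are hypothetical names, and the sample input keeps only the first three value columns of the matmul section for brevity.

```python
import re

# Matches banner lines such as "##### matmul kernel performance #####".
SECTION = re.compile(r"^#+\s*(.+?)\s+kernel performance\s*#+$")


def parse_report(text):
    """Parse the report text into {kernel name: [row dict, ...]}."""
    results = {}
    kernel, header = None, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = SECTION.match(line)
        if match:
            # A new section starts; its next non-empty line is the header.
            kernel, header = match.group(1), None
            results[kernel] = []
        elif kernel is not None and header is None:
            header = line.split("\t")
        elif kernel is not None:
            fields = line.split("\t")
            row = {"shape": fields[0]}
            row.update({name: float(v) for name, v in zip(header[1:], fields[1:])})
            results[kernel].append(row)
    return results


sample = (
    "##### matmul kernel performance #####\n"
    "shape (MxNxKxRUN_COUNT)\tgcc_T1\tclang_T1\ttriton_T1\n"
    "128x128x64x10\t0.0529591\t0.063293\t2.19012\n"
)
report = parse_report(sample)
print(report["matmul"][0])
```

Keeping the shape as a string and converting only the measurement columns to floats avoids guessing at the meaning of each dimension, which differs from kernel to kernel.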

About

RISCV C and Triton AI-Benchmark
