Commit 1b95803

spolifroni-amd authored and afagaj committed
updated the changelog with 7.1 and beyond info
1 parent 211d64e commit 1b95803

1 file changed: +35 -31 lines changed

CHANGELOG.md

Lines changed: 35 additions & 31 deletions
@@ -2,15 +2,44 @@
 
 Documentation for Composable Kernel available at [https://rocm.docs.amd.com/projects/composable_kernel/en/latest/](https://rocm.docs.amd.com/projects/composable_kernel/en/latest/).
 
-## Composable Kernel 1.2.0 for ROCm 7.0.0
+## (Unreleased) Composable Kernel for ROCm
+
+### Added
 
-### Added
 * Added a compute async pipeline in the CK TILE universal GEMM on gfx950
 * Added support for B Tensor type pk_int4_t in the CK TILE weight preshuffle GEMM.
 * Added the new api to load different memory sizes to SGPR.
 * Added support for B Tensor Preshuffle in CK TILE Grouped GEMM.
 * Added a basic copy kernel example and supporting documentation for new CK Tile developers.
 * Added support for grouped_gemm kernels to perform multi_d elementwise operation.
+* Added support for Multiple ABD GEMM
+* Added benchmarking support for tile engine GEMM Multi D.
+* Added block scaling support in CK_TILE GEMM, allowing flexible use of quantization matrices from either A or B operands.
+* Added the row-wise column-wise quantization for CK_TILE GEMM & CK_TILE Grouped GEMM.
+* Added support for f32 to FMHA (fwd/bwd).
+* Added tensor-wise quantization for CK_TILE GEMM.
+* Added support for batched contraction kernel.
+* Added pooling kernel in CK_TILE
+
+### Changed
+
+* Removed `BlockSize` in `make_kernel` and `CShuffleEpilogueProblem` to support Wave32 in CK_TILE (#2594)
+
+## Composable Kernel 1.1.0 for ROCm 7.1.0
+
+### Added
+
+* Added support for hdim as a multiple of 32 for FMHA (fwd/fwd_splitkv/bwd)
+* Added support for elementwise kernel.
+
+### Upcoming changes
+
+* Non-grouped convolutions are deprecated. Their functionality is supported by grouped convolution.
+
+## Composable Kernel 1.1.0 for ROCm 7.0.0
+
+### Added
+
 * Added support for bf16, f32, and f16 for 2D and 3D NGCHW grouped convolution backward data
 * Added a fully asynchronous HOST (CPU) arguments copy flow for CK grouped GEMM kernels.
 * Added support GKCYX layout for grouped convolution forward (NGCHW/GKCYX/NGKHW, number of instances in instance factory for NGCHW/GKYXC/NGKHW has been reduced).
@@ -19,55 +48,30 @@ Documentation for Composable Kernel available at [https://rocm.docs.amd.com/proj
 * Added support for GKCYX layout for grouped convolution backward data (NGCHW/GKCYX/NGKHW).
 * Added support for Stream-K version of mixed fp8/bf16 GEMM
 * Added support for Multiple D GEMM
-* Added support for Multiple ABD GEMM
 * Added GEMM pipeline for microscaling (MX) FP8/FP6/FP4 data types
 * Added support for FP16 2:4 structured sparsity to universal GEMM.
 * Added support for Split K for grouped convolution backward data.
 * Added logit soft-capping support for fMHA forward kernels.
 * Added support for hdim as a multiple of 32 for FMHA (fwd/fwd_splitkv)
-* Added support for hdim as a multiple of 32 for FMHA (fwd/fwd_splitkv/bwd)
 * Added benchmarking support for tile engine GEMM.
 * Added Ping-pong scheduler support for GEMM operation along the K dimension.
 * Added rotating buffer feature for CK_Tile GEMM.
 * Added int8 support for CK_TILE GEMM.
-* Added support for elementwise kernel.
-* Added benchmarking support for tile engine GEMM Multi D.
-* Added block scaling support in CK_TILE GEMM, allowing flexible use of quantization matrices from either A or B operands.
-* Added the row-wise column-wise quantization for CK_TILE GEMM & CK_TILE Grouped GEMM.
-* Added support for f32 to FMHA (fwd/bwd).
-* Added tensor-wise quantization for CK_TILE GEMM.
-* Added support for batched contraction kernel.
-* Added pooling kernel in CK_TILE
 
 ### Optimized
 
+* Optimize the gemm multiply multiply preshuffle & lds bypass with Pack of KGroup and better instruction layout.
+* Added Vectorize Transpose optimization for CK Tile
+* Added the asynchronous copy for gfx950
 
-* Optimize the gemm multiply multiply preshuffle & lds bypass with Pack of KGroup and better instruction layout. (#2166)
-* Added Vectorize Transpose optimization for CK Tile (#2131)
-* Added the asynchronous copy for gfx950 (#2425)
-
-
-### Fixes
-
-None
-
-### Changes
+### Changed
 
 * Removed support for gfx940 and gfx941 targets (#1944)
 * Replaced the raw buffer load/store intrinsics with Clang20 built-ins (#1876)
 * DL and DPP kernels are now enabled by default.
 * Number of instances in instance factory for grouped convolution forward NGCHW/GKYXC/NGKHW has been reduced.
 * Number of instances in instance factory for grouped convolution backward weight NGCHW/GKYXC/NGKHW has been reduced.
 * Number of instances in instance factory for grouped convolution backward data NGCHW/GKYXC/NGKHW has been reduced.
-* Removed `BlockSize` in `make_kernel` and `CShuffleEpilogueProblem` to support Wave32 in CK_TILE (#2594)
-
-### Known issues
-
-None
-
-### Upcoming changes
-
-* Non-grouped convolutions are deprecated. All of their functionality is supported by grouped convolution.
 
 ## Composable Kernel 1.1.0 for ROCm 6.1.0
 