Stream-K Gemm Example for fp8 and bf8 #3041
base: develop
Do we want to overwrite the existing types, or just add a new type for BF8 and FP8? (Right now they are overwritten)
    std::tuple< Row, Col, Row, F16, F16, F32, F16>,
    std::tuple< Row, Col, Row, F8, F8, F32, F16>,
    std::tuple< Row, Col, Row, BF8, BF8, F32, F16>,
    std::tuple< Row, Col, Row, BF16, BF16, F32, BF16>
Assuming we don't want to overwrite the existing smoke tests, I think we can remove FP16 and BF16 from these types. Also, can we please add RRR, CRR, and CCR for FP8 and BF8?
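To make the request concrete, here is a rough sketch of what an FP8/BF8-only type list covering RRR, RCR, CRR, and CCR might look like. The tag structs are stand-ins so the snippet compiles on its own, the alias name `Fp8Bf8KernelTypes` is hypothetical, and the tuple order (A layout, B layout, C layout, A type, B type, accumulator type, C type) is an assumption read off the snippet above. The real test would presumably wrap these in `::testing::Types` instead of an outer `std::tuple`.

```cpp
#include <cstddef>
#include <tuple>

// Stand-in tags so this sketch is self-contained; the PR uses the ck_tile types.
struct Row {};
struct Col {};
struct F8 {};
struct BF8 {};
struct F16 {};
struct F32 {};

// Assumed tuple order: ALayout, BLayout, CLayout, AData, BData, AccData, CData.
using Fp8Bf8KernelTypes = std::tuple<
    std::tuple<Row, Row, Row, F8,  F8,  F32, F16>, // RRR
    std::tuple<Row, Col, Row, F8,  F8,  F32, F16>, // RCR
    std::tuple<Col, Row, Row, F8,  F8,  F32, F16>, // CRR
    std::tuple<Col, Col, Row, F8,  F8,  F32, F16>, // CCR
    std::tuple<Row, Row, Row, BF8, BF8, F32, F16>, // RRR
    std::tuple<Row, Col, Row, BF8, BF8, F32, F16>, // RCR
    std::tuple<Col, Row, Row, BF8, BF8, F32, F16>, // CRR
    std::tuple<Col, Col, Row, BF8, BF8, F32, F16>  // CCR
>;

// Number of layout/type combinations in the list.
constexpr std::size_t kNumConfigs = std::tuple_size<Fp8Bf8KernelTypes>::value;
```

Keeping FP16 and BF16 out of this list leaves the existing smoke-test coverage untouched.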
    using Col = ck_tile::tensor_layout::gemm::ColumnMajor;

    // clang-format off
    using KernelTypesStreamK = ::testing::Types<
I think there is a duplicate of KernelTypesStreamK in test/ck_tile/gemm_streamk/test_gemm_streamk_types.hpp.
    const float max_accumulated_value =
        *std::max_element(c_m_n_host_ref.mData.begin(), c_m_n_host_ref.mData.end());
    const auto rtol_atol = calculate_rtol_atol<ADataType, BDataType, AccDataType, CDataType>(
        K, /*kbatch*/ 1, max_accumulated_value);
We may need to use estimate_num_wgs_per_tile here because, for some test cases, more than one workgroup atomically adds to the same C macro tile. The extra round-off error this introduces could cause sporadic test failures.
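As an illustration of why the workgroup count matters, here is a deliberately simplified error model. It is not how `calculate_rtol_atol` is implemented in ck_tile; `sketch_rtol_atol`, the `Tolerance` struct, and the mantissa-bit parameters are all hypothetical. The `num_wgs_per_tile` argument stands in for whatever `estimate_num_wgs_per_tile` returns, which I am assuming would be passed in place of the hard-coded `/*kbatch*/ 1`.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical, simplified error model for illustration only.
struct Tolerance
{
    double rtol;
    double atol;
};

// k:                 reduction length of the GEMM
// num_wgs_per_tile:  workgroups that atomically add into one C macro tile
// max_abs_value:     largest magnitude expected in the reference output
// *_mantissa_bits:   mantissa widths of the accumulator and output types
inline Tolerance sketch_rtol_atol(int k,
                                  int num_wgs_per_tile,
                                  double max_abs_value,
                                  int acc_mantissa_bits,
                                  int out_mantissa_bits)
{
    const double eps_acc = std::ldexp(1.0, -acc_mantissa_bits);
    const double eps_out = std::ldexp(1.0, -out_mantissa_bits);

    // Each workgroup accumulates roughly k / num_wgs_per_tile products.
    const int k_per_wg = (k + num_wgs_per_tile - 1) / num_wgs_per_tile;
    const double rtol_accumulate = eps_acc * k_per_wg;

    // Each atomic add into C rounds once in the output type.
    const double rtol_reduce = eps_out * num_wgs_per_tile;

    const double rtol = std::max(rtol_accumulate, rtol_reduce);
    return {rtol, rtol * max_abs_value};
}
```

Under this model the reduction term grows linearly with the number of contributing workgroups, so a tolerance computed for a single workgroup can be too tight for tiles that several workgroups write to.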
    @@ -0,0 +1,282 @@
    // Copyright © Advanced Micro Devices, Inc., or its affiliates.
There is already a test harness in test_gemm_streamk.hpp, so I don't think we need to duplicate it here.
    using Row = ck_tile::tensor_layout::gemm::RowMajor;
    using Col = ck_tile::tensor_layout::gemm::ColumnMajor;

    using Mem = ck_tile::integral_constant<GemmPipelineType, GemmPipelineType::Mem>;
These are types needed for the rest of the smoke tests
    include_directories(BEFORE ${CMAKE_CURRENT_SOURCE_DIR})

    add_gtest_executable(test_ck_tile_streamk test_gemm_streamk_fp8_bf8.cpp)
We should look at adding the fp8 tests to the smoke tests.
Need to consolidate new fp8 tests into smoke tests with existing test harness.
Proposed changes
This PR expands the data types supported by Stream-K GEMM by adding examples for fp8 and bf8. Examples for fp16 and bf16 were added in an earlier PR. Currently, these examples support only atomic reduction. Unit tests for fp8 and bf8 have been added as a test suite.
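For readers unfamiliar with the term, atomic reduction here means that several workgroups each compute a partial sum along K and atomically add it into the same output element. The host-side sketch below shows only that reduction step for a single dot product. It is not the Stream-K scheduling or the ck_tile kernel, and every name in it (`atomic_add`, `workgroup_partial_dot`, `split_k_dot`) is hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <vector>

// CAS-based atomic add (std::atomic<float>::fetch_add needs C++20).
inline void atomic_add(std::atomic<float>& target, float value)
{
    float expected = target.load(std::memory_order_relaxed);
    while(!target.compare_exchange_weak(expected, expected + value, std::memory_order_relaxed))
    {
        // On failure, expected holds the current value; retry with it.
    }
}

// One hypothetical "workgroup": reduce the K slice [k_begin, k_end) locally,
// then atomically add the partial result into the shared output.
inline void workgroup_partial_dot(const std::vector<float>& a,
                                  const std::vector<float>& b,
                                  std::size_t k_begin,
                                  std::size_t k_end,
                                  std::atomic<float>& c)
{
    float partial = 0.0f;
    for(std::size_t k = k_begin; k < k_end; ++k)
    {
        partial += a[k] * b[k];
    }
    atomic_add(c, partial);
}

// Splits K evenly across num_workgroups (must be > 0). The workgroups are run
// one after another here; on a GPU they would run concurrently.
inline float split_k_dot(const std::vector<float>& a,
                         const std::vector<float>& b,
                         std::size_t num_workgroups)
{
    std::atomic<float> c{0.0f};
    const std::size_t k_total  = a.size();
    const std::size_t k_per_wg = (k_total + num_workgroups - 1) / num_workgroups;
    for(std::size_t wg = 0; wg < num_workgroups; ++wg)
    {
        const std::size_t begin = std::min(wg * k_per_wg, k_total);
        const std::size_t end   = std::min(begin + k_per_wg, k_total);
        workgroup_partial_dot(a, b, begin, end, c);
    }
    return c.load();
}
```

Because floating-point addition is not associative, the order in which concurrent workgroups land their atomic adds can change the low bits of the result, which is why the tests need a tolerance rather than an exact comparison.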
Checklist
Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.
clang-format on all changed files
Discussion
If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered.