Hello, I am having trouble compiling composable_kernel for my AMD GPU architecture (gfx1010). I configured the build with:
cmake \
-D CMAKE_PREFIX_PATH=/opt/rocm \
-D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
-D CMAKE_CXX_FLAGS="-O3" \
-D CMAKE_BUILD_TYPE=Release \
-D GPU_TARGETS="gfx1010" \
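(For reference, this is how I confirmed the gfx target string my GPU reports. On the real machine I pipe `/opt/rocm/bin/rocminfo` into the filter; the sample variable below just stands in for that output so the snippet runs standalone.)

```shell
# On a real ROCm install, replace the sample line with:
#   /opt/rocm/bin/rocminfo | grep -o 'gfx[0-9a-f]*' | sort -u
sample_rocminfo_output="  Name:                    gfx1010"
echo "$sample_rocminfo_output" | grep -o 'gfx[0-9a-f]*' | sort -u
```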
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is Clang 15.0.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/rocm/bin/hipcc - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
CMAKE_CXX_COMPILER_ID: Clang
OpenMP_CXX_LIB_NAMES: libomp;libgomp;libiomp5
OpenMP_gomp_LIBRARY:
OpenMP_pthread_LIBRARY:
OpenMP_CXX_FLAGS: -fopenmp=libomp -Wno-unused-command-line-argument
-- hip::amdhip64 is SHARED_LIBRARY
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS - Success
-- Build with HIP 5.4.22505
-- Clang tidy not found
-- Clang tidy checks: *,-abseil-*,-android-cloexec-fopen,-cert-msc30-c,-bugprone-exception-escape,-bugprone-macro-parentheses,-cert-env33-c,-cert-msc32-c,-cert-msc50-cpp,-cert-msc51-cpp,-cert-dcl37-c,-cert-dcl51-cpp,-clang-analyzer-alpha.core.CastToStruct,-clang-analyzer-optin.performance.Padding,-clang-diagnostic-deprecated-declarations,-clang-diagnostic-extern-c-compat,-clang-diagnostic-unused-command-line-argument,-cppcoreguidelines-avoid-c-arrays,-cppcoreguidelines-avoid-magic-numbers,-cppcoreguidelines-explicit-virtual-functions,-cppcoreguidelines-init-variables,-cppcoreguidelines-macro-usage,-cppcoreguidelines-non-private-member-variables-in-classes,-cppcoreguidelines-pro-bounds-array-to-pointer-decay,-cppcoreguidelines-pro-bounds-constant-array-index,-cppcoreguidelines-pro-bounds-pointer-arithmetic,-cppcoreguidelines-pro-type-member-init,-cppcoreguidelines-pro-type-reinterpret-cast,-cppcoreguidelines-pro-type-union-access,-cppcoreguidelines-pro-type-vararg,-cppcoreguidelines-special-member-functions,-fuchsia-*,-google-explicit-constructor,-google-readability-braces-around-statements,-google-readability-todo,-google-runtime-int,-google-runtime-references,-hicpp-vararg,-hicpp-braces-around-statements,-hicpp-explicit-conversions,-hicpp-named-parameter,-hicpp-no-array-decay,-hicpp-avoid-c-arrays,-hicpp-signed-bitwise,-hicpp-special-member-functions,-hicpp-uppercase-literal-suffix,-hicpp-use-auto,-hicpp-use-equals-default,-hicpp-use-override,-llvm-header-guard,-llvm-include-order,-llvmlibc-restrict-system-libc-headers,-llvmlibc-callee-namespace,-llvmlibc-implementation-in-namespace,-llvm-else-after-return,-llvm-qualified-auto,-misc-misplaced-const,-misc-non-private-member-variables-in-classes,-misc-no-recursion,-modernize-avoid-bind,-modernize-avoid-c-arrays,-modernize-pass-by-value,-modernize-use-auto,-modernize-use-default-member-init,-modernize-use-equals-default,-modernize-use-trailing-return-type,-modernize-use-transparent-functors,-performance-unnecessary-value-param,-readability-braces-around-statements,-readability-else-after-return,-readability-function-cognitive-complexity,-readability-isolate-declaration,-readability-magic-numbers,-readability-named-parameter,-readability-uppercase-literal-suffix,-readability-convert-member-functions-to-static,-readability-qualified-auto,-readability-redundant-string-init,-bugprone-narrowing-conversions,-cppcoreguidelines-narrowing-conversions,-altera-struct-pack-align,-cppcoreguidelines-prefer-member-initializer
CMAKE_CXX_FLAGS: -O3
adding instance device_batched_gemm_instance
adding instance device_batched_gemm_add_relu_gemm_add_instance
adding instance device_batched_gemm_bias_permute_instance
adding instance device_batched_gemm_gemm_instance
adding instance device_batched_gemm_multi_d_instance
adding instance device_batched_gemm_reduce_instance
adding instance device_batched_gemm_softmax_gemm_instance
adding instance device_batched_gemm_softmax_gemm_permute_instance
adding instance device_batchnorm_instance
adding instance device_contraction_bilinear_instance
adding instance device_contraction_scale_instance
adding instance device_conv1d_bwd_data_instance
adding instance device_conv2d_bwd_data_instance
adding instance device_conv2d_fwd_instance
adding instance device_conv2d_fwd_bias_relu_instance
adding instance device_conv2d_fwd_bias_relu_add_instance
adding instance device_conv3d_bwd_data_instance
adding instance device_elementwise_instance
adding instance device_elementwise_normalization_instance
adding instance device_gemm_instance
adding instance device_gemm_add_add_fastgelu_instance
adding instance device_gemm_add_fastgelu_instance
adding instance device_gemm_add_multiply_instance
adding instance device_gemm_add_relu_add_layernorm_instance
adding instance device_gemm_bias_add_reduce_instance
adding instance device_gemm_bilinear_instance
adding instance device_gemm_fastgelu_instance
adding instance device_gemm_reduce_instance
adding instance device_gemm_splitk_instance
adding instance device_grouped_conv1d_bwd_weight_instance
adding instance device_grouped_conv1d_fwd_instance
adding instance device_grouped_conv2d_bwd_data_instance
adding instance device_grouped_conv2d_bwd_weight_instance
adding instance device_grouped_conv2d_fwd_instance
adding instance device_grouped_conv3d_bwd_weight_instance
adding instance device_grouped_conv3d_fwd_instance
adding instance device_grouped_gemm_instance
adding instance device_grouped_gemm_fastgelu_instance
adding instance device_normalization_instance
adding instance device_pool_fwd_instance
adding instance device_quantization_instance
adding instance device_reduce_instance
adding instance device_softmax_instance
adding example example_gemm_dl_fp32
adding example example_gemm_dl_fp16
adding example example_gemm_dl_int8
adding example example_gemm_xdl_fp16
adding example example_gemm_xdl_wavelet_fp16
adding example example_gemm_xdl_bf16
adding example example_gemm_xdl_int8
adding example example_gemm_xdl_skip_b_lds_fp16
adding example example_gemm_xdl_fp64
adding example example_convnd_fwd_dl_fp16
adding example example_convnd_fwd_dl_fp32
adding example example_convnd_fwd_dl_int8
adding example example_reduce_blockwise
adding example example_reduce_multiblock_atomic_add
adding example example_reduce_blockwise_two_call
adding example example_pool2d_fwd_fp16
adding example example_pool2d_fwd_fp32
adding example example_gemm_dl_quantization_int8
adding example example_grouped_gemm_xdl_fp32
adding example example_grouped_gemm_xdl_fp16
adding example example_grouped_gemm_xdl_bfp16
adding example example_grouped_gemm_xdl_int8
adding example example_grouped_gemm_multiple_d_dl_fp16
adding example example_grouped_gemm_xdl_splitk_fp16
adding example example_convnd_bwd_data_dl_fp16
adding example example_broadcast_add_2d_amn_bn
adding example example_broadcast_add_3d_am_bmnk
adding example example_elementwise_add_1d
adding example example_elementwise_add_4d
adding example example_grouped_conv_bwd_weight_dl_fp16
adding example example_cgemm_xdl_bf16
adding example example_cgemm_xdl_fp16
adding example example_cgemm_xdl_fp32
adding example example_cgemm_xdl_int8
adding example example_softmax_blockwise
adding example example_batched_gemm_xdl_fp32
adding example example_batched_gemm_xdl_fp16
adding example example_batched_gemm_xdl_bfp16
adding example example_batched_gemm_xdl_int8
adding example example_gemm_bias_e_permute_g1m3n2k1_xdl_fp16
adding example example_gemm_bias_e_permute_g1m2n3k1_xdl_fp16
adding example example_contraction_bilinear_xdl_fp32
adding example example_contraction_scale_xdl_fp32
adding example example_contraction_bilinear_xdl_fp64
adding example example_contraction_scale_xdl_fp64
adding example example_layernorm_fp16
adding example example_layernorm_splitk_fp16
adding example example_grouped_gemm_bias_e_permute_xdl_fp16
adding example example_batched_gemm_bias_e_permute_xdl_fp16
adding example example_batched_gemm_scale_softmax_gemm_xdl_fp16
adding example example_batched_gemm_scale_softmax_gemm_xdl_bf16
adding example example_batched_gemm_scale_softmax_gemm_permute_xdl_fp16
adding example example_batched_gemm_scale_softmax_gemm_permute_xdl_bf16
adding example example_grouped_gemm_scale_softmax_gemm_permute_xdl_fp16
adding example example_batched_gemm_lower_triangle_scale_softmax_gemm_permute_xdl_fp16
adding example example_grouped_gemm_lower_triangle_scale_softmax_gemm_permute_xdl_fp16
adding example example_dual_reduce_multiblock
adding example example_dual_reduce_threadwise
adding example example_batchnorm_forward_training
adding example example_batchnorm_forward_inferring
adding example example_batchnorm_backward
adding example example_sparse_embedding3_forward_layernorm
adding example example_batched_gemm_add_add_relu_gemm_add_xdl_fp16
adding example example_permute_1xHxW_fp16
adding example example_permute_NxHxW_fp16
adding example example_permute_HxWx4_fp16
adding example example_conv2d_fwd_dl_perlayer_quantization_int8
adding example example_conv2d_fwd_dl_perchannel_quantization_int8
adding example example_conv2d_fwd_dl_bias_relu_perlayer_quantization_int8
adding example example_conv2d_fwd_dl_bias_relu_perchannel_quantization_int8
adding example example_conv2d_fwd_dl_bias_tanh_perlayer_quantization_int8
adding example example_conv2d_fwd_dl_bias_tanh_perchannel_quantization_int8
adding example example_groupnorm_sigmoid_mul_fp16
adding example example_groupnorm_splitk_fp16
adding example example_groupnorm_swish_fp16
adding example example_splitk_gemm_bias_e_permute_xdl_fp16
adding example example_splitk_gemm_bias_e_permute_xdl_fp32
adding example example_elementwise_permute_4D_fp16
adding example example_elementwise_permute_4D_fp16_2d
adding example example_elementwise_layernorm_blockwise
adding example example_gemm_add_multiply_dl_fp16
adding example example_gemm_add_multiply_xdl_fp16
adding example example_pool3d_fwd_fp16
adding example example_maxpool2d_bwd_bf16
adding example example_maxpool2d_bwd_fp16
adding example example_maxpool2d_bwd_fp32
adding example example_put_element_fp16
-- Fetching GoogleTest
-- Suppressing googltest warnings with flags: -Wno-undef;-Wno-reserved-identifier;-Wno-global-constructors;-Wno-missing-noreturn;-Wno-disabled-macro-expansion;-Wno-used-but-marked-unused;-Wno-switch-enum;-Wno-zero-as-null-pointer-constant;-Wno-unused-member-function;-Wno-comma;-Wno-old-style-cast;-Wno-deprecated;-Wno-unsafe-buffer-usage
-- Found Python: /usr/local/bin/python3 (found version "3.8.10") found components: Interpreter
adding test test_magic_number_division
adding test test_space_filling_curve
adding gtest test_conv_util
adding gtest test_reference_conv_fwd
adding test test_gemm_fp32
adding test test_gemm_fp16
adding test test_gemm_bf16
adding test test_gemm_int8
adding test test_gemm_standalone_xdl_fp16
adding test test_gemm_reduce_fp16
adding test test_reduce_no_index
adding test test_reduce_with_index
adding gtest test_grouped_convnd_fwd
adding gtest test_block_to_ctile_map
adding gtest test_softmax_rank3
adding gtest test_softmax_rank4
adding gtest test_softmax_interface
adding gtest test_layernorm2d_fp32
adding gtest test_layernorm2d_fp16
adding gtest test_groupnorm_fp16
adding gtest test_groupnorm_fp32
adding gtest test_fp8
adding gtest test_elementwise_layernorm_fp16
adding gtest test_batchnorm_fwd_rank_4
adding gtest test_batchnorm_bwd_rank_4
adding gtest test_batchnorm_infer_rank_4
adding gtest test_contraction
adding gtest test_avg_pool2d_fwd
adding gtest test_avg_pool3d_fwd
adding gtest test_max_pool2d_fwd
adding gtest test_max_pool3d_fwd
adding gtest test_batched_gemm_multi_d
RPM version 4.14.2.1
-- Configuring done
-- Generating done
-- Build files have been written to: /home/tyra/rocm/composable_kernel/build
Scanning dependencies of target device_softmax_instance
[ 0%] Building CXX object library/src/tensor_operation_instance/gpu/softmax/CMakeFiles/device_softmax_instance.dir/device_softmax_f16_f16_instance_rank3_reduce1.cpp.o
In file included from /home/tyra/rocm/composable_kernel/library/src/tensor_operation_instance/gpu/softmax/device_softmax_f16_f16_instance_rank3_reduce1.cpp:9:
In file included from /home/tyra/rocm/composable_kernel/library/include/ck/library/tensor_operation_instance/gpu/softmax/device_softmax_f16_f16_instance_type.hpp:8:
In file included from /home/tyra/rocm/composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_softmax_impl.hpp:12:
In file included from /home/tyra/rocm/composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_reduce_common.hpp:9:
In file included from /home/tyra/rocm/composable_kernel/include/ck/utility/common_header.hpp:36:
/home/tyra/rocm/composable_kernel/include/ck/utility/amd_buffer_addressing.hpp:32:48: error: use of undeclared identifier 'CK_BUFFER_RESOURCE_3RD_DWORD'
wave_buffer_resource.config(Number<3>{}) = CK_BUFFER_RESOURCE_3RD_DWORD;
^
/home/tyra/rocm/composable_kernel/include/ck/utility/amd_buffer_addressing.hpp:47:48: error: use of undeclared identifier 'CK_BUFFER_RESOURCE_3RD_DWORD'
wave_buffer_resource.config(Number<3>{}) = CK_BUFFER_RESOURCE_3RD_DWORD;
^
2 errors generated when compiling for gfx1010.
make[2]: *** [library/src/tensor_operation_instance/gpu/softmax/CMakeFiles/device_softmax_instance.dir/build.make:82: library/src/tensor_operation_instance/gpu/softmax/CMakeFiles/device_softmax_instance.dir/device_softmax_f16_f16_instance_rank3_reduce1.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:14807: library/src/tensor_operation_instance/gpu/softmax/CMakeFiles/device_softmax_instance.dir/all] Error 2
make: *** [Makefile:182: all] Error 2
Any ideas for a solution?