Questions about __ARM_ARCH #16

Open
chauncyyoung opened this issue Jun 15, 2020 · 7 comments

Comments

@chauncyyoung

I have two questions:

First, when I use extended_sgemm, execution takes the __ARM_ARCH branch by default, but I cannot find where __ARM_ARCH is defined. Could you help me with this?

Second, I tried to use jit_avx512_common_gemm_f32, but the build failed because of undefined references in libmkldnn. Do I need to adjust other parameters to run it?

  • OS version: aarch64 GNU/Linux
  • Compiler version gcc (Ubuntu/Linaro 5.4.0-6kord1~16.04.12) 5.4.0 20160609
  • MKLROOT value (echo MKLROOT=$MKLROOT)

#ifdef __ARM_ARCH
// return ref_gemm(transa, transb, M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
//else // #ifdef __ARM_ARCH
if (mayiuse(avx512_mic)) {
    printf("enter 1\n");
    return jit_avx512_common_gemm_f32(transa, transb,
            M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
} else if (mayiuse(avx)) {
    printf("enter 2\n");
    float *dummy_ao = NULL;
    float *dummy_bo = NULL;

    return gemm_driver(transa, transb, bias ? "C" : NULL, M, N, K, alpha,
            A, lda, dummy_ao, B, ldb, dummy_bo, beta, C, ldc, bias,
            force_jit_nocopy_gemm);
} else {
    printf("enter 3\n");
    return ref_gemm<float>(transa, transb,
            M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
}

#endif // #ifdef __ARM_ARCH

@Takumi-Honda
Collaborator

Hi chauncyyoung

Regarding the first question:
__ARM_ARCH is a macro that indicates the target Arm architecture. It is predefined by the compiler for Arm targets rather than defined in the source code.
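
As a rough illustration (not part of dnnl_aarch64), the sketch below reports whether the compiler predefines __ARM_ARCH for the current target. The function name arm_arch_description is hypothetical. With GCC or Clang, the full set of predefined macros can also be listed by preprocessing an empty input with the -dM -E options.

```cpp
#include <string>

// Hypothetical helper: describes whether the compiler predefined __ARM_ARCH
// for the target this translation unit is being built for.
std::string arm_arch_description() {
#ifdef __ARM_ARCH
    // On Arm targets the macro expands to the architecture version number.
    return "__ARM_ARCH is predefined: " + std::to_string(__ARM_ARCH);
#else
    // On other targets (for example x86-64) the macro does not exist.
    return "__ARM_ARCH is not defined";
#endif
}
```

Printing the result of arm_arch_description() on an aarch64 build should show the first message, and on an x86 build the second, without any definition appearing in the project's own sources.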

Regarding the second question:
Are you seeing a build error? The error does not occur in our environment.
Did you modify any source code other than the above?

@chauncyyoung
Author

Thank you for your reply. I just tried to skip the ref_gemm case by commenting it out with "//", as follows:

#ifdef __ARM_ARCH
// return ref_gemm(transa, transb, M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
//#else // #ifdef __ARM_ARCH
if (mayiuse(avx512_mic)) {
    return jit_avx512_common_gemm_f32(transa, transb,
            M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
} else if (mayiuse(avx)) {
    float *dummy_ao = NULL;
    float *dummy_bo = NULL;

    return gemm_driver(transa, transb, bias ? "C" : NULL, M, N, K, alpha,
            A, lda, dummy_ao, B, ldb, dummy_bo, beta, C, ldc, bias,
            force_jit_nocopy_gemm);
} else {
    return ref_gemm<float>(transa, transb,
            M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
}

#endif // #ifdef __ARM_ARCH

The build then failed with the undefined-reference errors. I did not modify any other source code, so I don't know whether I need to change the CMake files.
I also noticed that this path uses
kernel_table[isTransA][isTransB][hasBias][beta_idx(beta)] = new xbyak_gemm(isTransA, isTransB, beta, hasBias);
in ./jit_avx512_common_gemm_f32.cpp, and xbyak_gemm appears to use AVX-512 instructions such as 'vgatherqps' (I'm not sure, because I'm a newcomer). If so, would xbyak translate them into Arm assembly?
Thank you again for your help! :)

@kawakami-k
Collaborator

Hi chauncyyoung-san

Thank you for trying dnnl_aarch64.

I tried your procedure.

  • cloned dnnl_aarch64
  • modified gemm.cpp (#ifdef __ARM_ARCH -> #ifdef __ARM_ARCH_)
  • and finally ran cmake and make
#ifndef __ARM_ARCH_
    if (mayiuse(avx512_mic)) {
        return jit_avx512_common_gemm_f32(transa, transb,
                M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
    } else if (mayiuse(avx)) {
        float *dummy_ao = NULL;
        float *dummy_bo = NULL;

        return gemm_driver(transa, transb, bias ? "C" : NULL, M, N, K, alpha,
                A, lda, dummy_ao, B, ldb, dummy_bo, beta, C, ldc, bias,
                force_jit_nocopy_gemm);
    } else
#endif // __ARM_ARCH
    {
        return ref_gemm<float>(transa, transb,
                M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
    }

All binaries build successfully in my environment,
but ./test_gemm_f32 crashes with a segmentation fault on the 14th test pattern (TestGEMM/13).
I'll try to fix the bug.

[==========] Running 21 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 21 tests from TestGEMM_fp32/gemm_test
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/0
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/0 (0 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/1
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/1 (0 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/2
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/2 (0 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/3
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/3 (0 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/4
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/4 (3296 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/5
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/5 (9 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/6
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/6 (11 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/7
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/7 (12 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/8
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/8 (22 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/9
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/9 (9 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/10
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/10 (8 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/11
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/11 (6 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/12
[       OK ] TestGEMM_fp32/gemm_test.TestGEMM/12 (1 ms)
[ RUN      ] TestGEMM_fp32/gemm_test.TestGEMM/13
zsh: segmentation fault (core dumped)  ./test_gemm_f32

@kawakami-k
Collaborator

Currently, dnnl_aarch64 is assumed to run on a CPU that implements the Armv8-A instruction set with SVE. If you don't have such an environment, you can use QEMU to emulate the Armv8-A+SVE instructions.
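
To check whether a given machine meets that assumption, one option on Linux is to read the hardware capability bits that the kernel exposes to the process. The sketch below is not taken from dnnl_aarch64; the names kHwcapSveBit, hwcap_has_sve and cpu_has_sve are hypothetical, and the bit position (bit 22) is assumed to match HWCAP_SVE in the arm64 Linux kernel headers.

```cpp
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>  // getauxval, AT_HWCAP
#endif

// Assumption: mirrors HWCAP_SVE from the arm64 Linux kernel headers.
constexpr unsigned long kHwcapSveBit = 1UL << 22;

// Pure helper: tests the SVE bit in an AT_HWCAP value.
bool hwcap_has_sve(unsigned long hwcap) {
    return (hwcap & kHwcapSveBit) != 0;
}

// Returns true only on aarch64 Linux when the kernel reports SVE support.
// On any other platform this sketch conservatively returns false.
bool cpu_has_sve() {
#if defined(__linux__) && defined(__aarch64__)
    return hwcap_has_sve(getauxval(AT_HWCAP));
#else
    return false;
#endif
}
```

If cpu_has_sve() returns false on the target machine, SVE instructions executed by the library would be expected to fault there, which is the situation where an emulator such as QEMU becomes necessary.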

@chauncyyoung
Author

Thank you for your reply. I think the result may depend on the version of dnnl_aarch64: I used the release_base_0.19 branch. I'll try the latest version with QEMU later.

@chauncyyoung
Author

I'm also not sure about xbyak: does it translate x86 assembly to AArch64 assembly?

I have another question about the following code in gemm.cpp:

#ifdef __ARM_ARCH
    return ref_gemm(
            transa, transb, M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
#else // #ifdef __ARM_ARCH
if (mayiuse(avx512_mic)) {
    return jit_avx512_common_gemm_f32(transa, transb,
            M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
} else if (mayiuse(avx)) {
    float *dummy_ao = NULL;
    float *dummy_bo = NULL;

    return gemm_driver(transa, transb, bias ? "C" : NULL, M, N, K, alpha,
            A, lda, dummy_ao, B, ldb, dummy_bo, beta, C, ldc, bias,
            force_jit_nocopy_gemm);
} else {
    return ref_gemm<float>(transa, transb,
            M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
}

#endif // #ifdef __ARM_ARCH
}

Since ref_gemm comes before jit_avx512_common_gemm_f32, does ref_gemm have a higher priority? In other words, does ref_gemm perform better than jit_avx512_common_gemm_f32?

@kawakami-k
Collaborator

chauncyyoung-san

"release_base_0.19" does not generate any JIT-ed code except in jit_uni_reorder.cpp, so ref_gemm is always used on AArch64.
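
The order of the branches in the quoted gemm.cpp is therefore not a performance ranking. The #ifdef is resolved at compile time, so an Arm build contains only the ref_gemm call, while a non-Arm build chooses among the kernels at run time. A minimal sketch of that structure follows. It uses a hypothetical function, select_gemm, that returns the name of the kernel that would be chosen, and its two boolean parameters stand in for the mayiuse(avx512_mic) and mayiuse(avx) checks.

```cpp
#include <string>

// Hypothetical stand-in for the dispatch in gemm.cpp: returns the name of
// the kernel that would be selected instead of calling it.
std::string select_gemm(bool has_avx512, bool has_avx) {
#ifdef __ARM_ARCH
    // Arm build: the x86 branches below are never compiled at all.
    (void)has_avx512;
    (void)has_avx;
    return "ref_gemm";
#else
    // Non-Arm build: the fastest available kernel is picked at run time,
    // with ref_gemm as the portable fallback.
    if (has_avx512) return "jit_avx512_common_gemm_f32";
    if (has_avx) return "gemm_driver";
    return "ref_gemm";
#endif
}
```

On an Arm build, select_gemm returns "ref_gemm" regardless of its arguments, which mirrors why the reference implementation is always taken there.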

Please use "release_base_0.21" to try various JIT-ed code on AArch64. This version generates some JIT-ed code directly with Xbyak_aarch64; these kernels are implemented in src/cpu/jit_sve_*.cpp. It also generates some JIT-ed code indirectly with Xbyak_translator_aarch64, which translates x86 JIT-ed instructions to AArch64 instructions one by one.

If you want to try JIT-ed gemm, replace

#ifndef __ARM_ARCH

at
https://github.com/fujitsu/dnnl_aarch64/blob/release_base_0.21/src/cpu/gemm/gemm.cpp#L123
with

#ifdef __ARM_ARCH

JIT-ed gemm currently has some bugs in "release_base_0.21", so it is disabled by default.
