matcraje
Contributor

This PR introduces a specialized kernel that utilizes ARM SME1 (Scalable Matrix Extension) capabilities to optimize the cblas_ssymm function.
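For readers unfamiliar with the routine, here is a minimal reference sketch of what an ssymm call computes in one configuration (row-major, symmetric A on the left, lower triangle stored): C = alpha*A*B + beta*C. The function name `ref_ssymm_left_lower` is hypothetical and is not part of OpenBLAS or this PR; it is a naive triple loop meant to show the semantics, not the SME kernel.

```c
#include <stddef.h>

/* Reference C = alpha*A*B + beta*C for row-major storage, where A is an
   m x m symmetric matrix applied from the left and only its lower triangle
   is read. ref_ssymm_left_lower is a hypothetical name used for illustration;
   it is not an OpenBLAS symbol. */
void ref_ssymm_left_lower(int m, int n, float alpha,
                          const float *a, int lda,
                          const float *b, int ldb,
                          float beta, float *c, int ldc)
{
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            float acc = 0.0f;
            for (int k = 0; k < m; k++) {
                /* Accesses above the diagonal are mirrored into the
                   stored lower triangle: A[i][k] == A[k][i]. */
                float aik = (k <= i) ? a[(size_t)i * lda + k]
                                     : a[(size_t)k * lda + i];
                acc += aik * b[(size_t)k * ldb + j];
            }
            c[(size_t)i * ldc + j] =
                alpha * acc + beta * c[(size_t)i * ldc + j];
        }
    }
}
```

Unlike a production BLAS, this sketch always reads C, even when beta is zero. It is only intended as a baseline to compare an optimized kernel's results against.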

@matcraje
Contributor Author

It looks like the cmake builds are failing. I will investigate and update the changes.

@matcraje force-pushed the topic/ssymm_direct_sme1 branch from 4d628d7 to 6dda7cf (September 29, 2025 08:52)
@matcraje
Contributor Author

The checks using 'make' are successful.
The checks using 'cmake' fail due to undefined symbols (e.g. undefined symbol '_ssymm_direct_alpha_betaLL_A64FX').

@martin-frbg Do you know if I missed something?

@martin-frbg
Collaborator

kernel/CMakeLists.txt has a second block of definitions, starting around line 1100, that specifically handles DYNAMIC_ARCH builds. You need to add the equivalent lines there for your ssymm_direct_alpha_betaLL etc., with an added ${TSUFFIX}.
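To illustrate why a missing suffix produces an undefined symbol, here is a simplified sketch of per-target symbol naming. It assumes the target suffix is appended to the kernel's name at compile time. The macros `PASTE`, `STR`, and `KERNEL_NAME` and the helper `kernel_symbol_name` are hypothetical stand-ins, not OpenBLAS's actual build machinery; only `ssymm_direct_alpha_betaLL`, `TSUFFIX`, and the `_A64FX` suffix come from this thread.

```c
/* Hypothetical stand-ins: in a DYNAMIC_ARCH build each kernel is compiled
   once per target, and the target suffix becomes part of the symbol name.
   The default below is only for this demo. */
#ifndef TSUFFIX
#define TSUFFIX _A64FX
#endif

/* Two-level macros so TSUFFIX is expanded before pasting/stringifying. */
#define PASTE2(a, b) a##b
#define PASTE(a, b) PASTE2(a, b)
#define STR2(x) #x
#define STR(x) STR2(x)

/* Expands to ssymm_direct_alpha_betaLL_A64FX when TSUFFIX is _A64FX. */
#define KERNEL_NAME PASTE(ssymm_direct_alpha_betaLL, TSUFFIX)

/* Placeholder body; the real kernel performs the SME computation. */
int KERNEL_NAME(void) { return 0; }

/* Returns the suffixed name that per-target code would reference. */
const char *kernel_symbol_name(void) { return STR(KERNEL_NAME); }
```

If the kernel were built without the suffix, the object file would define only `ssymm_direct_alpha_betaLL`, while per-target code would still reference the suffixed name. The linker would then report the suffixed symbol as undefined.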

@matcraje force-pushed the topic/ssymm_direct_sme1 branch from 6dda7cf to 5c49707 (September 30, 2025 05:44)
@martin-frbg
Collaborator

The WoA (Windows on Arm) build still fails with:

2025-09-30T05:58:18.3418812Z lld-link: error: duplicate symbol: __arm_tpidr2_save
2025-09-30T05:58:18.3419188Z >>> defined at kernel\CMakeFiles\kernel.dir\CMakeFiles\ssymm_direct_alpha_betaLU.c.obj

We've come to understand that this happens when arm_sme.h is included unconditionally. Please fix.
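A minimal sketch of a conditional include, assuming the build system defines a macro such as `HAVE_SME` for SME-capable targets (an assumption here, not confirmed by this thread). `__ARM_FEATURE_SME` is the ACLE feature macro set by the compiler; `SSYMM_DIRECT_USE_SME` and `ssymm_direct_sme_compiled_in` are hypothetical names used only for illustration.

```c
/* Sketch: pull in arm_sme.h only when the compiler is targeting SME.
   HAVE_SME is assumed to be a build-system define (an assumption);
   __ARM_FEATURE_SME is the ACLE feature-test macro. */
#if defined(HAVE_SME) && defined(__ARM_FEATURE_SME)
#include <arm_sme.h>
#define SSYMM_DIRECT_USE_SME 1
#else
#define SSYMM_DIRECT_USE_SME 0
#endif

/* Reports whether the SME path was compiled into this object file.
   Hypothetical helper, not part of OpenBLAS. */
int ssymm_direct_sme_compiled_in(void)
{
    return SSYMM_DIRECT_USE_SME;
}
```

With a guard like this, object files built for targets without SME never see the header, so they would not each carry their own copy of the SME support routines.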

Also, please move your additions in interface/symm.c to after the error handling (the line where xerbla is called if "info" is not zero). This will probably require another #if defined CBLAS, but it protects your kernel from being called with invalid arguments.
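The ordering requested above can be sketched as follows: validate the arguments, report and return on error, and only then dispatch to the fast path. This is a simplified illustration, not the code in interface/symm.c. All function names are hypothetical stubs, and the argument numbering follows this sketch's own parameter list rather than the real SSYMM signature.

```c
#include <stdio.h>

static int direct_kernel_calls = 0; /* counts fast-path invocations */

/* Stand-in for the BLAS error handler (the real interface calls xerbla). */
static void report_error_stub(const char *routine, int info)
{
    fprintf(stderr, "%s: parameter %d had an illegal value\n", routine, info);
}

/* Stand-in for the SME direct kernel. */
static void direct_kernel_stub(int m, int n)
{
    (void)m;
    (void)n;
    direct_kernel_calls++;
}

/* Returns 0 on success, or the 1-based index of the first bad argument. */
int symm_interface_sketch(int m, int n, int lda, int ldb, int ldc)
{
    int info = 0;

    /* Checks run from last argument to first, so the lowest-numbered
       bad argument is the one that ends up reported. */
    if (ldc < (n > 1 ? n : 1)) info = 5;
    if (ldb < (n > 1 ? n : 1)) info = 4;
    if (lda < (m > 1 ? m : 1)) info = 3;
    if (n < 0) info = 2;
    if (m < 0) info = 1;

    if (info != 0) {
        report_error_stub("SSYMM ", info);
        return info;
    }

    /* Only reached with validated arguments. */
    direct_kernel_stub(m, n);
    return 0;
}

/* Accessor so callers can observe how often the fast path ran. */
int direct_kernel_call_count(void) { return direct_kernel_calls; }
```

Calling `symm_interface_sketch(-1, 2, 1, 2, 2)` returns 1 without touching the kernel stub, while a valid call such as `symm_interface_sketch(2, 2, 2, 2, 2)` returns 0 and increments the call count.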

@matcraje force-pushed the topic/ssymm_direct_sme1 branch from 5c49707 to 1926847 (September 30, 2025 09:35)
@matcraje
Contributor Author

Thanks @martin-frbg for the comments. The earlier errors are now resolved.
