
ChipKerchner
Contributor

Add vectorized packing for FP16 and BF16 (up to a 3x improvement).
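
For readers new to the term: in a GEMM library, "packing" usually means copying a block of an input matrix into a contiguous buffer, in the order the compute kernel will read it. The following is a minimal scalar sketch of that idea. It is not the RISC-V Vector code from this PR. It assumes a column-major source with leading dimension `lda`, a hypothetical unroll width of 4, and a hypothetical function name `pack_panel_u16`. FP16 and BF16 are both 16-bit formats, so a packing routine can move them as raw 16-bit patterns without any conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical unroll width: number of columns interleaved per group. */
#define UNROLL 4

/* Hypothetical scalar reference for panel packing.
 * Source a is column-major (element (i, j) lives at a[i + j * lda]).
 * FP16/BF16 values are copied as raw uint16_t bit patterns.
 * Columns are taken in groups of UNROLL; within a group, the UNROLL
 * elements of each row are written next to each other in dst. */
static void pack_panel_u16(size_t m, size_t n, const uint16_t *a,
                           size_t lda, uint16_t *dst)
{
    assert(lda >= m); /* column-major layout requires lda >= rows */
    size_t j = 0;
    for (; j + UNROLL <= n; j += UNROLL) {
        for (size_t i = 0; i < m; i++) {
            for (size_t u = 0; u < UNROLL; u++) {
                *dst++ = a[i + (j + u) * lda];
            }
        }
    }
    /* Leftover columns (fewer than UNROLL): copy one column at a time. */
    for (; j < n; j++) {
        for (size_t i = 0; i < m; i++) {
            *dst++ = a[i + j * lda];
        }
    }
}
```

A vectorized version replaces the inner copies with vector loads and stores; the resulting buffer layout is what has to stay identical.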

Reactivate vector packing for transposed FP64. The slowdown in the previous MR turned out to come from the use of vector segment loads/stores, which are slow for FP64 on some platforms.
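
Background on the segment accesses mentioned above: an RVV segment load such as `vlseg4e64.v` reads records of NF consecutive elements and deinterleaves field f of every record into its own register. This makes it one natural way to do the interleaving that transposed packing needs. The same data movement can also be expressed as NF separate strided loads (`vlse64.v`, starting at field f with a stride of NF elements). The scalar models below are a sketch with hypothetical names (`segment_load_model`, `strided_loads_model`) and show that the two orders produce identical results. This description does not say which replacement the reactivated FP64 path uses.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of an NF-field segment load: element k of field f comes
 * from src[k * nf + f]. Output layout: fields[f * vl + k]. */
static void segment_load_model(const double *src, size_t vl, size_t nf,
                               double *fields)
{
    assert(nf >= 2 && nf <= 8); /* RVV segment accesses use 2 to 8 fields */
    for (size_t k = 0; k < vl; k++)
        for (size_t f = 0; f < nf; f++)
            fields[f * vl + k] = src[k * nf + f];
}

/* Scalar model of the same movement as nf strided loads: field f starts
 * at src + f and advances nf elements per step. */
static void strided_loads_model(const double *src, size_t vl, size_t nf,
                                double *fields)
{
    for (size_t f = 0; f < nf; f++)
        for (size_t k = 0; k < vl; k++)
            fields[f * vl + k] = src[f + k * nf];
}
```

Both functions compute the same mapping. On hardware, the difference lies in how each instruction form is implemented, which is the platform-dependent cost the PR describes.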

@ChipKerchner
Contributor Author

#5457

@ChipKerchner changed the title from "Add vectorized packing for FP16 and BF16. Reactivate vector packing for FP64 transposed" to "Add vectorized packing for FP16 and BF16 for RISC-V. Reactivate vector packing for FP64 transposed" on Sep 26, 2025.