mratsim / laser
Stars: 282
The HPC toolbox: fused matrix multiplication, convolution, data-parallel strided tensor primitives, OpenMP facilities, SIMD, JIT Assembler, CPU detection, state-of-the-art vectorized BLAS for floats and integers
Topics: deep-learning, assembler, parallel, openmp, jit, simd, matrix-multiplication, high-performance-computing, blas, convolution, tensor, compiler-optimization, gemm, runtime-cpu-detection
Updated Jan 4, 2024
Language: Nim