matmul

Here are 20 public repositories matching this topic...

eth-cscs / COSMA

Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm

linear-algebra mpi cuda scalapack matrix-multiplication gpu-acceleration rocm matmul communication-optimal pdgemm

Updated Apr 2, 2025
C++

eth-cscs / Tiled-MM

Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.

amd gpu cuda cublas nvidia matrix-multiplication rocm cublasxt matmul rocblasxt rocblas

Updated Apr 2, 2025
C++

paxbun / float-matmul

Floating-point matrix multiplication implementation (arbitrary precision)

fpga verilog floating-point matmul

Updated Aug 3, 2021
Verilog

sagi21805 / matmul-npu

Matrix multiplication on the NPU inside RK3588

opencv matrix-multiplication npu matmul rk3588 orange-pi-5 rk3588s

Updated Jun 27, 2024
C++

gha3mi / formatmul

ForMatmul - A Fortran library that overloads the matmul function to enable efficient matrix multiplication with/without coarray.

fortran coarray matmul fortran-package-manager

Updated Feb 1, 2024
Fortran

digital-nomad-cheng / matmul_cuda_kernel_tvm

Automatically generate an optimized MatMul CUDA kernel using TVM auto-scheduling.

hpc gpu cuda gemm tvm gemm-optimization matmul

Updated Feb 25, 2023
Jupyter Notebook

eduand-alvarez / CUDA_Custom_MatMul_Experiment

This project integrates a custom CUDA-based matrix multiplication kernel into a PyTorch deep learning model, leveraging GPU acceleration for matrix operations. The goal is to compare the performance of this custom kernel with PyTorch's built-in matrix multiplication and demonstrate how custom CUDA kernels can optimize compute-intensive operations.

cuda-kernels matmul

Updated Aug 26, 2024
Python

LaserBorg / circuitpython_benchmark

Raspberry Pi Pico (RP2040) and Adafruit Metro M7 (NXP IMXRT10XX) benchmark

benchmark adafruit python3 mcu circuitpython float32 matmul raspberry-pi-pico adafruit-metro-m7

Updated Jan 12, 2024
Python

alprn42 / Instruction-Counter

In this project, the instructions executed by a C program are counted using Pin and C++.

counter cpp pin registers instruction matmul instruction-counter branch-instruction resgister-counter

Updated Feb 21, 2020
C++

Awrsha / Advanced-CUDA-Programming-GPU-Architecture

This repository provides a comprehensive guide to optimizing GPU kernels for performance, with a focus on NVIDIA GPUs. It covers key tools and techniques such as CUDA, PyTorch, and Triton, aimed at improving computational efficiency for deep learning and scientific computing tasks.

multiprocessing multithreading jit triton kernels gpu-programming cuda-programming matmul torchquantum mojo-language

Updated Nov 13, 2024
Cuda

xone4 / optimized-Mat-Mul-cuda-code

A Python script that uses the CuPy library to run optimized GPU matrix multiplication. It includes a custom CUDA kernel tuned for performance and energy consumption; the kernel uses half-precision floating-point numbers (float16) for improved performance and warp utilization.

optimization cuda-kernels matmul

Updated Oct 7, 2024
Python

LRZ-BADW / OMMOP

OpenMP Matrix Multiplication Offloading Playground

gpu openmp offloading gemm matmul

Updated Dec 2, 2022
C++

Alexieviri / Parallel-Computing-on-CUDA

📰 This repository contains time measurements of various algorithms on the CPU and GPU using PyCUDA: matrix multiplication, Pi computation, and bilateral filtering.

python cuda parallel-programming picalculator matmul bilateral-filtering

Updated Jan 29, 2022
Jupyter Notebook

akifejaz / matmul-testbench

A simple script that generates 4-by-4 matrices for testing MatMul.

python testbench matmul

Updated Nov 18, 2022
Python
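
As a rough illustration of what a testbench script like the one described above might do, the sketch below generates random 4-by-4 integer matrices and computes a reference product to compare against a hardware MatMul result. It is not taken from the repository: the function names (`random_matrix`, `reference_matmul`, `make_test_vector`), the value range, and the seeding scheme are all assumptions made for this example.

```python
import random


def random_matrix(n=4, lo=-8, hi=7, rng=random):
    """Return an n-by-n matrix of random integers in [lo, hi] (range is an assumption)."""
    return [[rng.randint(lo, hi) for _ in range(n)] for _ in range(n)]


def reference_matmul(a, b):
    """Plain triple-loop product, used here as the expected ("golden") result."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][p] * b[p][j] for p in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def make_test_vector(seed):
    """Build one reproducible test case: two inputs and their expected product."""
    rng = random.Random(seed)  # a fixed seed makes the test vector repeatable
    a = random_matrix(rng=rng)
    b = random_matrix(rng=rng)
    return a, b, reference_matmul(a, b)


if __name__ == "__main__":
    a, b, expected = make_test_vector(seed=0)
    for row in expected:
        print(row)
```

Seeding each test vector keeps failures reproducible, which tends to matter when the device under test is a simulated or synthesized design.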

akifejaz / HwVerification

This repo contains the Python scripts for testing all MatMul modules.

testing hardware matmul

Updated Apr 28, 2023
Python

martins0n / matmul

Benchmarks of matrix-matrix multiplication implementations

matrix-multiplication blas gemm matmul

Updated Dec 2, 2021
Rust

jhson989 / matmul_cublas

cuBLAS GEMM Example for FP32 MatMul

cuda cublas matmul

Updated Mar 29, 2022
Cuda

DelSquared / Rust-Basic-Matrix-Multiplication

Rust Basic Matrix Multiplication

rust algebra matrix multiplication linear linalg matmul

Updated Mar 30, 2019
Rust

jhson989 / SYCL-heterogeneous

CPU, GPU, and FPGA matrix multiplication examples via SYCL

cpu fpga gpu sycl matmul

Updated Feb 23, 2022
C++

WilliamSpanfelner / day-76-computation_with_numpy

Check out the power of NumPy

numpy random matrices arrays ndarray vectors tensors flip imshow scalars matmul linspace arange

Updated Jan 21, 2023
Jupyter Notebook
