tensorcore

Here are 12 public repositories matching this topic...

YukeWang96 / APNN-TC_SC21

Artifact for SC21: APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Cores.

gpu dnn tensorcore quantized-neural-networks

Updated Aug 26, 2021
Cuda

ShaoKAi100812 / CudaCore_TensorCore_Acceleration

Compares the runtime of CNN computation on CPU and GPU

gpu cuda tensorcore

Updated May 1, 2022
C++

hinofafa / torch_accelerator

Experiments to accelerate GPU device for PyTorch training

pytorch gpu-acceleration mixed-precision tensorcore gpu-profiler

Updated Dec 15, 2021
Jupyter Notebook

enp1s0 / cuMpSGEMM

Fast SGEMM emulation on Tensor Cores

gpu cuda gemm half-precision mixed-precision tensorcore tensorcores fp32

Updated Aug 19, 2024
Cuda

wmmae / mma.simt

A software TensorCore using warp shuffle

cuda tensorcore wmma-api

Updated Jul 22, 2021
C++

eshibusawa / Simple-Examples

Simple examples of tools and libraries

python cuda pybind11 cupy cub pytorch-extension tensorcore

Updated May 24, 2024
Python

wmmae / hmma.f32.f32

An extension library of the WMMA API for single-precision matrix operations using Tensor Cores and an error-correction technique

gpu cuda tensorcore tensorcores wmma-api

Updated Jul 22, 2021
C++

robbwu / tensorsvm

Fast Kernel SVM on TensorCore enabled GPU

machine-learning gpu svm tensorcore

Updated Aug 15, 2022
Cuda

enp1s0 / ozIMMU

FP64-equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme

cuda gemm mixed-precision tensorcore tensorcores

Updated Sep 7, 2024
Cuda

YukeWang96 / QGTC_PPoPP22

Artifact for PPoPP22 QGTC: Accelerating Quantized GNN via GPU Tensor Core.

cuda pytorch tensorcore

Updated Feb 12, 2022
Python

wmmae / wmma_extension

An extension library of WMMA API (Tensor Core API)

gpu matrix gpu-computing gpu-programming tensorcore tensorcores wmma-api

Updated Jul 12, 2024
Cuda

Zhen-Dong / HAWQ

Quantization library for PyTorch. Supports low-precision and mixed-precision quantization, with hardware implementation through TVM.

pytorch quantization hessian 8-bit model-compression distillation tvm 4-bit mixed-precision tensorcore quantized-neural-networks hardware-aware efficient-neural-networks

Updated May 15, 2023
Python
