Feedforward neural network on the GPU using CUDA, demonstrating high-performance matrix operations and GPU acceleration.

JayZenith/CUDA_NN_Inference_Engine


CUDA Neural Network Inference Engine

A minimal GPU implementation of a 2-layer feedforward neural network in CUDA. It demonstrates matrix multiplication with shared-memory tiling and compares GPU and CPU forward-pass performance.

Features

  • Forward pass: input → hidden → output
  • ReLU and Softmax activation kernels
  • Tiled matrix multiplication (16×16 blocks)
  • GPU vs CPU timing comparison
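The tiled multiplication listed above can be sketched as a standard shared-memory CUDA kernel. This is an illustrative sketch of the technique, not the repo's exact kernel; names and the row-major layout are assumptions:

```cuda
#define TILE 16

// Tiled matrix multiply: C = A (M×K) * B (K×N), all row-major.
// Each 16×16 thread block stages one tile of A and one tile of B in
// shared memory, so each global element is loaded once per tile pass
// instead of once per output element.
__global__ void matmul_tiled(const float* A, const float* B, float* C,
                             int M, int K, int N) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        // Zero-pad out-of-range elements so partial edge tiles stay correct.
        As[threadIdx.y][threadIdx.x] = (row < M && aCol < K) ? A[row * K + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < K && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // keep the tile resident until all threads are done with it
    }
    if (row < M && col < N)
        C[row * N + col] = acc;
}
```

A launch would use `dim3 block(TILE, TILE)` and a grid of `ceil(N/TILE) × ceil(M/TILE)` blocks.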

Requirements

  • NVIDIA GPU with CUDA support
  • CUDA Toolkit installed
  • C++17 compiler

Build & Run

```sh
mkdir -p build
nvcc -arch=sm_75 -O3 src/main.cu -o build/main
./build/main
```

Sample Output

```
GPU forward pass time: 0.31 ms
probability of class 0: 0.00
CPU forward pass time: 4.19 ms
```

Tested with CUDA 12.4 and C++17. Adjust -arch to match your GPU’s compute capability.
