A minimal GPU implementation of a 2-layer feedforward neural network in CUDA. Demonstrates matrix multiplication with shared-memory tiling and compares GPU vs CPU forward-pass performance.
- Forward pass: input → hidden → output
- ReLU and Softmax activation kernels
- Tiled matrix multiplication (16×16 blocks); see the kernel sketch after this list
- GPU vs CPU timing comparison
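As a rough illustration of how these pieces usually fit together, here is a minimal sketch of a shared-memory tiled matmul kernel plus ReLU and softmax kernels, with a tiny host driver. The kernel names, signatures, and matrix sizes here are assumptions for illustration and are not taken from `src/main.cu`.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define TILE 16  // 16x16 thread blocks, matching the tiling described above

// Hypothetical tiled matmul kernel: C = A * B with A (MxK), B (KxN), C (MxN), row-major.
__global__ void matmul_tiled(const float* A, const float* B, float* C,
                             int M, int K, int N) {
    __shared__ float As[TILE][TILE];  // tile of A staged in shared memory
    __shared__ float Bs[TILE][TILE];  // tile of B staged in shared memory

    int row = blockIdx.y * TILE + threadIdx.y;  // row of C this thread computes
    int col = blockIdx.x * TILE + threadIdx.x;  // column of C this thread computes
    float acc = 0.0f;

    // Slide a 16x16 window across the K dimension.
    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        As[threadIdx.y][threadIdx.x] = (row < M && aCol < K) ? A[row * K + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < K && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();  // wait until the whole tile is loaded

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // finish reads before the next tile overwrites shared memory
    }

    if (row < M && col < N)
        C[row * N + col] = acc;
}

// Hypothetical elementwise ReLU kernel: x[i] = max(x[i], 0).
__global__ void relu(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = fmaxf(x[i], 0.0f);
}

// Hypothetical softmax for a small output vector; deliberately naive
// (a single thread does the work), which is fine for a handful of classes.
__global__ void softmax_small(float* x, int n) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        float m = x[0];
        for (int i = 1; i < n; ++i) m = fmaxf(m, x[i]);
        float s = 0.0f;
        for (int i = 0; i < n; ++i) { x[i] = expf(x[i] - m); s += x[i]; }
        for (int i = 0; i < n; ++i) x[i] /= s;
    }
}

int main() {
    const int M = 4, K = 8, N = 3;  // tiny illustrative sizes
    std::vector<float> hA(M * K, 1.0f), hB(K * N, 2.0f), hC(M * N);

    float *dA, *dB, *dC;
    cudaMalloc(&dA, M * K * sizeof(float));
    cudaMalloc(&dB, K * N * sizeof(float));
    cudaMalloc(&dC, M * N * sizeof(float));
    cudaMemcpy(dA, hA.data(), M * K * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), K * N * sizeof(float), cudaMemcpyHostToDevice);

    // One 16x16 thread block per 16x16 output tile.
    dim3 block(TILE, TILE);
    dim3 grid((N + TILE - 1) / TILE, (M + TILE - 1) / TILE);
    matmul_tiled<<<grid, block>>>(dA, dB, dC, M, K, N);
    relu<<<(M * N + 255) / 256, 256>>>(dC, M * N);

    cudaMemcpy(hC.data(), dC, M * N * sizeof(float), cudaMemcpyDeviceToHost);
    printf("C[0] after ReLU = %.1f\n", hC[0]);  // expect 16.0: sum of 8 products of 1*2

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```

Each block computes one 16×16 tile of the output; the two `__syncthreads()` calls separate the load and compute phases of every tile iteration so no thread reads a half-filled tile.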
Requirements:
- NVIDIA GPU with CUDA support
- CUDA Toolkit installed
- C++17 compiler
Build:
```bash
nvcc -arch=sm_75 -O3 src/main.cu -o build/main
```
Run:
```bash
./build/main
```
Example output:
```
GPU forward pass time: 0.31 ms
probability of class 0: 0.00
CPU forward pass time: 4.19 ms
```
Tested with CUDA 12.4 and C++17. Adjust `-arch` to match your GPU's compute capability.
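Timing numbers like these are typically produced by measuring the device path with CUDA events and the host path with `std::chrono`. The sketch below shows that pattern only; `forward_gpu` and `forward_cpu` are placeholder stubs, not this project's functions.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder stubs standing in for the real forward-pass implementations.
void forward_gpu() { /* launch the network's kernels here */ }
void forward_cpu() { /* run the reference CPU forward pass here */ }

int main() {
    // GPU path: CUDA events bracket the kernel launches.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    forward_gpu();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait for all recorded device work to finish
    float gpu_ms = 0.0f;
    cudaEventElapsedTime(&gpu_ms, start, stop);
    printf("GPU forward pass time: %.2f ms\n", gpu_ms);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);

    // CPU path: wall-clock time via std::chrono.
    auto t0 = std::chrono::high_resolution_clock::now();
    forward_cpu();
    auto t1 = std::chrono::high_resolution_clock::now();
    double cpu_ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    printf("CPU forward pass time: %.2f ms\n", cpu_ms);
    return 0;
}
```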