This is my final project for the GPU course MICS600J. The main content is my attempt to implement the attention mechanism efficiently on the GPU.
To simplify the problem, I replace the original tensor shape [batch_size, nheads, seq_len, headdim] with [seq_len, headdim], writing N for seq_len and d for headdim.
The attention mechanism itself is well known, so I only list the shapes involved:
- In = N * d
- WQ, WK, WV = d * d
- Q, K, V = N * d
- P = Q * K^T = N * N
- S = SoftMax(P) = N * N
- O = S * V = N * d
- Out = O * WO = N * d (output projection, WO = d * d)
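As an illustration only (not the optimized kernel in this repo), the shapes above map onto a naive reference sketch: fp32, row-major, one thread per query row, with Q/K/V already projected, the output projection and the 1/sqrt(d) scale omitted, and an N * N scratch buffer P supplied by the caller.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Naive attention reference: one thread handles one query row i.
// Q, K, V: N x d (row-major, device pointers); P: N x N scratch; O: N x d output.
__global__ void attention_naive(const float* Q, const float* K, const float* V,
                                float* P, float* O, int N, int d) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= N) return;

    // P[i, :] = Q[i, :] * K^T, tracking the row max for a numerically stable softmax
    float row_max = -INFINITY;
    for (int j = 0; j < N; ++j) {
        float s = 0.0f;
        for (int k = 0; k < d; ++k) s += Q[i * d + k] * K[j * d + k];
        P[i * N + j] = s;
        row_max = fmaxf(row_max, s);
    }

    // S[i, :] = SoftMax(P[i, :]); exponentials stay in P, the row sum is applied below
    float row_sum = 0.0f;
    for (int j = 0; j < N; ++j) {
        float e = expf(P[i * N + j] - row_max);
        P[i * N + j] = e;
        row_sum += e;
    }

    // O[i, :] = S[i, :] * V
    for (int k = 0; k < d; ++k) {
        float acc = 0.0f;
        for (int j = 0; j < N; ++j) acc += P[i * N + j] * V[j * d + k];
        O[i * d + k] = acc / row_sum;
    }
}

// Example launch: attention_naive<<<(N + 127) / 128, 128>>>(Q, K, V, P, O, N, d);
```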
- Use Tensor Cores to compute the GEMMs.
- Use asynchronous transfers (global memory to shared memory) to overlap computation with data movement.
- Bank-conflict-free shared-memory accesses (a combined sketch of these three points follows below).
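The sketch below shows one way these three points can fit together for the P = Q * K^T GEMM. It is an illustration under assumed choices (half-precision Q and K, N and d multiples of 16, one warp per 16x16 tile of P, the WMMA API for the Tensor Cores, cooperative_groups::memcpy_async for the asynchronous copies, skew padding of the shared tiles), not the actual kernel in this repo; a real kernel would also double-buffer the tiles so the copies overlap the MMAs.

```cuda
#include <mma.h>
#include <cuda_fp16.h>
#include <cooperative_groups.h>
#include <cooperative_groups/memcpy_async.h>

using namespace nvcuda;
namespace cg = cooperative_groups;

// Skew padding (in half elements) so consecutive tile rows start in different
// shared-memory banks, reducing bank conflicts on the WMMA loads.
constexpr int SKEW = 8;

// Each block holds one warp and computes one 16x16 tile of P = Q * K^T.
// Q, K: N x d row-major (half); P: N x N row-major (float); N, d multiples of 16.
// True cp.async copies require sm_80+ (e.g. A100); older GPUs fall back to
// synchronous copies through the same API.
__global__ void qk_wmma_async(const half* Q, const half* K, float* P, int N, int d) {
    __shared__ half q_s[16][16 + SKEW];
    __shared__ half k_s[16][16 + SKEW];

    cg::thread_block block = cg::this_thread_block();
    int tile_m = blockIdx.x;  // which 16-row tile of Q
    int tile_n = blockIdx.y;  // which 16-row tile of K

    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> q_frag;
    // Loading the row-major K tile as a col_major matrix_b fragment yields K^T
    // without an explicit transpose.
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> k_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> p_frag;
    wmma::fill_fragment(p_frag, 0.0f);

    for (int k = 0; k < d; k += 16) {
        // Stage the next 16x16 tiles into shared memory with asynchronous copies
        // (row by row because of the skew padding).
        for (int r = 0; r < 16; ++r) {
            cg::memcpy_async(block, &q_s[r][0], Q + (tile_m * 16 + r) * d + k, sizeof(half) * 16);
            cg::memcpy_async(block, &k_s[r][0], K + (tile_n * 16 + r) * d + k, sizeof(half) * 16);
        }
        cg::wait(block);  // make sure both tiles have arrived before the MMA

        wmma::load_matrix_sync(q_frag, &q_s[0][0], 16 + SKEW);
        wmma::load_matrix_sync(k_frag, &k_s[0][0], 16 + SKEW);
        wmma::mma_sync(p_frag, q_frag, k_frag, p_frag);  // Tensor Core GEMM on the tiles
        block.sync();  // the tiles are overwritten in the next iteration
    }
    wmma::store_matrix_sync(P + tile_m * 16 * N + tile_n * 16, p_frag, N, wmma::mem_row_major);
}

// Example launch: qk_wmma_async<<<dim3(N / 16, N / 16), 32>>>(Q, K, P, N, d);
```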
Use the `make` command to build the program.
- Range for N and d: N ~ (32, 1024), d ~ (32, 2048); see the script for more detail.
- Tested on an NVIDIA A100 on the HKUST(GZ)-HPC server.
When fine-tuning Llama-2-7B with a sparse attention mechanism, we found that accuracy can be restored, and even improved, with little overhead.
- Kernel fusion, in the style of FlashAttention (see the online-softmax recurrence after this list).
- A sparse attention mechanism, in the style of DFSS, to make full use of the sparse Tensor Cores.
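For reference, FlashAttention-style kernel fusion relies on the online-softmax recurrence: K and V are processed in column blocks while each query row keeps a running maximum m and running sum l, so P and S never need to be materialized as full N * N matrices. With P^(j) = Q * K_j^T denoting the scores against the j-th block of keys:

```math
\begin{aligned}
m^{(j)} &= \max\!\left(m^{(j-1)},\ \mathrm{rowmax}\!\left(P^{(j)}\right)\right),\\
\tilde{P}^{(j)} &= \exp\!\left(P^{(j)} - m^{(j)}\right),\\
\ell^{(j)} &= e^{\,m^{(j-1)} - m^{(j)}}\,\ell^{(j-1)} + \mathrm{rowsum}\!\left(\tilde{P}^{(j)}\right),\\
O^{(j)} &= \mathrm{diag}\!\left(e^{\,m^{(j-1)} - m^{(j)}}\right) O^{(j-1)} + \tilde{P}^{(j)} V_j,
\end{aligned}
```

with m^(0) = -inf, l^(0) = 0, O^(0) = 0, and the final O = diag(l)^{-1} O^(last) after the last block, before the output projection.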