LargeModel

Infra

Imbue: From bare metal to a 70B model

Transformer

Training from scratch

Training framework performance

Fine-tuning on a single node

Inference explanation

Performance projection

Model reference

Tracing

Profiling

Trace analysis: https://github.com/facebookresearch/HolisticTraceAnalysis/tree/main/examples

Rewrite

Model Visualization

Training time and FLOPs estimation
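A quick back-of-the-envelope check is useful before reading the linked estimates. The sketch below uses the widely cited approximation that dense transformer training costs about 6 FLOPs per parameter per token (C ≈ 6·N·D, forward plus backward). It ignores attention FLOPs and activation recomputation. The model size, token count, GPU count and 40% model FLOPs utilization (MFU) are hypothetical example values. The 312e12 figure is the A100 datasheet peak for dense BF16.

```shell
#!/usr/bin/env bash
set -euo pipefail

# estimate_training N D GPUS PEAK MFU
#   N    model parameters
#   D    training tokens
#   GPUS number of GPUs
#   PEAK peak FLOP/s per GPU
#   MFU  assumed model FLOPs utilization (0-1), a hypothetical input
estimate_training() {
  awk -v N="$1" -v D="$2" -v G="$3" -v P="$4" -v U="$5" 'BEGIN {
    flops = 6 * N * D            # ~6 FLOPs per parameter per token (forward + backward)
    secs  = flops / (G * P * U)  # wall-clock seconds at the assumed utilization
    printf "total FLOPs: %.2e\n", flops
    printf "training time: %.1f days\n", secs / 86400
  }'
}

# Hypothetical example: 7B params, 1T tokens, 64 A100s (312e12 dense BF16 FLOP/s peak), 40% MFU
estimate_training 7e9 1e12 64 312e12 0.4
```

For these inputs the script prints 4.20e+22 total FLOPs and roughly 60.9 days of training. Real runs usually land higher once restarts, evaluation and lower-than-assumed MFU are included.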

GPU benchmarks

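```shell
# Build and run the gpu-stream memory-bandwidth benchmark (assumes CUDA is installed under /usr/local/cuda)
git clone https://github.com/te42kyfo/gpu-benches.git
cd gpu-benches/gpu-stream/
/usr/local/cuda/bin/nvcc -o stream main.cu
./stream
```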

GPU fundamentals

Compilation

Chip architecture