Replies: 3 comments
-
Oh this is 100% my expectation. The C code we have is just a clean algorithmic reference. The major thing missing is taking advantage of vector instructions (AVX, NEON, etc., depending on the platform). That's a deep rabbit hole you can go down. You'd probably get really far just by doing it for the matmul forward/backward.
-
First bash at matmul in #411 brings it down to just under 4x slower.
-
Just a note proving that you can't really benchmark CPU performance on your local machine...
Not running under X shaves off another 100ms, by the looks of it.
-
Looks like there is still A LOT on the table with regards to CPU performance.
I think I'm starting to understand that PyTorch seems to have been written on and for CPU, at least initially... both its startup and its training are still much faster than train_gpt2.c on CPU (and its CPU startup is MUCH faster than its GPU startup). Some anecdotal benchmarks from posts and PRs:
Looks like the compiler optimizations do a bit more for AMD... or PyTorch is optimized for Intel 😅
CPU improvements go in dev/cpu