
Curious about the torch.compile baseline in benchmark_litgpt #123

Open
mitkotak opened this issue Apr 3, 2024 · 3 comments
Labels
question Further information is requested

Comments


mitkotak commented Apr 3, 2024

I was curious why the reduce-overhead option was not used as a baseline, and whether there is a comparison somewhere that uses reduce-overhead?

Thanks!

cc @apaz-cli

@mitkotak mitkotak added enhancement New feature or request help wanted Extra attention is needed labels Apr 3, 2024

carmocca commented Apr 3, 2024

It should be because we are also not enabling CUDAGraphs with Thunder ("reduce-overhead" simply enables CUDAGraphs), so this way it is more of an apples-to-apples comparison. Maybe @parthmannan can correct me here.

Either way, from my personal experience running this, CUDAGraphs won't help during training (especially if your model is large enough) because you are not overhead-bound at all.
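
For reference, a minimal sketch of how the two modes differ when calling `torch.compile` (the toy model and shapes here are made up for illustration and are not the actual benchmark_litgpt setup):

```python
import torch
import torch.nn as nn

# Toy stand-in model, not the LitGPT model used by the benchmark.
model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).cuda()

# Baseline used in the benchmark: default mode, no CUDA graphs.
compiled_default = torch.compile(model)

# "reduce-overhead" is the default Inductor compilation plus CUDA graphs,
# which mostly pays off when per-step CPU overhead dominates (small models / batches).
compiled_reduce_overhead = torch.compile(model, mode="reduce-overhead")

x = torch.randn(8, 1024, device="cuda")
out = compiled_default(x)
out = compiled_reduce_overhead(x)
```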

@carmocca carmocca added question Further information is requested and removed enhancement New feature or request help wanted Extra attention is needed labels Apr 3, 2024

mitkotak commented Apr 3, 2024

Thanks for the reply! How about max-autotune-no-cudagraphs?


carmocca commented Apr 3, 2024

Sure, it could be used. Anecdotally, I found it to provide tiny speedups (if any) for a whole lot of compilation time, but it's a valid suggestion.
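
If anyone wants to try it locally, the mode is passed the same way; a rough sketch with a toy model (not the benchmark script itself), and expect the compile time to be noticeably longer:

```python
import torch
import torch.nn as nn

model = nn.Linear(4096, 4096).cuda()

# Autotunes kernel choices at compile time but keeps CUDA graphs disabled,
# so it stays comparable to the non-CUDAGraphs baselines above.
compiled = torch.compile(model, mode="max-autotune-no-cudagraphs")
out = compiled(torch.randn(8, 4096, device="cuda"))
```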
