ThunderFX is slower than torch.compile for vicuna-33b-v1.3, Platypus-30B and falcon-40b with FSDP & zero3. #1366

Open
mpatel31415 opened this issue Oct 30, 2024 · 1 comment
Labels: mixology (Issues that the mixology team has surfaced)

Comments

@mpatel31415
Contributor

🐛 Bug

image

To Reproduce

Steps to reproduce the behavior:

Run on 2 nodes, each with 8 GPUs, using the image "INTERNAL_IMAGE:20241025" and the following training script:

python /opt/pytorch/lightning-thunder/thunder/benchmarks/benchmark_litgpt.py \
    --model_name vicuna-33b-v1.3 \
    --distributed_mode fsdp \
    --shard_mode zero3 \
    --compile dynamo_thunder \
    --checkpoint_activations False \
    --low_precision_mode none \
    --micro_batch_size 2
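
For a like-for-like comparison, the same command can be re-run with only the --compile value changed. The following is a minimal sketch of that idea, not part of the original report: the helper names `build_command` and `slowdown_percent` are hypothetical, and the baseline value `inductor` for torch.compile is an assumption (the issue only states `dynamo_thunder`). The script only assembles and prints the commands; it does not launch anything and does not handle the multi-node launch.

```python
import shlex

# Path taken from the reproduction command in this issue.
BENCHMARK = "/opt/pytorch/lightning-thunder/thunder/benchmarks/benchmark_litgpt.py"


def build_command(compile_mode: str, model_name: str = "vicuna-33b-v1.3") -> list[str]:
    """Return the argv for one benchmark run; only --compile and --model_name vary."""
    return [
        "python", BENCHMARK,
        "--model_name", model_name,
        "--distributed_mode", "fsdp",
        "--shard_mode", "zero3",
        "--compile", compile_mode,
        "--checkpoint_activations", "False",
        "--low_precision_mode", "none",
        "--micro_batch_size", "2",
    ]


def slowdown_percent(baseline_throughput: float, candidate_throughput: float) -> float:
    """Percentage by which the candidate is slower than the baseline (higher throughput is better)."""
    if baseline_throughput <= 0:
        raise ValueError("baseline_throughput must be positive")
    return 100.0 * (baseline_throughput - candidate_throughput) / baseline_throughput


if __name__ == "__main__":
    # "inductor" as the torch.compile baseline is an assumption, not stated in the issue.
    for mode in ("dynamo_thunder", "inductor"):
        print(shlex.join(build_command(mode)))
```

Feeding the throughput reported by each run into `slowdown_percent` gives one number per model, which makes it easier to track the gap across container images.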

Environment

system.device_product_name DGXH100
system.gpu_driver_version 535.129.03
libraries.cuda 12.6.2.004
libraries.pip.lightning 2.4.0.dev20240728
libraries.pip.lightning-thunder 0.2.0.dev0
libraries.pip.lightning-utilities 0.11.8
libraries.pip.litgpt 0.4.11
libraries.pip.nvfuser 0.2.20+git85c22a2
libraries.pip.pytorch-lightning 2.4.0
libraries.pip.torch 2.6.0a0+git96b30dc
libraries.pip.torchmetrics 1.5.1
libraries.pip.torchvision 0.19.0a0+d23a6e1

@nvMelissa added the mixology label (Issues that the mixology team has surfaced) on Nov 4, 2024
@mpatel31415
Contributor Author

Here is a more recent comparison against torch.compile, which also covers some new models:

image
