
TransformerEngine's FP8 LayerNorm support #658

Closed
tfogal opened this issue Jun 26, 2024 · 6 comments · Fixed by #689
Assignees: tfogal
Labels: enhancement (New feature or request), mixology (Issues that the mixology team has surfaced)

Comments

tfogal (Collaborator) commented Jun 26, 2024

🚀 Feature

120 Mixology runs are failing due to:

raise ValueError("LayerNorm is currently not supported by Thunder!")

Additional context

```sh
set -e
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
export NCCL_BLOCKING_WAIT=1
export TORCH_NCCL_BLOCKING_WAIT=1
export NCCL_ASYNC_ERROR_HANDLING=1
export TORCH_NCCL_ASYNC_ERROR_HANDLING=1
export NCCL_TIMEOUT=600
python -m mixology_logs.execution.main \
  --nsys.enable True \
  --nsys.output_path /jet/assets/recipe/-gemma-7b_-lit-gpt-pjnl_-perf-train_--eos-dgx-h100-_-bfloat16_-1_-8_--1_-train_-false_--_-thunder-cudnn_-ddp_-fp8-delayed-te_-none_-s_-lit-gpt/nsys_report \
  --nsys.new_kwargs '{"--nsys_enabled": "True", "--output_dir": "/tmp"}' \
  '{"--micro_batch_size": "exp_range(0, 10)"}' \
  "python thunder/benchmarks/benchmark_litgpt.py \
    --max_iters 20 \
    --warmup_iters 5 \
    --output_dir /jet/logs/recipe/-gemma-7b_-lit-gpt-pjnl_-perf-train_--eos-dgx-h100-_-bfloat16_-1_-8_--1_-train_-false_--_-thunder-cudnn_-ddp_-fp8-delayed-te_-none_-s_-lit-gpt \
    --model_name Gemma-7b \
    --distributed_mode ddp \
    --shard_mode None \
    --compile thunder_cudnn \
    --checkpoint_activations False \
    --low_precision_mode fp8-delayed-te"
```
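
For context on where this error can come from: it is not raised anywhere in the Thunder repo itself; per the discussion below it comes from a check added in the mixology team's internal fork. A minimal, hypothetical sketch of that kind of configuration guard (the function name and arguments here are illustrative, not the actual mixology code) might look like:

```python
# Hypothetical guard illustrating the kind of check that produces the error above.
# The real check lives in the mixology team's internal fork (see the comments
# below), not in the Thunder repository.
def check_layernorm_support(compile_mode: str, low_precision_mode: str) -> None:
    """Reject configurations whose LayerNorm handling is not yet supported."""
    if low_precision_mode == "fp8-delayed-te" and compile_mode.startswith("thunder"):
        # Assumption: the fork rejects TransformerEngine's FP8 path under Thunder
        # because the required LayerNorm support was missing at the time.
        raise ValueError("LayerNorm is currently not supported by Thunder!")


# The failing configuration from the command above:
try:
    check_layernorm_support("thunder_cudnn", "fp8-delayed-te")
except ValueError as e:
    print(e)  # LayerNorm is currently not supported by Thunder!
```
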
tfogal added the enhancement and mixology labels on Jun 26, 2024
t-vi (Collaborator) commented Jun 26, 2024

But what's the traceback? I don't see that error being raised anywhere in the thunder repo...

tfogal self-assigned this on Jun 26, 2024
tfogal (Collaborator, Author) commented Jun 26, 2024

> I don't see that error being raised anywhere in the thunder repo...

Yes, ditto. We think this is coming from the mixology scripts.

There's an internal thread with @wprazuch; stay tuned, we'll report back here.
I can't seem to assign this to @wprazuch (?), so I'm assigning it to myself temporarily instead.

wprazuch (Contributor) commented

Thanks @tfogal for flagging this!
Yes, that check is functionality we introduced internally in our fork: there was a request to additionally benchmark FP8 TransformerEngine for lit-gpt, so we added it there. Right now this issue is not relevant for the main repository.

But since we are speaking about this right now: I could create a PR adding this functionality to the main repo, if you are interested in tracking and benchmarking FP8 as well. I wanted to do that some time ago, but due to other tasks I de-prioritized it. Let me know what you think.
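
For reference, the "FP8 delayed" path under discussion is TransformerEngine's standard delayed-scaling recipe. Below is a minimal, self-contained sketch of running a TE LayerNorm + Linear layer under that recipe; the hidden size, batch shape, and recipe settings are arbitrary for illustration and are not lit-gpt's actual configuration:

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

# Delayed-scaling recipe: FP8 scaling factors are derived from a history of
# amax values instead of being recomputed from the current tensor each step.
fp8_recipe = DelayedScaling(
    fp8_format=Format.HYBRID,   # E4M3 for forward tensors, E5M2 for gradients
    amax_history_len=16,
    amax_compute_algo="max",
)

# Fused LayerNorm + Linear module from TransformerEngine (sizes are arbitrary here).
layer = te.LayerNormLinear(4096, 4096, params_dtype=torch.bfloat16).cuda()
x = torch.randn(8, 4096, device="cuda", dtype=torch.bfloat16)

# Inside fp8_autocast, supported GEMMs execute in FP8 using the recipe above.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)

print(y.shape)  # torch.Size([8, 4096])
```

Presumably this is the path that `--low_precision_mode fp8-delayed-te` enables in benchmark_litgpt.py, and the question in this issue is whether its LayerNorm handling works when the model is compiled with Thunder.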

IvanYashchuk changed the title from "LayerNorm support" to "TransformerEngine's FP8 LayerNorm support" on Jun 27, 2024
tfogal (Collaborator, Author) commented Jun 27, 2024

> But since we are speaking about this right now: I could create a PR adding this functionality to the main repo, if you are interested in tracking and benchmarking FP8 as well. I wanted to do that some time ago, but due to other tasks I de-prioritized it. Let me know what you think.

That would be great! Yes, I do think we should be tracking FP8 perf over time.

wprazuch (Contributor) commented

I will prepare the required changes, then.

tfogal (Collaborator, Author) commented Jul 1, 2024

Triage review:

  • make sure to have both paths available (see the sketch below)
  • the patch is on the right track and should resolve this
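
A minimal sketch of keeping both paths selectable, assuming a hypothetical helper that picks the normalization module from the low-precision mode string (names here are illustrative, not the actual benchmark code):

```python
import torch
import transformer_engine.pytorch as te


def make_layernorm(hidden_size: int, low_precision_mode: str) -> torch.nn.Module:
    """Hypothetical helper: keep the default PyTorch LayerNorm path available
    alongside the TransformerEngine FP8 path, rather than replacing one with the other."""
    if low_precision_mode == "fp8-delayed-te":
        return te.LayerNorm(hidden_size)      # TransformerEngine path
    return torch.nn.LayerNorm(hidden_size)    # default PyTorch path


norm = make_layernorm(4096, "none")
print(type(norm))  # <class 'torch.nn.modules.normalization.LayerNorm'>
```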


4 participants