Enable fp8 on sm89 #3624

jjsjann123 · 2024-12-19T22:37:35Z

fp8 support has been extended to sm89 since PTX ISA 8.1, per https://docs.nvidia.com/cuda/parallel-thread-execution/

jjsjann123 · 2024-12-20T00:29:04Z

!test

crcrpar

Thank you! Now I can see the same traces as in Lightning-AI/lightning-thunder#1551 in my environment with an RTX 6000 Ada, with a diff in thunder.

jacobhinkle · 2024-12-20T17:10:44Z

> fp8 support has been extended to sm89 since PTX ISA 8.1, per https://docs.nvidia.com/cuda/parallel-thread-execution/

Does that technically mean we only support CUDA 12+ for this feature?

jjsjann123 · 2024-12-20T17:58:38Z

> > fp8 support has been extended to sm89 since PTX ISA 8.1, per https://docs.nvidia.com/cuda/parallel-thread-execution/
>
> Does that technically mean we only support CUDA 12+ for this feature?

good call. I think I should conditionally relax this one, depending on the build time CUDA version.
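A minimal sketch of what "conditionally relax, depending on the build-time CUDA version" could look like. The helper names `minFp8Capability` and `fp8Supported` are hypothetical and not nvFuser's actual API; the version is passed in as an integer using the `CUDA_VERSION` encoding so the snippet compiles without the CUDA headers. The 12.1 threshold assumes that CUDA 12.1 is the first toolkit shipping PTX ISA 8.1.

```cpp
#include <utility>

// Compute capability as {major, minor}; std::pair compares lexicographically.
using Capability = std::pair<int, int>;

// cuda_version uses the CUDA_VERSION encoding: major * 1000 + minor * 10,
// so CUDA 12.1 is 12010.
// Hypothetical helper for illustration, not nvFuser's real check.
constexpr Capability minFp8Capability(int cuda_version) {
  // Assumption from this thread: PTX ISA 8.1 (CUDA 12.1) extends fp8
  // conversions to sm_89; older toolkits only accept them on sm_90.
  return cuda_version >= 12010 ? Capability{8, 9} : Capability{9, 0};
}

// True if a device with the given capability can run fp8 conversions
// when the code is built against the given toolkit version.
constexpr bool fp8Supported(Capability device, int cuda_version) {
  return device >= minFp8Capability(cuda_version);
}
```

In a real build the argument would likely be the `CUDA_VERSION` macro from `cuda.h`, so the relaxed requirement only applies when the toolkit can actually emit the instruction for sm_89.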

jjsjann123 · 2024-12-20T18:33:27Z

!test

jjsjann123 · 2024-12-20T18:38:21Z

!test

Fixing a version check for fp8 support. Bump the nvfuser version for PR #3624; framework integrations need to guard on the version in order to decide whether to send fp8 operations to nvfuser.
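The version guard on the framework side might be sketched as follows. This is only an illustration: `shouldSendFp8ToNvfuser` is a hypothetical name, and the first nvfuser version that accepts fp8 on sm_89 is taken as a parameter because this thread does not state it. It also assumes fp8 on sm_90 and newer was already accepted before the bump.

```cpp
#include <tuple>
#include <utility>

using Version = std::tuple<int, int, int>;  // {major, minor, patch}
using Capability = std::pair<int, int>;     // {major, minor}

// Hypothetical guard for a framework integration; not a real nvfuser or
// thunder API. first_sm89_version is whichever nvfuser version first
// accepts fp8 on sm_89 (not stated in this thread, so it is a parameter).
inline bool shouldSendFp8ToNvfuser(
    const Version& nvfuser_version,
    const Version& first_sm89_version,
    const Capability& device) {
  if (device >= Capability{9, 0}) {
    return true;  // assumed to be supported before the version bump
  }
  if (device >= Capability{8, 9}) {
    return nvfuser_version >= first_sm89_version;
  }
  return false;  // below sm_89: no fp8 support
}
```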

naoyam · 2024-12-31T04:40:46Z

@jjsjann123 I'm seeing an error on RTX 6000 (sm_89):

[ RUN      ] NVFuserTest.FusionFp8CastOps_CUDA
unknown file: Failure
C++ exception with description " INTERNAL ASSERT FAILED at "/home/nmaruyama/nvfuser/debug3/csrc/runtime/executor_utils.cpp":859, please report a bug with repro script to NVFuser at https://github.com/NVIDIA/Fuser/issues.
__global__ void nvfuser_none_f0_c0_r0_g0(Tensor<__bfloat, 2, 2> T0, Tensor<__bfloat, 2, 2> T2) {
  __e4m3 T1[1LL];
  T1[0LL]
     = __bfloat2e4m3(T0[((T0.alloc_stride[1LL] * ((nvfuser_index_t)threadIdx.x)) + (T0.alloc_stride[0LL] * ((nvfuser_index_t)blockIdx.x)))]);
  T2[(((nvfuser_index_t)threadIdx.x) + (T0.logical_size[1LL] * ((nvfuser_index_t)blockIdx.x)))]
     = __e4m32bfloat(T1[0LL]);
}
}

CUDA NVRTC compile error: ptxas application ptx input, line 47; error   : Feature 'cvt with .f16.bf16' requires .target sm_90 or higher
ptxas application ptx input, line 58; error   : Feature 'cvt with .bf16.f16' requires .target sm_90 or higher
ptxas fatal   : Ptx assembly aborted due to errors

Exception raised from invoke at /home/nmaruyama/nvfuser/debug3/csrc/runtime/executor_utils.cpp:859 (most recent call first):

jjsjann123 · 2024-12-31T05:20:06Z

Did I get the cuda TK check wrong?! I thought CUDA TK version would determine PTX ISA version...

Are you running this in a container? I'm curious what the setup is like.
sm_89 should have fp8 support since cuda 12.1.

naoyam · 2024-12-31T05:21:04Z

This is on my own container with 12.6.

jjsjann123 · 2024-12-31T05:22:41Z

wait, it's not complaining about fp8 though..

> cvt with .bf16.f16

jjsjann123 · 2024-12-31T05:26:50Z

looks like cvt to/from bf16 does require sm_90. https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cvt

I wonder why our check is only requiring sm_80.

Fuser/csrc/device_lower/analysis/device_version.cpp, lines 17 to 21 in 6466834:

    if (val->dtype() == DataType::BFloat16) {
      ensureVersion(
          {8, 0},
          "Fusion contains BFloat16 values which was introduced in Ampere (8.0)");
    }

Looks like this is just a test thing. I'll update that along with the checks. Thanks for raising the issue @naoyam
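For readers unfamiliar with the pattern in the snippet above, here is a minimal stand-alone sketch of an `ensureVersion`-style tracker that keeps the highest requirement seen so far. `MinVersionTracker` is a hypothetical stand-in, not nvFuser's class, and the {7, 0} starting point is an assumption made for the example.

```cpp
#include <string>
#include <utility>

using Capability = std::pair<int, int>;  // {major, minor}

// Hypothetical stand-in for a minimum-device-version analysis.
class MinVersionTracker {
 public:
  // Raise the minimum required capability if `required` is higher than
  // anything recorded so far, remembering why.
  void ensureVersion(Capability required, std::string reason) {
    if (required > min_version_) {
      min_version_ = required;
      reason_ = std::move(reason);
    }
  }

  const Capability& minVersion() const { return min_version_; }
  const std::string& reason() const { return reason_; }

 private:
  Capability min_version_{7, 0};  // assumed baseline for this sketch
  std::string reason_;
};
```

With this shape, a dtype-level rule (bf16 values need 8.0) and an instruction-level rule (a bf16/f16 conversion needs 9.0) can both be recorded, and the stricter one wins regardless of the order in which they are registered.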

jjsjann123 · 2024-12-31T05:28:10Z

> .relu modifier and {.f16x2, .bf16, .bf16x2, .tf32} destination formats require sm_80 or higher.
> cvt.bf16.{u8/s8/u16/s16/u32/s32/u64/s64/f16/f64/bf16}, cvt.{u8/s8/u16/s16/u32/s32/u64/s64/f16/f64}.bf16, and cvt.tf32.f32.{relu}.{rn/rz} require sm_90 or higher.
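The two quoted rules can be encoded as a small lookup, which makes it easier to see why the `cvt` with `.f16.bf16` in the failing kernel needs sm_90 while bf16 values themselves only need sm_80. This is a sketch of the quoted text only: `PtxType` and `minCapabilityForBf16Cvt` are hypothetical names, and any combination the quote does not mention (for example an f32 destination with a bf16 source) is reported as "not covered" rather than guessed.

```cpp
#include <optional>
#include <utility>

using Capability = std::pair<int, int>;  // {major, minor}

enum class PtxType { U8, S8, U16, S16, U32, S32, U64, S64, F16, BF16, F32, F64 };

// PTX writes conversions as cvt.<dst>.<src>, so "cvt with .f16.bf16"
// means dst = f16, src = bf16.
// Returns the minimum capability stated in the quoted PTX text, or
// std::nullopt when the quote says nothing about the combination.
inline std::optional<Capability> minCapabilityForBf16Cvt(PtxType dst, PtxType src) {
  const bool dst_bf16 = dst == PtxType::BF16;
  const bool src_bf16 = src == PtxType::BF16;
  if (!dst_bf16 && !src_bf16) {
    return std::nullopt;  // no bf16 involved
  }
  // Every type in this enum except f32 appears in the quoted sm_90 lists.
  const PtxType other = dst_bf16 ? src : dst;
  if (other != PtxType::F32) {
    return Capability{9, 0};
  }
  if (dst_bf16) {
    return Capability{8, 0};  // ".bf16 destination formats require sm_80"
  }
  return std::nullopt;  // f32 destination, bf16 source: not in the quote
}
```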

smoke test

ee3060b

jjsjann123 requested review from crcrpar and xwang233 December 19, 2024 22:37

wujingyue approved these changes Dec 20, 2024

crcrpar mentioned this pull request Dec 20, 2024

[nvfuser executor] Allow sm89 for fp8 types Lightning-AI/lightning-thunder#1576

Draft

crcrpar approved these changes Dec 20, 2024

fixing check and tests

39b7773

jjsjann123 and others added 2 commits December 20, 2024 10:36

why did we print that we skip earlier?

44e949b

Merge branch 'main' into fp8_enable_on_sm89

48e620f

jjsjann123 merged commit 410e48f into main Dec 21, 2024
48 checks passed

jjsjann123 deleted the fp8_enable_on_sm89 branch December 21, 2024 13:58

jjsjann123 mentioned this pull request Dec 23, 2024

version bump for updated fp8 support #3638

Merged
