Replies: 1 comment
-
Hi @bpmsilva, thanks for your interest in accessing these kernels. You are right, we have only some of the fine-grained modules accessible from PyTorch in Python land. We can certainly expose both of the invertible functions you mentioned as separate, individual modules. For that, we just need to create the higher-level APIs at both the C++ and PyTorch levels. I can help with designing such APIs to give proper access to these kernels, and if possible I will need your help to verify them and to add some unit tests for them. Thanks,
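To illustrate what such a higher-level API could look like on the PyTorch side, here is a minimal sketch of a `torch.autograd.Function` wrapper around a compiled extension. The module name `deepspeed_layer_norm_ext` and the functions `layer_norm_fwd` / `layer_norm_bwd` are hypothetical placeholders for whatever bindings would be added on the C++ level; this is not an existing DeepSpeed API.

```python
# Minimal sketch of a Python-level wrapper for a fused layer-norm kernel.
# NOTE: `deepspeed_layer_norm_ext`, `layer_norm_fwd`, and `layer_norm_bwd`
# are hypothetical names; the C++/CUDA bindings would have to be created
# first, as discussed above.
import torch

import deepspeed_layer_norm_ext  # hypothetical compiled extension


class FusedLayerNormFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, weight, bias, eps):
        # The hypothetical binding returns the output plus the saved
        # statistics (mean, inverse std) needed for the backward pass.
        out, mean, invvar = deepspeed_layer_norm_ext.layer_norm_fwd(x, weight, bias, eps)
        ctx.save_for_backward(x, weight, bias, mean, invvar)
        ctx.eps = eps
        return out

    @staticmethod
    def backward(ctx, grad_out):
        x, weight, bias, mean, invvar = ctx.saved_tensors
        grad_x, grad_w, grad_b = deepspeed_layer_norm_ext.layer_norm_bwd(
            grad_out, x, weight, bias, mean, invvar, ctx.eps
        )
        # One gradient per forward input; eps is not differentiable.
        return grad_x, grad_w, grad_b, None


class FusedLayerNorm(torch.nn.Module):
    """Module-level API that dispatches to the fused kernel."""

    def __init__(self, hidden_size, eps=1e-5):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.ones(hidden_size))
        self.bias = torch.nn.Parameter(torch.zeros(hidden_size))
        self.eps = eps

    def forward(self, x):
        return FusedLayerNormFunction.apply(x, self.weight, self.bias, self.eps)
```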
-
Hello DeepSpeed Team,
I see you use two invertible operations for single-GPU optimizations. One is related to the softmax and is accessible through deepspeed.ops.sparse_attention.Softmax. The other is related to Layer Normalization; however, there seems to be no way to access that optimization from Python. I want to use DeepSpeed's Layer Normalization optimization in a project I am working on. Would that be possible? The code seems to live only in a CUDA file (DeepSpeed/csrc/transformer/normalize_kernels.cu) and not in Python. I really appreciate any help you can provide.