feat(bench): Add features and fix some bugs for pipeline flashattention. #31

KuangjuX · 2025-01-06T08:30:34Z

No description provided.

KuangjuX · 2025-01-06T08:41:28Z

The pseudocode for FlashAttention-2 can be represented in the following form:

Iterate (k, v) in K, V:
  qk = dot(q, k)                          (1)
  m_ij = max(max(qk), lse_i)              (2)
  p = exp(qk - m_ij)                      (3)
  l_ij = sum(p)                           (4)
  // renormalize o
  acc_o_scale = exp(m_i - m_ij)           (5)
  acc_o = acc_o_scale * acc_o             (6)
  acc_o = acc_o + dot(p, v)               (7)
  // update statistics
  m_i = m_ij                              (8)
  l_i_new = exp(lse_i - m_ij) + l_ij      (9)
  lse_i = m_ij + log(l_i_new)             (10)
// o_scale is the reciprocal of the softmax denominator
o_scale = exp(m_i - lse_i)                (11)
acc_o = acc_o * o_scale                   (12)
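To make the recurrence concrete, here is a minimal host-side C++ sketch of steps (2) through (12) for a single query row. It is not the kernel in `cutlass_fa.cuh`. The function names `streaming_attention_row` and `reference_attention_row`, the scalar value per key, and the precomputed scores (standing in for step (1)) are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: steps (2)-(12) of the pseudocode for ONE query row.
// `scores` holds precomputed q.k values (step 1); `values` holds one scalar
// per key; `block` is the number of keys consumed per loop iteration.
double streaming_attention_row(const std::vector<double>& scores,
                               const std::vector<double>& values,
                               std::size_t block) {
    assert(!scores.empty() && scores.size() == values.size() && block > 0);
    const double neg_inf = -std::numeric_limits<double>::infinity();
    double m_i = neg_inf;    // running reference maximum
    double lse_i = neg_inf;  // running log-sum-exp
    double acc_o = 0.0;      // unnormalized output accumulator

    for (std::size_t start = 0; start < scores.size(); start += block) {
        const std::size_t end = std::min(start + block, scores.size());

        double m_ij = lse_i;  // (2): max over the block, bounded below by lse_i
        for (std::size_t j = start; j < end; ++j) {
            m_ij = std::max(m_ij, scores[j]);
        }

        double l_ij = 0.0;  // (4): block sum of p
        double pv = 0.0;    // dot(p, v) for this block
        for (std::size_t j = start; j < end; ++j) {
            const double p = std::exp(scores[j] - m_ij);  // (3)
            l_ij += p;
            pv += p * values[j];
        }

        const double acc_o_scale = std::exp(m_i - m_ij);  // (5)
        acc_o = acc_o_scale * acc_o + pv;                 // (6), (7)

        m_i = m_ij;                                            // (8)
        const double l_i_new = std::exp(lse_i - m_ij) + l_ij;  // (9)
        lse_i = m_ij + std::log(l_i_new);                      // (10)
    }

    const double o_scale = std::exp(m_i - lse_i);  // (11)
    return acc_o * o_scale;                        // (12)
}

// Reference: numerically stable softmax-weighted sum computed in one pass
// over all keys, used only to check the streaming version.
double reference_attention_row(const std::vector<double>& scores,
                               const std::vector<double>& values) {
    assert(!scores.empty() && scores.size() == values.size());
    double m = scores[0];
    for (double s : scores) m = std::max(m, s);
    double denom = 0.0;
    double numer = 0.0;
    for (std::size_t j = 0; j < scores.size(); ++j) {
        const double w = std::exp(scores[j] - m);
        denom += w;
        numer += w * values[j];
    }
    return numer / denom;
}
```

For any block size, the two functions should agree up to floating-point rounding, which is one way to sanity-check a change to the order of the updates.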

The FractalTensor implementation seems to differ from this pseudocode in a few details, and I am not sure whether those differences are compliant with the specification. I point them out one by one below.

Can you help me check these issues? @lcy-seso

KuangjuX · 2025-01-06T08:48:37Z

benchmarks/cpp/flashattention/cutlass_fa.cuh

+
+        // Compute `lse_i = m_ij + log(l_i_new)`.
+        for (int ax0 = 0; ax0 < size<0>(m_new); ++ax0) {
+            m_new(ax0) = m_new(ax0) * softmax_scale + log(lse_new(ax0));


In FractalTensor, the LSE update is performed outside the loop (https://github.com/microsoft/FractalTensor/blob/artifact/artifacts/FractalTensor/benchmarks/multi-head_attention/fractaltensor/kernel.h#L278), whereas in the pseudocode it is performed inside the loop.
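The two placements can be compared on the host with a small C++ sketch. It assumes the out-of-loop variant keeps a running maximum and a running sum inside the loop and takes the logarithm once at the end. That is an assumption about the structure, not a copy of the FractalTensor kernel, and both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Variant A (as in the pseudocode): refresh lse inside the loop, steps (2)-(10).
double lse_updated_in_loop(const std::vector<double>& scores, std::size_t block) {
    assert(!scores.empty() && block > 0);
    double lse = -std::numeric_limits<double>::infinity();
    for (std::size_t start = 0; start < scores.size(); start += block) {
        const std::size_t end = std::min(start + block, scores.size());
        double m_ij = lse;
        for (std::size_t j = start; j < end; ++j) m_ij = std::max(m_ij, scores[j]);
        double l_ij = 0.0;
        for (std::size_t j = start; j < end; ++j) l_ij += std::exp(scores[j] - m_ij);
        lse = m_ij + std::log(std::exp(lse - m_ij) + l_ij);  // one log per block
    }
    return lse;
}

// Variant B (assumed out-of-loop form): carry the running max `m` and the
// running sum `l` through the loop, then take a single log afterwards.
double lse_computed_after_loop(const std::vector<double>& scores, std::size_t block) {
    assert(!scores.empty() && block > 0);
    double m = -std::numeric_limits<double>::infinity();
    double l = 0.0;
    for (std::size_t start = 0; start < scores.size(); start += block) {
        const std::size_t end = std::min(start + block, scores.size());
        double m_new = m;
        for (std::size_t j = start; j < end; ++j) m_new = std::max(m_new, scores[j]);
        l *= std::exp(m - m_new);  // rescale the old sum to the new maximum
        for (std::size_t j = start; j < end; ++j) l += std::exp(scores[j] - m_new);
        m = m_new;
    }
    return m + std::log(l);  // single log after the loop
}
```

Under these assumptions both variants return the same log-sum-exp up to rounding. Variant B evaluates `log` once per row instead of once per block.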

KuangjuX · 2025-01-06T08:49:16Z

benchmarks/cpp/flashattention/cutlass_fa.cuh

+        // float scale = 1 / lse_new(ax0);
+        float o_scale = exp(m_new(ax0) - lse_new(ax0));
+        // TODO(KuangjuX): Move this code into loop?
+        // lse_new(ax0) = m_new(ax0) * softmax_scale + log(lse_new(ax0));


I moved the update of LSE inside the loop.

KuangjuX · 2025-01-06T08:52:07Z

benchmarks/cpp/flashattention/cutlass_fa.cuh

+        // TODO(KuangjuX): fix the following code? -> `o_scale = exp(m_i -
+        // lse_i)`.
+
+        // float scale = 1 / lse_new(ax0);


In the pseudocode, `o_scale = exp(m_i - lse_i)` is used to rescale the final result, while FractalTensor rescales with `1 / lse_new(ax0)` (https://github.com/microsoft/FractalTensor/blob/artifact/artifacts/FractalTensor/benchmarks/multi-head_attention/fractaltensor/kernel.h#L277).

Here are quick answers to your two questions; I can explain the reasoning in more detail if needed. First, I have carefully derived it, and the pseudocode is correct, so you can follow it. Second, the normalization factor is applied outside the loop intentionally, to reduce computational cost.
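The two rescaling forms can be related in one line. This assumes that `lse_new(ax0)` in the commented-out line holds the running softmax denominator $l_i$ at that point (the later `log(lse_new(ax0))` suggests this, but it is an assumption), and that $\mathrm{lse}_i$ follows step (10) of the pseudocode:

$$
\mathrm{lse}_i = m_i + \log l_i
\quad\Longrightarrow\quad
\exp(m_i - \mathrm{lse}_i) = \exp(-\log l_i) = \frac{1}{l_i}
$$

Under that assumption, `exp(m_i - lse_i)` and `1 / lse_new(ax0)` are the same scale factor, and applying it once after the loop replaces a per-iteration normalization of `acc_o`.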

KuangjuX added 5 commits January 6, 2025 01:23

Add some comments.

3341d2c

Normalize the attention block.

7000e7a

Add template testcase.

f69a4ec

fix some code.

5b072ee

Add case for kK != kTK.

338051a

KuangjuX marked this pull request as draft January 6, 2025 08:30

fix codespell error.

50900bc

KuangjuX changed the title ~~feat(bench): Add feature and fix some bugs for pipeline flashattention.~~ feat(bench): Add features and fix some bugs for pipeline flashattention. Jan 6, 2025

Add debug flag

81d70b3

haruhi55 Jan 7, 2025
