From 449bce5befee2cc853314bbf6668a4c03acd34e1 Mon Sep 17 00:00:00 2001
From: jlzhou
Date: Mon, 10 Nov 2025 00:18:23 +0800
Subject: [PATCH] docs: fix symbols

---
 docs/source/algo/block_diffusion.rst | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/source/algo/block_diffusion.rst b/docs/source/algo/block_diffusion.rst
index f5da772..495a384 100644
--- a/docs/source/algo/block_diffusion.rst
+++ b/docs/source/algo/block_diffusion.rst
@@ -83,17 +83,17 @@ To train the model efficiently, Block Diffusion employs a clever unified attenti
    :align: center
    :width: 80%
 
-   The Block Diffusion Training Attention Mask (for `block_size=2`)
+   The Block Diffusion Training Attention Mask (for ``block_size=2``)
 
 This specialized mask consists of three distinct components that control the attention patterns:
 
-* **Block Diagonal Mask (:math:`M_{BD}`)**:
+* **Block Diagonal Mask** :math:`\mathcal{M}_{BD}`:
   Allows each token in the noised block :math:`\boldsymbol{x}_t` to attend only to other tokens *within the same block*. This constitutes the intra-block self-attention for the denoising task.
 
-* **Offset Block Causal Mask (:math:`M_{OBC}`)**:
+* **Offset Block Causal Mask** :math:`\mathcal{M}_{OBC}`:
   Allows tokens in a noised block in :math:`\boldsymbol{x}_t` to attend to all preceding *clean* blocks in :math:`\boldsymbol{x}_0`. This provides the essential conditional context required for denoising.
 
-* **Block Causal Mask (:math:`M_{BC}`)**:
+* **Block Causal Mask** :math:`\mathcal{M}_{BC}`:
   Applies a standard causal mask to the clean sequence :math:`\boldsymbol{x}_0`, ensuring each token can only attend to itself and preceding tokens. This part is responsible for computing the KV-cache.
 
 Helper Function to Create Block Diffusion Mask