Implementation of tiled attention with bf16 and circular buffers, which reduces memory requirements by 4x for longer contexts on Gemma models. #6468

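The PR title names the technique but gives no detail. As a general illustration only, here is a minimal sketch of tiled (blockwise) attention with an online softmax, which avoids materializing the full score matrix. It uses float32 NumPy rather than bf16 and omits the PR's circular-buffer KV storage; `tiled_attention` and its `tile` parameter are hypothetical names, not this PR's API:

```python
import numpy as np

def tiled_attention(q, k, v, tile=64):
    # Hypothetical sketch: process K/V one tile at a time with a
    # streaming softmax, so only a (seq_q x tile) score block exists
    # at any moment instead of the full (seq_q x seq_k) matrix.
    seq_q, d = q.shape
    out = np.zeros((seq_q, v.shape[1]), dtype=np.float32)
    m = np.full((seq_q, 1), -np.inf, dtype=np.float32)  # running row max
    l = np.zeros((seq_q, 1), dtype=np.float32)          # running denominator
    scale = 1.0 / np.sqrt(d)
    for start in range(0, k.shape[0], tile):
        k_t = k[start:start + tile]
        v_t = v[start:start + tile]
        s = (q @ k_t.T) * scale                          # scores for this tile
        m_new = np.maximum(m, s.max(axis=1, keepdims=True))
        p = np.exp(s - m_new)                            # tile numerators
        alpha = np.exp(m - m_new)                        # rescale old state
        l = l * alpha + p.sum(axis=1, keepdims=True)
        out = out * alpha + p @ v_t
        m = m_new
    return out / l

def full_attention(q, k, v):
    # Reference: standard attention with the full score matrix.
    s = (q @ k.T) / np.sqrt(q.shape[1])
    p = np.exp(s - s.max(axis=1, keepdims=True))
    return (p / p.sum(axis=1, keepdims=True)) @ v
```

The two functions produce the same output; the tiled version's working memory scales with the tile size rather than the full context length, which is what makes the claimed savings on long contexts possible.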