Implementation of tiled attention with bf16 and circular buffers, which reduces memory requirements by 4x for longer contexts on Gemma models. #6481

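The PR body is not included here, so as a rough illustration of the technique the title names, below is a minimal NumPy sketch of tiled attention with an online softmax, where K/V are consumed one fixed-size tile at a time instead of materializing the full score matrix. All names and shapes are hypothetical and may differ from the actual implementation; NumPy has no bf16 type, so float32 stands in, with comments marking where the real kernel would keep tiles in bf16 in a circular buffer.

```python
import numpy as np

def tiled_attention(q, k, v, tile=128):
    """Single-query attention over K/V processed tile by tile
    (online softmax), so only one tile is resident at a time.
    Hypothetical sketch: q has shape (d,), k and v have shape (T, d)."""
    d = q.shape[-1]
    scale = 1.0 / np.sqrt(d)
    m = -np.inf                          # running max of logits (numerical stability)
    l = 0.0                              # running softmax denominator
    acc = np.zeros(d, dtype=np.float32)  # running weighted sum of V
    for start in range(0, k.shape[0], tile):
        # In the real kernel these tiles would be loaded from a bf16
        # circular buffer that is reused across iterations.
        kt = k[start:start + tile].astype(np.float32)
        vt = v[start:start + tile].astype(np.float32)
        s = kt @ q * scale               # logits for this tile
        m_new = max(m, float(s.max()))
        p = np.exp(s - m_new)            # tile's softmax numerators
        correction = np.exp(m - m_new)   # rescale earlier partial results
        l = l * correction + float(p.sum())
        acc = acc * correction + p @ vt
        m = m_new
    return acc / l
```

The memory saving in such a scheme comes from never materializing the full attention score matrix and from halving the bytes per cached element with bf16 versus fp32; the specific 4x figure on long-context Gemma runs is the PR's own claim.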