Implementation of tiled attention with bf16 and circular buffers, which reduces memory requirements by 4x for longer contexts on Gemma models. #6468

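The PR title names the technique but gives no detail. As a general illustration only, here is a minimal sketch of tiled (blockwise) attention with an online softmax, which avoids materializing the full score matrix. It uses float32 NumPy rather than bf16 and omits the PR's circular-buffer KV storage; `tiled_attention` and its `tile` parameter are hypothetical names, not this PR's API:

```python
import numpy as np

def tiled_attention(q, k, v, tile=64):
    # Hypothetical sketch: process K/V one tile at a time with a
    # streaming softmax, so only a (seq_q x tile) score block exists
    # at any moment instead of the full (seq_q x seq_k) matrix.
    seq_q, d = q.shape
    out = np.zeros((seq_q, v.shape[1]), dtype=np.float32)
    m = np.full((seq_q, 1), -np.inf, dtype=np.float32)  # running row max
    l = np.zeros((seq_q, 1), dtype=np.float32)          # running denominator
    scale = 1.0 / np.sqrt(d)
    for start in range(0, k.shape[0], tile):
        k_t = k[start:start + tile]
        v_t = v[start:start + tile]
        s = (q @ k_t.T) * scale                          # scores for this tile
        m_new = np.maximum(m, s.max(axis=1, keepdims=True))
        p = np.exp(s - m_new)                            # tile numerators
        alpha = np.exp(m - m_new)                        # rescale old state
        l = l * alpha + p.sum(axis=1, keepdims=True)
        out = out * alpha + p @ v_t
        m = m_new
    return out / l

def full_attention(q, k, v):
    # Reference: standard attention with the full score matrix.
    s = (q @ k.T) / np.sqrt(q.shape[1])
    p = np.exp(s - s.max(axis=1, keepdims=True))
    return (p / p.sum(axis=1, keepdims=True)) @ v
```

The two functions produce the same output; the tiled version's working memory scales with the tile size rather than the full context length, which is what makes the claimed savings on long contexts possible.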