Implementation of tiled attention with bf16 and circular buffers, which reduces memory requirements by 4x for longer contexts on Gemma models. #6483
Annotations
1 error and 1 warning
windows-latest (windows) Release
Process completed with exit code 1.

bazel
Failed to restore: Cache service responded with 400
Artifacts
Produced during runtime
| Name | Size | Digest |
|---|---|---|
| gemma-macos-latest-make-Release | 2.05 MB | sha256:c1e21eee6688984a4d8f5793744cff1eda90aac191b1cbdec776844b39f048ae |
| gemma-ubuntu-latest-make-Release | 5.87 MB | sha256:336bf0c35479bd2c17ed1322cd48f76e34905ba2eb57fcba36fe451e3a1f65b9 |