[Llama] Use rocm ukernel when available + use num_layer for pkv. (#381)
Use the ROCm ukernel when available to improve perf + fix #380. Additionally, added a fix to stateless llama to handle layer counts other than 32. Currently our PKV (past key value) cache is sized by the number of attention heads. This happens to work because the number of attention heads equals the number of layers for many of the models we are looking at, but once that assumption breaks, stateless llama will run into issues. This PR also introduces a fix for that minor bug.
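The PKV part of the fix amounts to sizing the cache by layer count rather than head count. A minimal sketch of the idea, assuming a HuggingFace-style config with `num_hidden_layers`, `num_attention_heads`, and `hidden_size` (illustrative only, not taken from the actual diff):

```python
import torch

def init_past_key_values(config, batch_size=1, dtype=torch.float16):
    # A sketch, not the actual patch: build one (key, value) pair per
    # transformer layer. Indexing by num_hidden_layers instead of
    # num_attention_heads is the point of the fix -- the two only happen
    # to coincide for some models (e.g. both are 32 for LLaMA-7B).
    head_dim = config.hidden_size // config.num_attention_heads
    return [
        (
            torch.zeros(batch_size, config.num_attention_heads, 0, head_dim, dtype=dtype),
            torch.zeros(batch_size, config.num_attention_heads, 0, head_dim, dtype=dtype),
        )
        for _ in range(config.num_hidden_layers)
    ]
```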
1 parent c1dc94c · commit da57fe3
Showing 2 changed files with 41 additions and 21 deletions.