
[TEST ONLY] Paged attn #1025

Closed
wants to merge 32 commits

Conversation

jiqing-feng
Collaborator

@jiqing-feng jiqing-feng commented Nov 25, 2024

### FOR CI TESTS ONLY. Please do not review this PR.

sywangyi and others added 21 commits October 8, 2024 22:57
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* refine class IPEXPagedCache's update method

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* replace tensor on xpu to List to avoid memory copy

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* split IPEXPagedCache's update function into `update_for_prefill` and `update_for_decode`

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

---------

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
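The refactor described in the commits above (splitting the cache's `update` method into `update_for_prefill` and `update_for_decode`, and keeping XPU-side storage as Python lists to avoid memory copies) might look roughly like this. The class and method bodies are a hypothetical simplification for illustration, not the actual `IPEXPagedCache` implementation:

```python
class PagedKVCacheSketch:
    """Hypothetical sketch of a KV cache that dispatches prefill vs.
    decode updates, loosely modeled on the commit messages above."""

    def __init__(self):
        # Key/value storage kept as plain Python lists, as the
        # "replace tensor on xpu to List" commit suggests.
        self.key_cache = []
        self.value_cache = []

    def update_for_prefill(self, keys, values):
        # Prefill: the whole prompt's keys/values arrive at once.
        self.key_cache.extend(keys)
        self.value_cache.extend(values)

    def update_for_decode(self, key, value):
        # Decode: exactly one new token per generation step.
        self.key_cache.append(key)
        self.value_cache.append(value)

    def update(self, keys, values):
        # Dispatch on whether this is the first (prefill) call.
        if not self.key_cache:
            self.update_for_prefill(keys, values)
        else:
            for k, v in zip(keys, values):
                self.update_for_decode(k, v)
        return self.key_cache, self.value_cache
```

Splitting the two paths keeps the hot decode path free of prefill-only branching.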
* enable qkv

* split key value into 2 lists
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
huggingface#979)

* enable gpt2; falcon has a core-dump error in PagedAttention.single_query_cached_kv_attention

* enable new_decoder_arch falcon

* only keep 1 config

* rm autocast
…ace#992)

* fix bug when running IPEXCausalModel forward directly; fix bug when using `save_pretrained`

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* add LinearGelu Op support for XPU

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* fix unit test error

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* adjust unit test case

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* fix bug

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

---------

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
…gingface#998)

* skip assisted decoding unit tests for models using paged attention

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* almost all XPU CI tests now pass

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

---------

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
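The test skip mentioned above could be implemented along these lines; the helper name and the model list are hypothetical illustrations, not taken from the actual test suite:

```python
import unittest

# Hypothetical registry of architectures that use paged attention;
# the real test suite presumably keeps its own list.
PAGED_ATTENTION_MODELS = {"llama", "falcon", "gpt2"}


def uses_paged_attention(model_type: str) -> bool:
    """Return True if the given model type runs with paged attention."""
    return model_type in PAGED_ATTENTION_MODELS


class AssistedDecodingTest(unittest.TestCase):
    model_type = "llama"

    def test_assisted_decoding(self):
        if uses_paged_attention(self.model_type):
            # Assisted decoding is skipped for paged-attention models.
            self.skipTest("assisted decoding not supported with paged attention")
        # ... actual assisted-decoding checks would go here ...
```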
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix ci config

* fix test versions

* fix ipex version

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* use python3.9 test

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* change the transformers version limit for ipex in setup
* fix inc tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
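A version limit like the one mentioned above would live in the package's `setup.py` extras; the concrete pins below are assumptions for the sketch, not the exact ranges from this PR:

```python
# Illustrative setup.py fragment; the version pins here are
# assumed for the sketch, not copied from this PR.
EXTRAS_REQUIRE = {
    "ipex": [
        "intel-extension-for-pytorch>=2.4",
        "transformers>=4.46,<4.47",
    ],
}
```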
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
* fix bert and vit patch
* fix vit and bert save


Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

IlyasMoutawwakil and others added 2 commits November 25, 2024 10:21
* fix reorder cache for non-patch models

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* disable torch < 2.3 tests; we won't use torch < 2.4

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix beam search test

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix cache selection

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* upgrade to transformers 4.46

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* change ipex test yaml transformers version to 4.46

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* set device as the same as origin model
* fix device

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
jiqing-feng and others added 3 commits November 26, 2024 16:37
* simplify forward and save pretrained since no jit support

* fix format

* rm warmup because no jit mode anymore

* simplify forward for causal lm model

* fix paged pkv forward

* disable use_cache when just run forward

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
* nice code
* device type adjustment

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
jiqing-feng force-pushed the paged_attn branch 2 times, most recently from 9938a52 to 2902247 on November 27, 2024 03:21
* enable compile for non-generation tasks
* add no_grad in forward
* warmup compiled model
* disable compile not ready models
* set system level optimize for torch.compile
* fix typo
* add comments
* set torch minimum version for compiling

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
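The compile-and-warmup flow listed in the commits above can be sketched as follows. The helper is hypothetical (not the actual optimum-intel code): it compiles only when a recent-enough torch is importable, runs one warmup call, and falls back to the eager callable otherwise:

```python
def compile_and_warmup(fn, example_args, min_version=(2, 4)):
    """Hypothetical sketch: compile `fn` with torch.compile when a
    recent-enough torch is available, then run one warmup call so the
    first real request does not pay the compile latency."""
    compiled = fn
    try:
        import torch
        # torch.__version__ may carry a suffix like "+cu121"; the
        # major/minor components are enough for the gate.
        version = tuple(int(p) for p in torch.__version__.split(".")[:2])
        if version >= min_version:
            compiled = torch.compile(fn)
        compiled(*example_args)  # warmup; compilation happens on first call
    except Exception:
        compiled = fn  # torch missing or compile failed: stay eager
        compiled(*example_args)
    return compiled
```

The warmup call sits inside the guarded block because `torch.compile` reports most failures lazily, on the first invocation rather than at wrap time.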
sywangyi and others added 2 commits December 2, 2024 10:11
* fix readme and push to hub support

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* rm export in tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* test with torch 2.5.*

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix tests
* fix typo
* add patched tests

* change forward to generate

* fix tests

* fix test model name


---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix forward without pkv
* patch gpt2 block forward
* fix typo
* revert causal lm tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>