
[TEST ONLY] Paged attn #1025

Closed
wants to merge 32 commits

Conversation

jiqing-feng
Collaborator

@jiqing-feng jiqing-feng commented Nov 25, 2024

### FOR CI TESTS ONLY. Please do not review this PR.

sywangyi and others added 21 commits October 8, 2024 22:57
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* refine class IPEXPagedCache's update method

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* replace tensor on xpu to List to avoid memory copy

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* split IPEXPagedCache's update function into `update_for_prefill` and `update_for_decode`

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

---------

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
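The refactor described in the commits above (splitting the cache's `update` method into `update_for_prefill` and `update_for_decode`, and keeping XPU-side storage as Python lists to avoid memory copies) might look roughly like this. The class and method bodies are a hypothetical simplification for illustration, not the actual `IPEXPagedCache` implementation:

```python
class PagedKVCacheSketch:
    """Hypothetical sketch of a KV cache that dispatches prefill vs.
    decode updates, loosely modeled on the commit messages above."""

    def __init__(self):
        # Key/value storage kept as plain Python lists, as the
        # "replace tensor on xpu to List" commit suggests.
        self.key_cache = []
        self.value_cache = []

    def update_for_prefill(self, keys, values):
        # Prefill: the whole prompt's keys/values arrive at once.
        self.key_cache.extend(keys)
        self.value_cache.extend(values)

    def update_for_decode(self, key, value):
        # Decode: exactly one new token per generation step.
        self.key_cache.append(key)
        self.value_cache.append(value)

    def update(self, keys, values):
        # Dispatch on whether this is the first (prefill) call.
        if not self.key_cache:
            self.update_for_prefill(keys, values)
        else:
            for k, v in zip(keys, values):
                self.update_for_decode(k, v)
        return self.key_cache, self.value_cache
```

Splitting the two paths keeps the hot decode path free of prefill-only branching.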
* enable qkv

* split key value into 2 lists
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
huggingface#979)

* enable gpt2; falcon has a core-dump error in PagedAttention.single_query_cached_kv_attention

* enable new_decoder_arch falcon

* only keep 1 config

* rm autocast
…ace#992)

* fix bug when running IPEXCausalModel forward directly; fix bug when using `save_pretrained`

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* add LinearGelu Op support for XPU

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* fix unit test error

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* adjust unit test case

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* fix bug

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

---------

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
…gingface#998)

* skip assisted decoding unit tests for models using paged attention

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* almost all XPU CI tests now pass

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

---------

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
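The test skip mentioned above could be implemented along these lines; the helper name and the model list are hypothetical illustrations, not taken from the actual test suite:

```python
import unittest

# Hypothetical registry of architectures that use paged attention;
# the real test suite presumably keeps its own list.
PAGED_ATTENTION_MODELS = {"llama", "falcon", "gpt2"}


def uses_paged_attention(model_type: str) -> bool:
    """Return True if the given model type runs with paged attention."""
    return model_type in PAGED_ATTENTION_MODELS


class AssistedDecodingTest(unittest.TestCase):
    model_type = "llama"

    def test_assisted_decoding(self):
        if uses_paged_attention(self.model_type):
            # Assisted decoding is skipped for paged-attention models.
            self.skipTest("assisted decoding not supported with paged attention")
        # ... actual assisted-decoding checks would go here ...
```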
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix ci config

* fix test versions

* fix ipex version

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* use python3.9 test

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* change the transformers version limit for ipex in setup
* fix inc tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
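A version limit like the one mentioned above would live in the package's `setup.py` extras; the concrete pins below are assumptions for the sketch, not the exact ranges from this PR:

```python
# Illustrative setup.py fragment; the version pins here are
# assumed for the sketch, not copied from this PR.
EXTRAS_REQUIRE = {
    "ipex": [
        "intel-extension-for-pytorch>=2.4",
        "transformers>=4.46,<4.47",
    ],
}
```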
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
* fix bert and vit patch
* fix vit and bert save


Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

IlyasMoutawwakil and others added 2 commits November 25, 2024 10:21
* fix reorder cache for non-patch models

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* disable torch < 2.3 tests; we won't use torch < 2.4

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix beam search test

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix cache selection

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* upgrade to transformers 4.46

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* change ipex test yaml transformers version to 4.46

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* set device as the same as origin model
* fix device

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
jiqing-feng and others added 3 commits November 26, 2024 16:37
* simplify forward and save pretrained since no jit support

* fix format

* rm warmup because no jit mode anymore

* simplify forward for causal lm model

* fix paged pkv forward

* disable use_cache when just run forward

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
* nice code
* device type adjustment

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
jiqing-feng force-pushed the paged_attn branch 2 times, most recently from 9938a52 to 2902247 on November 27, 2024 03:21
* enable compile for non-generation tasks
* add no_grad in forward
* warmup compiled model
* disable compile not ready models
* set system level optimize for torch.compile
* fix typo
* add comments
* set torch minimum version for compiling

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
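The compile-and-warmup flow listed in the commits above can be sketched as follows. The helper is hypothetical (not the actual optimum-intel code): it compiles only when a recent-enough torch is importable, runs one warmup call, and falls back to the eager callable otherwise:

```python
def compile_and_warmup(fn, example_args, min_version=(2, 4)):
    """Hypothetical sketch: compile `fn` with torch.compile when a
    recent-enough torch is available, then run one warmup call so the
    first real request does not pay the compile latency."""
    compiled = fn
    try:
        import torch
        # torch.__version__ may carry a suffix like "+cu121"; the
        # major/minor components are enough for the gate.
        version = tuple(int(p) for p in torch.__version__.split(".")[:2])
        if version >= min_version:
            compiled = torch.compile(fn)
        compiled(*example_args)  # warmup; compilation happens on first call
    except Exception:
        compiled = fn  # torch missing or compile failed: stay eager
        compiled(*example_args)
    return compiled
```

The warmup call sits inside the guarded block because `torch.compile` reports most failures lazily, on the first invocation rather than at wrap time.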
sywangyi and others added 2 commits December 2, 2024 10:11
* fix readme and push to hub support

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* rm export in tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* test with torch 2.5.*

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix tests
* fix typo
* add patched tests

* change forward to generate

* fix tests

* fix test model name


---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix forward without pkv
* patch gpt2 block forward
* fix typo
* revert causal lm tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>