ipex 2.3 released #725

Merged: 50 commits, Jun 6, 2024
Changes shown from 8 commits
Commits (50)
5351f4a
ipex 2.3 released
jiqing-feng May 23, 2024
1f98d6d
skip tests
jiqing-feng May 27, 2024
b2b93bb
skip testing without pkv
jiqing-feng May 27, 2024
64dcde4
add tests skip
jiqing-feng May 27, 2024
945f6b6
only llama2 with at least 64 head size supports IAKV
jiqing-feng May 27, 2024
c8922f3
cannot assert same outputs because do_sample=True
jiqing-feng May 27, 2024
2ddfa7a
rm tiny-llama model testing because it does not work for IAKV
jiqing-feng May 27, 2024
f4e887d
fix code style
jiqing-feng May 28, 2024
d96ea58
fix style
jiqing-feng May 28, 2024
ec24d5a
rm tiny llama on test pipeline
jiqing-feng May 28, 2024
871de7b
fix tests
jiqing-feng May 30, 2024
d0c8951
support use_cache=False
jiqing-feng May 30, 2024
537f0aa
rm use_cache in model_kwargs
jiqing-feng May 30, 2024
5a71790
set use_cache
jiqing-feng May 30, 2024
bde814e
Update optimum/intel/ipex/modeling_base.py
jiqing-feng May 31, 2024
4a81ea9
fix spelling error
jiqing-feng May 31, 2024
3a61e84
fix style
jiqing-feng May 31, 2024
fd69407
add transformers version warning
jiqing-feng May 31, 2024
1032a26
add compare results
jiqing-feng May 31, 2024
c8e7969
add warning
jiqing-feng May 31, 2024
afdc8d7
set pad_token_id
jiqing-feng May 31, 2024
1d1df34
limited transformers
jiqing-feng Jun 3, 2024
aaaa4c3
fix transformers version
jiqing-feng Jun 3, 2024
f6b8010
update transformers version
jiqing-feng Jun 4, 2024
51e47b6
fix version
jiqing-feng Jun 4, 2024
5204b24
temporary fix for multi-query model
jiqing-feng Jun 4, 2024
8f2f025
fix code style
jiqing-feng Jun 4, 2024
8dc5ad5
add transformers version tests
jiqing-feng Jun 4, 2024
e482e58
Update .github/workflows/test_ipex.yml
jiqing-feng Jun 5, 2024
d366b80
check generation method
jiqing-feng Jun 5, 2024
3948cad
Update optimum/intel/ipex/modeling_base.py
jiqing-feng Jun 5, 2024
d1b63ef
fix use_cache
jiqing-feng Jun 5, 2024
ea4d3e2
add hidden size limitation for patch
jiqing-feng Jun 5, 2024
bcb2b5a
add llama in tests
jiqing-feng Jun 5, 2024
f5f1af8
add re-load tests
jiqing-feng Jun 5, 2024
c08c957
fix hidden size check
jiqing-feng Jun 5, 2024
51e6f3d
rm norm config
jiqing-feng Jun 5, 2024
d06123b
add version variable
jiqing-feng Jun 5, 2024
641e8f9
fix import
jiqing-feng Jun 5, 2024
50c1059
rm useless logger
jiqing-feng Jun 5, 2024
a961746
rm useless logging
jiqing-feng Jun 5, 2024
c2253a8
fix last round review
jiqing-feng Jun 6, 2024
e29ea58
Merge branch 'huggingface:main' into rename
jiqing-feng Jun 6, 2024
caa27c3
Update .github/workflows/test_ipex.yml
echarlaix Jun 6, 2024
78498ab
Update optimum/intel/ipex/modeling_base.py
echarlaix Jun 6, 2024
97f7876
Update optimum/intel/ipex/modeling_base.py
echarlaix Jun 6, 2024
cf3525a
Update setup.py
echarlaix Jun 6, 2024
8ba602d
Update optimum/exporters/ipex/modeling_utils.py
echarlaix Jun 6, 2024
f15a1f5
fix
jiqing-feng Jun 6, 2024
36ae751
limit the new tokens of assisted decoding tests
jiqing-feng Jun 6, 2024
8 changes: 4 additions & 4 deletions optimum/exporters/ipex/model_patcher.py
@@ -62,18 +62,18 @@ def patch_op(m, target_m, new_op_name, new_op):


 def _patch_llama_model(model):
-    if is_ipex_version("<", "2.5.0"):
-        raise ImportError("Only ipex version > 2.3.0 supports RotaryEmbedding and IndirectAccessKVCache")
+    if is_ipex_version("<", "2.3.0"):
+        raise ImportError("Only ipex version >= 2.3.0 supports RotaryEmbedding and IndirectAccessKVCacheAttention")

-    from intel_extension_for_pytorch.llm.modules import IndirectAccessKVCache, RotaryEmbedding
+    from intel_extension_for_pytorch.llm.modules import IndirectAccessKVCacheAttention, RotaryEmbedding

     ipex_rope = RotaryEmbedding(
         model.config.max_position_embeddings,
         model.config.hidden_size // model.config.num_attention_heads,
         model.config.rope_theta,
         model.config.architectures[0],
     )
-    ipex_scale_dot_product = IndirectAccessKVCache(text_max_length=model.config.max_position_embeddings)
+    ipex_scale_dot_product = IndirectAccessKVCacheAttention(text_max_length=model.config.max_position_embeddings)
     patch_op(model, LlamaAttention, "ipex_rope", ipex_rope)
     patch_op(model, LlamaAttention, "ipex_scale_dot_product", ipex_scale_dot_product)

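The hunk above relaxes the gate from a future ipex 2.5.0 to the released 2.3.0 and picks up the renamed IndirectAccessKVCacheAttention class. A minimal sketch of that gate in isolation, assuming is_ipex_version is the helper from optimum.intel.utils.import_utils and that ipex >= 2.3.0 ships the renamed class as the import in the diff indicates (the helper name below is illustrative):

# Sketch only: mirrors the corrected version gate used by _patch_llama_model.
from optimum.intel.utils.import_utils import is_ipex_version

def require_ipex_llm_modules():
    # ipex 2.3.0 renamed IndirectAccessKVCache to IndirectAccessKVCacheAttention,
    # so anything >= 2.3.0 is enough; older builds raise a clear error instead.
    if is_ipex_version("<", "2.3.0"):
        raise ImportError(
            "Only ipex version >= 2.3.0 supports RotaryEmbedding and IndirectAccessKVCacheAttention"
        )
    from intel_extension_for_pytorch.llm.modules import IndirectAccessKVCacheAttention, RotaryEmbedding
    return RotaryEmbedding, IndirectAccessKVCacheAttention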
17 changes: 10 additions & 7 deletions optimum/exporters/ipex/modeling_utils.py
@@ -219,7 +219,7 @@ def _llama_model_forward(
 # Adapted from https://github.com/huggingface/transformers/blob/v4.38.2/src/transformers/models/llama/modeling_llama.py#L694
 class _IPEXLlamaDecoderLayerRef(nn.Module):
     def __init__(self, module, config, distributed=False):
-        if is_ipex_version("<", "2.5.0"):
+        if is_ipex_version("<", "2.3.0"):
             raise ImportError("Only ipex version > 2.3.0 supports Linear2SiluMul and LinearAdd")

         from intel_extension_for_pytorch.llm.modules import Linear2SiluMul, LinearAdd
@@ -278,7 +278,7 @@ def forward(
             output_attentions=output_attentions,
             use_cache=use_cache,
         )
-        if not self.distributed:
+        if hasattr(self, "mha_linear_add"):
             hidden_states = self.mha_linear_add(hidden_states, residual)
         else:
             hidden_states = self.self_attn.o_proj(hidden_states)
@@ -288,12 +288,15 @@ def forward(
         residual = hidden_states
         hidden_states = self.post_attention_layernorm(hidden_states)

-        mlp_gate = self.linear_silu_mul(hidden_states)
-
-        if not self.distributed:
-            hidden_states = self.mlp_linear_add(mlp_gate, residual)
+        if hasattr(self, "linear_silu_mul"):
+            mlp_gate = self.linear_silu_mul(hidden_states)
+            if hasattr(self, "mlp_linear_add"):
+                hidden_states = self.mlp_linear_add(mlp_gate, residual)
+            else:
+                hidden_states = self.mlp.down_proj(mlp_gate)
+                hidden_states = residual + hidden_states
         else:
-            hidden_states = self.mlp.down_proj(mlp_gate)
+            hidden_states = self.mlp(hidden_states)
             hidden_states = residual + hidden_states

         outputs = (hidden_states,)
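Condensed from the hunks above, the decoder layer's forward now keys off which fused ipex modules were actually attached instead of the distributed flag. A restating sketch of the post-attention MLP path, written as a fragment of the layer's forward() body (no new behavior, just the merged control flow for readability):

# Fragment of the patched layer's forward(); the fused ops are assumed to be
# set as attributes only when the corresponding ipex modules could be created.
residual = hidden_states
hidden_states = self.post_attention_layernorm(hidden_states)
if hasattr(self, "linear_silu_mul"):
    mlp_gate = self.linear_silu_mul(hidden_states)  # fused gate/up projection + SiLU
    if hasattr(self, "mlp_linear_add"):
        hidden_states = self.mlp_linear_add(mlp_gate, residual)  # fused down projection + residual add
    else:
        hidden_states = self.mlp.down_proj(mlp_gate)
        hidden_states = residual + hidden_states
else:
    # No fused ops attached: fall back to the stock Hugging Face MLP.
    hidden_states = self.mlp(hidden_states)
    hidden_states = residual + hidden_states

The attribute checks let the layer degrade gracefully when a fused op could not be created, rather than relying on the single distributed flag.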
2 changes: 1 addition & 1 deletion optimum/intel/ipex/modeling_base.py
@@ -63,7 +63,7 @@


 def _is_patched_with_ipex(model, task):
-    if is_ipex_version("<", "2.5.0"):
+    if is_ipex_version("<", "2.3.0"):
         return False

     if isinstance(model, torch.jit.ScriptModule):
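The same 2.3.0 boundary now controls whether a model is considered ipex-patched at all. A minimal illustration of the relaxed check (the architecture and TorchScript-graph inspection done by the real _is_patched_with_ipex is elided; the helper name is illustrative):

from optimum.intel.utils.import_utils import is_ipex_version

def ipex_patching_available() -> bool:
    # Before this PR the gate was is_ipex_version("<", "2.5.0"), which silently
    # disabled patching on the released ipex 2.3.0; it now matches 2.3.0 and newer.
    return not is_ipex_version("<", "2.3.0")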
32 changes: 14 additions & 18 deletions tests/ipex/test_modeling.py
@@ -171,14 +171,13 @@ class IPEXModelForCausalLMTest(unittest.TestCase):
"gpt2",
"gpt_neo",
"gpt_neox",
"llama",
"llama2",
"mistral",
# "phi",
"mpt",
"opt",
)
IPEX_PATCHED_SUPPORTED_ARCHITECTURES = ("llama",)
IPEX_PATCHED_SUPPORTED_ARCHITECTURES = ("llama2",)
GENERATION_LENGTH = 100
SPEEDUP_CACHE = 1.0

@@ -220,6 +219,10 @@ def test_pipeline(self, model_arch):
         self.assertTrue(all("This is a sample" in item["generated_text"] for item in outputs))

     @parameterized.expand(SUPPORTED_ARCHITECTURES)
+    @unittest.skipIf(
+        is_ipex_version(">=", "2.3.0"),
+        reason="CPU IPEXModel does not support assisted decoding when ipex version >= 2.3.0",
+    )
     def test_assisted_decoding(self, model_arch):
         model_id = MODEL_NAMES[model_arch]
         tokenizer = AutoTokenizer.from_pretrained(model_id)
@@ -235,21 +238,12 @@ def test_assisted_decoding(self, model_arch):
         self.assertTrue(torch.equal(ipex_output, ipex_output_assisted))
         self.assertTrue(torch.equal(transformers_output, transformers_output_assisted))

-    @parameterized.expand(
-        grid_parameters(
-            {
-                "model_arch": IPEX_PATCHED_SUPPORTED_ARCHITECTURES,
-                "use_cache": [True, False],
-            }
-        )
-    )
-    @unittest.skipIf(is_ipex_version("<", "2.5.0"), reason="Only ipex version > 2.3.0 supports ipex model patching")
-    def test_ipex_patching_beam_search(self, test_name, model_arch, use_cache):
+    @parameterized.expand(IPEX_PATCHED_SUPPORTED_ARCHITECTURES)
+    @unittest.skipIf(is_ipex_version("<", "2.3.0"), reason="Only ipex version >= 2.3.0 supports ipex model patching")
+    def test_ipex_patching_beam_search(self, model_arch):
         model_id = MODEL_NAMES[model_arch]
         set_seed(SEED)
-        model = IPEXModelForCausalLM.from_pretrained(model_id, export=True, use_cache=use_cache)
-        self.assertEqual(model.use_cache, use_cache)
-        trasnformers_model = AutoModelForCausalLM.from_pretrained(model_id)
+        model = IPEXModelForCausalLM.from_pretrained(model_id, export=True)
         tokenizer = AutoTokenizer.from_pretrained(model_id)
         tokenizer.pad_token = tokenizer.eos_token
         # Test with batch_size is 1 and 2.
@@ -259,17 +253,19 @@ def test_ipex_patching_beam_search(self, test_name, model_arch, use_cache):
             GenerationConfig(max_new_tokens=4, num_beams=4, do_sample=True),
             GenerationConfig(max_new_tokens=4, num_beams=8, do_sample=True),
             GenerationConfig(max_new_tokens=4, num_beams=32, do_sample=True),
-            GenerationConfig(max_new_tokens=4, do_sample=not use_cache, top_p=1.0, top_k=5, penalty_alpha=0.6),

Collaborator Author: The IPEXModel does not support _contrastive_search for now; we will try to enable it in the future.

Collaborator: I see, could we then add a warning to state this is not supported (at least for transformers >= v4.39.0) and then upgrade it in the setup.py maybe?

+            GenerationConfig(max_new_tokens=4, do_sample=True, top_p=1.0, top_k=5, penalty_alpha=0.6),
             GenerationConfig(max_new_tokens=4, do_sample=True, top_p=0.9, top_k=0),
         )
         for text in texts:
             tokens = tokenizer(text, padding=True, return_tensors="pt")
             for generation_config in generation_configs:
                 outputs = model.generate(**tokens, generation_config=generation_config)
-                transformers_outputs = trasnformers_model.generate(**tokens, generation_config=generation_config)
                 self.assertIsInstance(outputs, torch.Tensor)
-                self.assertEqual(outputs, transformers_outputs)

+    @unittest.skipIf(
+        is_ipex_version(">=", "2.3.0"),
+        reason="CPU IPEXModel only supports with past_key_values for ipex version >= 2.3.0",
+    )
     def test_compare_with_and_without_past_key_values(self):
         model_id = "echarlaix/tiny-random-gpt2-torchscript"
         tokenizer = AutoTokenizer.from_pretrained(model_id)
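Outside the test harness, the path exercised by test_ipex_patching_beam_search amounts to loading a supported checkpoint through IPEXModelForCausalLM and generating with beam search. A rough standalone sketch (the checkpoint id is a placeholder; with do_sample=True the outputs are stochastic, so only the output type is checked):

import torch
from transformers import AutoTokenizer, GenerationConfig
from optimum.intel import IPEXModelForCausalLM

model_id = "..."  # placeholder: any ipex-patched architecture, e.g. a small llama2 checkpoint
model = IPEXModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

generation_config = GenerationConfig(max_new_tokens=4, num_beams=4, do_sample=True)
tokens = tokenizer("This is a sample", return_tensors="pt")
outputs = model.generate(**tokens, generation_config=generation_config)
assert isinstance(outputs, torch.Tensor)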