add position_ids in forward #456
Conversation
thanks for the addition @jiqing-feng, you're right, we need to add support for `position_ids` now that it has been integrated in optimum
optimum/intel/generation/modeling.py (Outdated)

```diff
@@ -88,7 +89,7 @@ def jit_trace(model: PreTrainedModel, task: str, use_cache: bool = False):
         traced_model(**model_inputs)
         traced_model(**model_inputs)

-    return traced_model
+    return traced_model, has_position_ids
```
I think we should keep `jit_trace` as it is. Suggested change:

```diff
-    return traced_model, has_position_ids
+    return traced_model
```
optimum/intel/generation/modeling.py (Outdated)

```diff
@@ -116,6 +118,7 @@ def __init__(
         self._device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
         self.normalized_config = NormalizedConfigManager.get_normalized_config_class(config.model_type)(config)
         self.model_dtype = kwargs.get("model_dtype", None)
+        self.has_position_ids = has_position_ids
```
no need to have an attribute, we can use `MODEL_TYPES_REQUIRING_POSITION_IDS` directly
Thanks, I will use it.
optimum/intel/generation/modeling.py (Outdated)

```python
        position_ids = kwargs.get("position_ids", None)
        if self.has_position_ids and position_ids is not None:
            inputs.update({"position_ids": position_ids})
        elif self.has_position_ids and position_ids is None:
            seq_length = input_ids.shape[-1]
            if not self.use_cache:
                past_key_values_length = 0
            else:
                past_key_values_length = past_key_values[0][1].shape[-2]
            position_ids = torch.arange(
                past_key_values_length, seq_length + past_key_values_length, dtype=torch.long, device=self._device
            ).unsqueeze(0)
            inputs.update({"position_ids": position_ids})
        elif not self.has_position_ids and position_ids is not None:
            logger.warning("You miss the position_ids in the inputs")
```
I don't think we should generate the `position_ids` here, since you already added it in `prepare_inputs_for_generation`. I would just pass it when needed by checking the graph, as done in https://github.com/huggingface/optimum/blob/e7bd60dd2c1e295263ba57a4e468a62ab5b179e8/optimum/onnxruntime/modeling_decoder.py#L229-L232
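(For reference, a minimal sketch of the pattern used on the ONNX Runtime side linked above; the model path is only a placeholder, and this is not the exact optimum code.)

```python
import onnxruntime

# Placeholder path: an already exported decoder ONNX file.
session = onnxruntime.InferenceSession("decoder_model.onnx", providers=["CPUExecutionProvider"])

# The session exposes the inputs the exported graph actually expects,
# so position_ids is only passed when the graph was exported with it.
input_names = {node.name for node in session.get_inputs()}
needs_position_ids = "position_ids" in input_names
```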
Yes, that is more reasonable. However, for generation tasks, different decoding methods produce different inputs. For example, llama with `greedy_search` includes `position_ids` in the inputs, but `assisted_decoding` only has `input_ids`. Besides, we already generate `attention_mask` in the forward. WDYT?
I see your point. I'm OK with the modification, but I think we need to add a test for every architecture to verify we create it correctly. For example, is `past_key_values_length = past_key_values[0][1].shape[-2]` correct for every architecture? (It looks like it from the empty pkv generation above, but I would like to verify, and also make sure this stays compatible if we add support for new architectures.)
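(As a small illustration of the shape assumption being questioned; the dimensions below are made up and the multi-query layout is only sketched, not taken from a specific architecture.)

```python
import torch

batch, num_heads, past_len, head_dim = 1, 4, 7, 8

# Standard layout: each layer stores a (key, value) pair of shape
# (batch, num_heads, past_len, head_dim), so the past length is shape[-2].
standard_pkv = (
    (
        torch.zeros(batch, num_heads, past_len, head_dim),
        torch.zeros(batch, num_heads, past_len, head_dim),
    ),
)
assert standard_pkv[0][1].shape[-2] == past_len

# Multi-query layout: a single tensor per layer whose second-to-last dimension
# is also the past length, hence the separate branch added later in this PR.
mqa_pkv = (torch.zeros(batch, past_len, 2 * head_dim),)
assert mqa_pkv[0].shape[-2] == past_len
```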
Hi @echarlaix. I have fixed all tests, would you please help me review these changes? Thx!

Hi @echarlaix, could you have a look at this change? Thx 😄!

Hi @echarlaix. Would you please help me review these changes? This change could avoid a forward failure caused by a missing `position_ids` input.
optimum/intel/generation/modeling.py (Outdated)

```python
        if has_position_ids and position_ids is not None:
            inputs.update({"position_ids": position_ids})
        elif has_position_ids and position_ids is None:
            seq_length = input_ids.shape[-1]
            if not self.use_cache:
                past_key_values_length = 0
            else:
                past_key_values_length = (
                    past_key_values[0].shape[-2]
                    if model_type.replace("-", "_") in MULTI_QUERY_ATTN_MODELS
                    else past_key_values[0][1].shape[-2]
                )
            position_ids = torch.arange(
                past_key_values_length, seq_length + past_key_values_length, dtype=torch.long, device=self._device
            ).unsqueeze(0)
            inputs.update({"position_ids": position_ids})
        elif not has_position_ids and position_ids is not None:
            logger.warning("You miss the position_ids in the inputs")
```
I think it would be better to check directly in the model graph whether `position_ids` is one of the model's expected inputs, if that can be done. If not, this new addition will create issues for all the previously exported models (the ones that were exported without any `position_ids`) for all architectures from `MODEL_TYPES_REQUIRING_POSITION_IDS`.
Suggested change:

```diff
-        if has_position_ids and position_ids is not None:
-            inputs.update({"position_ids": position_ids})
-        elif has_position_ids and position_ids is None:
-            seq_length = input_ids.shape[-1]
-            if not self.use_cache:
-                past_key_values_length = 0
-            else:
-                past_key_values_length = (
-                    past_key_values[0].shape[-2]
-                    if model_type.replace("-", "_") in MULTI_QUERY_ATTN_MODELS
-                    else past_key_values[0][1].shape[-2]
-                )
-            position_ids = torch.arange(
-                past_key_values_length, seq_length + past_key_values_length, dtype=torch.long, device=self._device
-            ).unsqueeze(0)
-            inputs.update({"position_ids": position_ids})
-        elif not has_position_ids and position_ids is not None:
-            logger.warning("You miss the position_ids in the inputs")
+        if "position_ids" in self.input_names:
+            if position_ids is None:
+                position_ids = ...
+            inputs["position_ids"] = position_ids
```
also concerning the `position_ids` (and the `attention_mask`) computation, I think we should do as follows (optimum-intel/optimum/intel/openvino/modeling_decoder.py, lines 365 to 383 in 819e3e8):
if "attention_mask" in self.input_names or "position_ids" in self.input_names: | |
if attention_mask is not None: | |
attention_mask = np.array(attention_mask) | |
else: | |
attention_mask = np.ones( | |
(input_ids.shape[0], input_ids.shape[1] + past_len), dtype=inputs["input_ids"].dtype | |
) | |
if "attention_mask" in self.input_names: | |
inputs["attention_mask"] = attention_mask | |
if "position_ids" in self.input_names: | |
if position_ids is not None: | |
position_ids = np.array(position_ids) | |
else: | |
position_ids = np.cumsum(attention_mask, axis=1) - 1 | |
position_ids[attention_mask == 0] = 1 | |
if past_key_values: | |
position_ids = np.expand_dims(position_ids[:, -1], axis=-1) |
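(For comparison, a minimal torch sketch of the same `position_ids` derivation; `prepare_position_ids` is a hypothetical helper, not part of either file.)

```python
import torch


def prepare_position_ids(attention_mask: torch.Tensor, has_past: bool) -> torch.Tensor:
    # Cumulative sum of the mask minus one gives each token's position;
    # padded positions are clamped to 1, mirroring the numpy snippet above.
    position_ids = attention_mask.long().cumsum(-1) - 1
    position_ids.masked_fill_(attention_mask == 0, 1)
    if has_past:
        # With a KV cache, only the position of the newly generated token is needed.
        position_ids = position_ids[:, -1:]
    return position_ids


# Example: batch of two sequences, the second one left-padded by one token.
mask = torch.tensor([[1, 1, 1, 1], [0, 1, 1, 1]])
print(prepare_position_ids(mask, has_past=False))
# tensor([[0, 1, 2, 3],
#         [1, 0, 1, 2]])
```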
I am afraid there is no `input_names` attribute in this class.
Yes, could this be added, do you think?
Yes, I can try it.
optimum/intel/generation/modeling.py (Outdated)

```diff
@@ -264,6 +262,7 @@ def forward(
         }

         model_type = self.config.model_type.replace("_", "-")
+        has_position_ids = True if model_type in MODEL_TYPES_REQUIRING_POSITION_IDS else False
```
Suggested change:

```diff
-        has_position_ids = True if model_type in MODEL_TYPES_REQUIRING_POSITION_IDS else False
+        has_position_ids = model_type in MODEL_TYPES_REQUIRING_POSITION_IDS
```
Hi @echarlaix. I am afraid I think it would be better to keep it this way and not use …
I was suggesting to check the model graph directly, as done in https://github.com/huggingface/optimum-intel/blob/v1.12.1/optimum/intel/openvino/modeling_base.py#L82 (to check whether `position_ids` is one of the model's expected inputs). If that can't be done, this PR might result in issues for all the previously exported models (the ones that were exported without any `position_ids`) for all architectures from `MODEL_TYPES_REQUIRING_POSITION_IDS`.
Hi @echarlaix. I think I got what you mean. The forward inputs are now checked against the graph model inputs; could you please help me review these changes? Thx!
Hi @echarlaix. Sorry for the misunderstanding. I just found that there is no way to get the input names from a TorchScript model, so I can only get the input names when tracing the model. I would like to hear your opinion. Thx!
I was able to get something with `input_names = [inputs.debugName() for inputs in model.graph.inputs()]`, can you check it out?
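(A minimal runnable sketch of this idea with a toy module; the `[1:]` slice that drops the leading self input and the exact debug names are assumptions that may vary across PyTorch versions.)

```python
import torch


class ToyDecoder(torch.nn.Module):
    def forward(self, input_ids, position_ids):
        return input_ids + position_ids


example = (torch.zeros(1, 4, dtype=torch.long), torch.arange(4).unsqueeze(0))
traced = torch.jit.trace(ToyDecoder(), example)

# The first graph input refers to the module itself, so it is skipped to keep
# only the tensor inputs the traced forward expects.
input_names = [inp.debugName() for inp in traced.graph.inputs()][1:]
print(input_names)  # e.g. ['input_ids', 'position_ids']
```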
Hi @echarlaix. Thanks for your advice, it perfectly fixed my problem. Would you please review these changes? Thx! And the failed CIs are not related to my changes.
Added updates in jiqing-feng#2, can you take a look?
Also, could you add a test before we merge?
add input names
Hi @echarlaix. I have merged your changes and also added the tests. Would you please help review the test function? Thx! BTW, the failed CIs seem not related to our changes.
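(A rough outline of what such a test could look like; `TSModelForCausalLM`, the `export=True` argument, and the tiny checkpoint are assumptions based on this discussion, and the actual test added in the PR may differ.)

```python
import torch
from transformers import AutoTokenizer

# Assumed import path for the TorchScript wrapper discussed in this PR.
from optimum.intel.generation.modeling import TSModelForCausalLM


def test_forward_without_position_ids():
    # Tiny checkpoint used only to keep the test fast; llama is one of the
    # architectures that requires position_ids.
    model_id = "hf-internal-testing/tiny-random-LlamaForCausalLM"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = TSModelForCausalLM.from_pretrained(model_id, export=True)

    inputs = tokenizer("This is a sample input", return_tensors="pt")
    # forward should succeed even when the caller does not pass position_ids,
    # since they are only injected when the traced graph expects them.
    with torch.no_grad():
        outputs = model(**inputs)
    assert outputs.logits.shape[0] == inputs["input_ids"].shape[0]
```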
* add position_ids in forward
* check if jit model need position_ids
* use MODEL_TYPES_REQUIRING_POSITION_IDS
* fix has_position_ids
* fix position_ids length
* rm useless params
* check model inputs by input names
* fix format
* check input names in graph model
* fix style
* consider eager model in input_names
* add input names
* add text input names
* fix styl;e
* Update optimum/intel/generation/modeling.py
* fix format
* Update optimum/intel/generation/modeling.py

Co-authored-by: Ella Charlaix <ella@huggingface.co>
Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>
Hi @echarlaix, do you think we should add `position_ids` in the forward of the generation model? `optimum` has added support for generating `position_ids` in this PR. cc @changwangss