Environment info

adapter-transformers version: 3.2.1

I'm trying to train an EncoderDecoder adapter with BertGeneration using the Seq2SeqAdapterTrainer:
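For context, here is a minimal sketch of the kind of setup this refers to. It is not the exact code from the report: the checkpoints, adapter name, hyperparameters, and toy dataset are assumptions on my part, and the import path for `Seq2SeqAdapterTrainer` assumes adapter-transformers 3.2.1 as listed above.

```python
from datasets import Dataset
from transformers import (
    BertGenerationDecoder,
    BertGenerationEncoder,
    BertTokenizerFast,
    EncoderDecoderModel,
    Seq2SeqTrainingArguments,
)
from transformers.adapters import Seq2SeqAdapterTrainer  # adapter-transformers 3.2.1

# Build an encoder-decoder model from two BERT checkpoints (placeholder checkpoints).
encoder = BertGenerationEncoder.from_pretrained("bert-base-uncased")
decoder = BertGenerationDecoder.from_pretrained(
    "bert-base-uncased", add_cross_attention=True, is_decoder=True
)
model = EncoderDecoderModel(encoder=encoder, decoder=decoder)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# Add a bottleneck adapter, then freeze the base model and activate only the adapter.
model.add_adapter("seq2seq_adapter")
model.train_adapter("seq2seq_adapter")

# Tiny stand-in dataset (10 examples, as in the log below).
texts = ["this is a placeholder sentence."] * 10
enc = tokenizer(texts, padding=True, truncation=True)
enc["labels"] = list(enc["input_ids"])
train_dataset = Dataset.from_dict(dict(enc))

training_args = Seq2SeqTrainingArguments(
    output_dir="out",
    per_device_train_batch_size=16,
    num_train_epochs=3000,
)

trainer = Seq2SeqAdapterTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

Calling `trainer.train()` with a setup along these lines produces the following output and fails with a RuntimeError: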
The following columns in the training set don't have a corresponding argument in `EncoderDecoderModel.forward` and have been ignored: token_type_ids. If token_type_ids are not expected by `EncoderDecoderModel.forward`, you can safely ignore this message.
***** Running training *****
Num examples = 10
Num Epochs = 3000
Instantaneous batch size per device = 16
Total train batch size (w. parallel, distributed & accumulation) = 16
Gradient Accumulation steps = 1
Total optimization steps = 3000
Number of trainable parameters = 0
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
xxx/adapters/lib/python3.10/site-packages/transformers/models/encoder_decoder/modeling_encoder_decoder.py:654: FutureWarning: Version v4.12.0 introduces a better way to train encoder-decoder models by computing the loss inside the encoder-decoder framework rather than in the decoder itself. You may observe training discrepancies if fine-tuning a model trained with versions anterior to 4.12.0. The decoder_input_ids are now created based on the labels, no need to pass them yourself anymore.
warnings.warn(DEPRECATION_WARNING, FutureWarning)
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[22], line 1
----> 1 train_results = trainer.train()
File ~/.conda/envs/adapters/lib/python3.10/site-packages/transformers/trainer.py:1543, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
1538 self.model_wrapped = self.model
1540 inner_training_loop = find_executable_batch_size(
1541 self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size
1542 )
-> 1543 return inner_training_loop(
1544 args=args,
1545 resume_from_checkpoint=resume_from_checkpoint,
1546 trial=trial,
1547 ignore_keys_for_eval=ignore_keys_for_eval,
1548 )
File ~/.conda/envs/adapters/lib/python3.10/site-packages/transformers/trainer.py:1791, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
1789 tr_loss_step = self.training_step(model, inputs)
1790 else:
-> 1791 tr_loss_step = self.training_step(model, inputs)
1793 if (
1794 args.logging_nan_inf_filter
1795 and not is_torch_tpu_available()
1796 and (torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step))
1797 ):
1798 # if loss is nan or inf simply add the average of previous logged losses
1799 tr_loss += tr_loss / (1 + self.state.global_step - self._globalstep_last_logged)
File ~/.conda/envs/adapters/lib/python3.10/site-packages/transformers/trainer.py:2549, in Trainer.training_step(self, model, inputs)
2546 loss = loss / self.args.gradient_accumulation_steps
2548 if self.do_grad_scaling:
-> 2549 self.scaler.scale(loss).backward()
2550 elif self.use_apex:
2551 with amp.scale_loss(loss, self.optimizer) as scaled_loss:
File ~/.conda/envs/adapters/lib/python3.10/site-packages/torch/_tensor.py:488, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
478 if has_torch_function_unary(self):
479 return handle_torch_function(
480 Tensor.backward,
481 (self,),
(...)
486 inputs=inputs,
487 )
--> 488 torch.autograd.backward(
489 self, gradient, retain_graph, create_graph, inputs=inputs
490 )
File ~/.conda/envs/adapters/lib/python3.10/site-packages/torch/autograd/__init__.py:197, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
192 retain_graph = create_graph
194 # The reason we repeat same the comment below is that
195 # some Python versions print out the first line of a multi-line function
196 # calls in the traceback and some print out the last line
--> 197 Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
198 tensors, grad_tensors_, retain_graph, create_graph, inputs,
199 allow_unreachable=True, accumulate_grad=True)
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
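Note that the training log above also reports `Number of trainable parameters = 0`, which is consistent with this error: if no parameter has `requires_grad=True`, the loss tensor has no `grad_fn` for `backward()` to follow. A quick sanity check (generic PyTorch, not from the original report):

```python
# List the parameters that would actually receive gradients.
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(len(trainable), "trainable parameter tensors")
print(trainable[:5])  # after train_adapter(...), adapter weights should appear here
```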
This issue is also present in transformers, where switching the optimizer is discussed as a possible solution. Changing the optimizer in the Seq2SeqAdapterTrainer did not work:
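For illustration, a sketch of what changing the optimizer might look like via the standard `optimizers` argument of the Trainer API, reusing the names from the sketch above; the AdamW choice and the learning rate are assumptions, not the code from the report.

```python
import torch

# Optimize only the parameters that are currently trainable
# (after train_adapter(...), this should be just the adapter weights).
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4
)

trainer = Seq2SeqAdapterTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    optimizers=(optimizer, None),  # scheduler=None lets the Trainer build its default one
)
trainer.train()
```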
Training with the same settings, but without an adapter and with the plain Seq2SeqTrainer, does work.

Thanks for reporting these issues and sorry for not getting back to you earlier. Unfortunately, our current encoder-decoder implementation is very hacky and has all sorts of issues currently. We'll try to look into this.