Setfit does not work with transformers dev-version with error "'CallbackHandler' object has no attribute 'tokenizer'" #564

WeichenXu123 · 2024-10-08T06:14:44Z

Repro:

transformers: pip install git+https://github.com/huggingface/transformers
setfit: pip install setfit==1.1.0

code:

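    # Imports inferred from the names used below and from the stack trace
    # (setfit.trainer.Trainer, TrainingArguments); the aliases are an assumption.
    from datasets import load_dataset
    from sentence_transformers.losses import CosineSimilarityLoss
    from setfit import SetFitModel, sample_dataset
    from setfit import Trainer as SetFitTrainer
    from setfit import TrainingArguments as SetFitTrainingArguments

    dataset = load_dataset("sst2")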

    train_dataset = sample_dataset(dataset["train"], label_column="label", num_samples=8)
    eval_dataset = dataset["validation"]

    model = SetFitModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

    training_args = SetFitTrainingArguments(
        loss=CosineSimilarityLoss,
        batch_size=16,
        num_iterations=5,
        num_epochs=1,
        report_to="none",
    )

    # TODO: Remove this once https://github.com/huggingface/setfit/issues/512
    #   is resolved. This is a workaround while the deprecation of the
    #   evaluation_strategy argument is being addressed in the SetFit library.
    training_args.eval_strategy = training_args.evaluation_strategy

    trainer = SetFitTrainer(
        model=model,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        metric="accuracy",
        column_mapping={"sentence": "text", "label": "label"},
        args=training_args,
    )

The SetFitTrainer constructor raises an error:

AttributeError: 'CallbackHandler' object has no attribute 'tokenizer'

Stack trace:

/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/setfit/trainer.py:328: in __init__
    self.st_trainer = BCSentenceTransformersTrainer(
        args       = TrainingArguments(output_dir='checkpoints', max_steps=-1, sampling_strategy='oversampling', num_iterations=5, head_lea...al_delay=0, eval_max_steps=-1, save_strategy='steps', save_steps=500, save_total_limit=1, load_best_model_at_end=False)
        callbacks  = [<setfit.model_card.ModelCardCallback object at 0x7fdd193972b0>]
        column_mapping = {'label': 'label', 'sentence': 'text'}
        eval_dataset = Dataset({
    features: ['feat_idx', 'text', 'label'],
    num_rows: 872
})
        metric     = 'accuracy'
        metric_kwargs = None
        model      = <setfit.modeling.SetFitModel object at 0x7fdd5a0bf9d0>
        model_init = None
        self       = <setfit.trainer.Trainer object at 0x7fdd54daf940>
        train_dataset = Dataset({
    features: ['feat_idx', 'text', 'label'],
    num_rows: 16
})
/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/setfit/trainer.py:94: in __init__
    self.callback_handler.on_init_end(self.args, self.state, self.control)
        __class__  = <class 'setfit.trainer.BCSentenceTransformersTrainer'>
        callback   = <setfit.model_card.ModelCardCallback object at 0x7fdd193972b0>
        callbacks  = [<setfit.model_card.ModelCardCallback object at 0x7fdd193972b0>]
        kwargs     = {}
        overwritten_call_event = <function BCSentenceTransformersTrainer.__init__.<locals>.overwritten_call_event at 0x7fdd1938eca0>
        self       = <setfit.trainer.BCSentenceTransformersTrainer object at 0x7fdd19397a60>
        setfit_args = TrainingArguments(output_dir='checkpoints', max_steps=-1, sampling_strategy='oversampling', num_iterations=5, head_lea...al_delay=0, eval_max_steps=-1, save_strategy='steps', save_steps=500, save_total_limit=1, load_best_model_at_end=False)
        setfit_model = <setfit.modeling.SetFitModel object at 0x7fdd5a0bf9d0>
/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/transformers/trainer_callback.py:464: in on_init_end
    return self.call_event("on_init_end", args, state, control)
        args       = SentenceTransformerTrainingArguments(output_dir='checkpoints', overwrite_output_dir=False, do_train=False, do_eval=Fal....BATCH_SAMPLER: 'batch_sampler'>, multi_dataset_batch_sampler=<MultiDatasetBatchSamplers.PROPORTIONAL: 'proportional'>)
        control    = TrainerControl(should_training_stop=False, should_epoch_stop=False, should_save=False, should_evaluate=False, should_log=False)
        self       = <transformers.trainer_callback.CallbackHandler object at 0x7fdd04d53280>
        state      = TrainerState(epoch=None, global_step=0, max_steps=0, logging_steps=500, eval_steps=500, save_steps=500, train_batch_si..., 'should_epoch_stop': False, 'should_save': False, 'should_evaluate': False, 'should_log': False}, 'attributes': {}}})
/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/setfit/trainer.py:87: in <lambda>
    self.callback_handler.call_event = lambda *args, **kwargs: overwritten_call_event(
        args       = ('on_init_end', SentenceTransformerTrainingArguments(output_dir='checkpoints', overwrite_output_dir=False, do_train=Fa...ntrol(should_training_stop=False, should_epoch_stop=False, should_save=False, should_evaluate=False, should_log=False))
        kwargs     = {}
        overwritten_call_event = <function BCSentenceTransformersTrainer.__init__.<locals>.overwritten_call_event at 0x7fdd1938eca0>
        self       = <setfit.trainer.BCSentenceTransformersTrainer object at 0x7fdd19397a60>
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <transformers.trainer_callback.CallbackHandler object at 0x7fdd04d53280>
event = 'on_init_end'
args = SentenceTransformerTrainingArguments(output_dir='checkpoints', overwrite_output_dir=False, do_train=False, do_eval=Fal....BATCH_SAMPLER: 'batch_sampler'>, multi_dataset_batch_sampler=<MultiDatasetBatchSamplers.PROPORTIONAL: 'proportional'>)
state = TrainerState(epoch=None, global_step=0, max_steps=0, logging_steps=500, eval_steps=500, save_steps=500, train_batch_si..., 'should_epoch_stop': False, 'should_save': False, 'should_evaluate': False, 'should_log': False}, 'attributes': {}}})
control = TrainerControl(should_training_stop=False, should_epoch_stop=False, should_save=False, should_evaluate=False, should_log=False)
kwargs = {}
callback = <transformers.trainer_callback.DefaultFlowCallback object at 0x7fdd04e541c0>

    def overwritten_call_event(self, event, args, state, control, **kwargs):
        for callback in self.callbacks:
            result = getattr(callback, event)(
                self.setfit_args,
                state,
                control,
                model=self.setfit_model,
                st_model=self.model,
                st_args=args,
>               tokenizer=self.tokenizer,
                optimizer=self.optimizer,
                lr_scheduler=self.lr_scheduler,
                train_dataloader=self.train_dataloader,
                eval_dataloader=self.eval_dataloader,
                **kwargs,
            )
E           AttributeError: 'CallbackHandler' object has no attribute 'tokenizer'
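
The failing line reads `self.tokenizer` from the `CallbackHandler`, so the handler in the installed transformers dev build evidently does not expose that attribute. Below is a minimal sketch of a defensive lookup, under the assumption (not confirmed in this thread) that the dev build keeps the same object under a different attribute name such as `processing_class`. `FakeCallbackHandler` and `resolve_tokenizer` are hypothetical names used only for illustration; they are not part of setfit or transformers.

```python
class FakeCallbackHandler:
    """Hypothetical stand-in for a handler that lacks a `tokenizer` attribute."""

    def __init__(self, processing_class):
        # Assumption: the object is stored as `processing_class`, not `tokenizer`.
        self.processing_class = processing_class


def resolve_tokenizer(handler):
    """Return the handler's tokenizer-like object, or None if neither attribute exists."""
    # Try the old attribute name first, then the assumed new one.
    for name in ("tokenizer", "processing_class"):
        if hasattr(handler, name):
            return getattr(handler, name)
    return None


handler = FakeCallbackHandler(processing_class="dummy-tokenizer")
print(resolve_tokenizer(handler))
```

A lookup like this could replace the direct `self.tokenizer` access in `overwritten_call_event`, so the same code would run whether or not the attribute is present. Whether that is the right fix depends on what the dev build actually changed.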


harupy · 2024-10-08T06:38:19Z

@WeichenXu123 full traceback?

WeichenXu123 · 2024-10-08T06:53:38Z

added full stack trace.

xklong1202 · 2024-10-26T16:35:33Z

Hi, I also encountered the same issue. Could you let me know if you managed to solve this?

AmoghM · 2024-10-27T09:15:35Z

+1 Same issue

xklong1202 · 2024-10-27T09:30:03Z

I found a way to work around this problem: I ran the code in Colab and didn't encounter the issue. I also built a virtual environment on my local computer and reinstalled Python and all the packages using the same versions as in Colab, but I still hit the issue there. I'm wondering whether this is related to my OS (I'm using macOS).
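
When comparing a working environment (such as Colab) with a failing local one, it can help to print the versions that are actually installed in each, since a dev install from git may differ from a pinned release. A small sketch using only the standard library follows; the package list is an assumption about which packages matter here, and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Assumed list of relevant packages; adjust as needed.
for name, version in installed_versions(
    ["transformers", "setfit", "sentence-transformers"]
).items():
    print(f"{name}: {version}")
```

Running this in both environments and diffing the output shows whether the two setups really match.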

WeichenXu123 mentioned this issue Oct 8, 2024

Fix transformer CI mlflow/mlflow#13342

Merged

39 tasks

WeichenXu123 changed the title ~~Setfit does not work with transformers-dev version~~ Setfit does not work with transformers dev-version with error "'CallbackHandler' object has no attribute 'tokenizer'" Oct 8, 2024
