
metric_for_best_model is not working with summarization script? #520

Closed
hoangthangta opened this issue Mar 16, 2023 · 3 comments
Labels: question (Further information is requested), Stale

Comments

hoangthangta commented Mar 16, 2023

Environment info

  • adapter-transformers version: 3.2.0
  • Platform: Linux-5.13.0-40-generic-x86_64-with-glibc2.10
  • Python version: 3.8.5
  • Huggingface_hub version: 0.12.1
  • PyTorch version (GPU?): 1.13.1+cu117 (True)
  • Tensorflow version (GPU?): 2.11.0 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes

Information

Model I am using (Bert, XLNet ...): BART
Language I am using the model on (English, Chinese ...): English
Adapter setup I am using (if any): all

Details

I followed the summarization example at https://github.com/adapter-hub/adapter-transformers/blob/master/examples/pytorch/summarization/run_summarization.py.
When I set training_args.metric_for_best_model = 'eval_rouge1', it does not work: the model only keeps the metric from the first epoch as the best.

Here is the trainer_state.json file (note that best_metric matches the epoch-1 eval_rouge1 of 52.1229, even though epoch 2 reached 52.2133):

    {
      "best_metric": 52.1229,
      "best_model_checkpoint": "output/adapter/checkpoint-250",
      "epoch": 2.0, "global_step": 500,
      "is_hyper_param_search": false, "is_local_process_zero": true, "is_world_process_zero": true,
      "log_history": [
        { "epoch": 1.0, "eval_gen_len": 30.52, "eval_loss": 1.9380815029144287, "eval_rouge1": 52.1229, "eval_rouge2": 31.6943, "eval_rougeL": 46.4212, "eval_rougeLsum": 52.1207, "eval_runtime": 85.5511, "eval_samples_per_second": 1.169, "eval_steps_per_second": 0.152, "step": 250 },
        { "epoch": 2.0, "learning_rate": 1.6866666666666666e-05, "loss": 2.6408, "step": 500 },
        { "epoch": 2.0, "eval_gen_len": 36.4, "eval_loss": 1.8335657119750977, "eval_rouge1": 52.2133, "eval_rouge2": 31.6728, "eval_rougeL": 45.8727, "eval_rougeLsum": 52.2295, "eval_runtime": 109.2482, "eval_samples_per_second": 0.915, "eval_steps_per_second": 0.119, "step": 500 }
      ],
      "max_steps": 750, "num_train_epochs": 3, "total_flos": 430166620692480.0,
      "trial_name": null, "trial_params": null
    }

hoangthangta added the question (Further information is requested) label on Mar 16, 2023
hoangthangta (Author) commented Mar 16, 2023

It may be my mistake in using this parameter. I will set `load_best_model_at_end=False` and `training_args.greater_is_better = True` to make it work. From the documentation of `metric_for_best_model`:

Use in conjunction with `load_best_model_at_end` to specify the metric to use to compare two different
            models. Must be the name of a metric returned by the evaluation with or without the prefix `"eval_"`. Will
            default to `"loss"` if unspecified and `load_best_model_at_end=True` (to use the evaluation loss).

adapter-hub-bert (Member) commented:

This issue has been automatically marked as stale because it has been without activity for 90 days. This issue will be closed in 14 days unless you comment or remove the stale label.

adapter-hub-bert (Member) commented:

This issue was closed because it was stale for 14 days without any activity.

adapter-hub-bert closed this as not planned (won't fix, can't repro, duplicate, stale) on Jun 29, 2023