
Hyperparam tuning #54

Open · wants to merge 15 commits into main
Conversation

tpritsky (Collaborator)

Hi guys, I've made some edits to add hyperparameter tuning with Ray Tune. Would love to get your feedback.

Some notes:

  1. There is a risk of running out of memory, since Ray saves large checkpoint files in ray_results; deleting this directory between runs may help.
  2. Please feel free to adjust the ranges for the hyperparameters (line 915, the search space; a sketch follows this list).
  3. I had to comment out TensorBoard logging due to pickling errors with Ray Tune.
  4. I had to add a home_dir path on line 778 to work with my local directory (please feel free to edit accordingly).
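For reference, here is a minimal sketch of what a Ray Tune search space of this kind can look like; the parameter names and ranges below are illustrative assumptions, not the exact contents of line 915:

```python
# Illustrative sketch only -- the PR's actual search space (around line 915 of
# multitask_classifier.py) may use different hyperparameters and ranges.
from ray import tune

search_space = {
    "lr": tune.loguniform(1e-5, 1e-3),        # learning rate, sampled on a log scale
    "batch_size": tune.choice([16, 32, 64]),  # discrete batch-size options
    "hidden_dropout_prob": tune.uniform(0.1, 0.5),
}
```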

@JosselinSomervilleRoberts (Owner) left a comment

Can you paste the error with TensorBoard and also say which line you commented out to solve the issue? Have you checked that this PR does not break our existing build? Maybe add a parameter --ray_tune so that if we don't specify it, the behavior is the same as currently (I'm a bit skeptical regarding the new parameters).

Also, could you give us some instructions on how to install Ray Tune? (It seems like you struggled, so I would love to get your opinion on how to do this :) )

@JosselinSomervilleRoberts linked an issue Mar 16, 2023 that may be closed by this pull request
@tpritsky (Collaborator, Author)

To address your points:

> Can you paste the error with TensorBoard and also say which line you commented out to solve the issue? Have you checked that this PR does not break our existing build? Maybe add a parameter --ray_tune so that if we don't specify it, the behavior is the same as currently (I'm a bit skeptical regarding the new parameters).
Yes, here's the tensorboard error:
```
Serializing 'writer' <torch.utils.tensorboard.writer.SummaryWriter object at 0x7f1c93b6efd0>...
!!! FAIL serialization: cannot pickle 'tensorflow.python.lib.io._pywrap_file_io.WritableFile' object
Serializing '_annotated' FunctionTrainable...
================================================================================
Variable:

    FailTuple(writer [obj=<torch.utils.tensorboard.writer.SummaryWriter object at 0x7f1c93b6efd0>, parent=<function train_multitask at 0x7f1c93b644c0>])

was found to be non-serializable. There may be multiple other undetected variables that were non-serializable.
Consider either removing the instantiation/imports of these variables or moving the instantiation into the scope of the function/class.
```

Once I commented out all references to writer (e.g. writer.add_scalar) in train_multitask and model_eval_multitask (which is called by train_multitask), the code runs fine.
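For context, here is a minimal sketch of an alternative to commenting the calls out: only create and use the writer when not tuning, so the Ray Tune trainable never captures the unpicklable file handle. The tune_hyperparameters key below mirrors the --tune_hyperparameters flag mentioned later in this thread and is an assumption, not the PR's actual code.

```python
# Hypothetical sketch, not the PR's actual diff: guard the SummaryWriter instead
# of commenting out every writer.add_scalar call.
from torch.utils.tensorboard import SummaryWriter

def train_multitask(config):
    writer = None
    if not config.get("tune_hyperparameters", False):  # assumed flag/key name
        writer = SummaryWriter(log_dir=config.get("log_dir", "runs"))

    for step in range(config.get("num_steps", 10)):
        loss = 0.0  # placeholder for the real training step
        if writer is not None:
            writer.add_scalar("train/loss", loss, step)

    if writer is not None:
        writer.close()
```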

> Also, could you give us some instructions on how to install Ray Tune? (It seems like you struggled, so I would love to get your opinion on how to do this :) )

Ray Tune was actually simple enough to install on AWS with pip. Here's the command:
`pip install -U "ray[tune]"` # installs Ray + dependencies for Ray Tune

@marie-huynh (Collaborator) left a comment


Is it possible to refactor the tuning code into another file, rather than keeping it in multitask_classifier's main?

@tpritsky (Collaborator, Author)

tpritsky commented Mar 18, 2023

Hi @marie-huynh, Ray Tune serves as a wrapper function and requires the training function (train_multitask) to be modified to accept a config dictionary passed by Ray and to use the arguments within it. So I don't think I can run Ray Tune without directly editing multitask_classifier.

However, I have not made functional edits to train_multitask: I only changed its input argument to be the config dictionary passed by Ray instead of args, and updated it internally to read its arguments from that dictionary.
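As a rough illustration of that change (key names such as lr and batch_size below are assumptions, not necessarily the PR's exact keys):

```python
# Sketch only: train_multitask now receives the config dict that Ray Tune passes
# to the trainable, instead of the argparse args object.
def train_multitask(config):
    lr = config["lr"]                  # hyperparameters come from the Ray Tune sample
    batch_size = config["batch_size"]
    # ... rest of the training loop unchanged, using lr / batch_size ...
```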

Finally, I've added two new command-line flags (a rough sketch of how they wire into Ray Tune follows the list):

  • --tune_hyperparameters: if this isn't provided, training runs normally
  • --num_tuning_runs: Sets the number of hyperparameter training experiments to run
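
For reference, a rough sketch of how these flags might wire into Ray Tune, assuming the train_multitask trainable and a search_space dict like the one sketched above (tune.run signature as in Ray 2.x; the PR's actual call may differ):

```python
import argparse
from ray import tune

parser = argparse.ArgumentParser()
parser.add_argument("--tune_hyperparameters", action="store_true",
                    help="if omitted, training runs exactly as before")
parser.add_argument("--num_tuning_runs", type=int, default=10,
                    help="number of hyperparameter trials to launch")
args = parser.parse_args()

if args.tune_hyperparameters:
    # launch args.num_tuning_runs trials over the search space
    tune.run(train_multitask, config=search_space, num_samples=args.num_tuning_runs)
else:
    train_multitask(vars(args))   # normal single training run, as before
```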

@tpritsky (Collaborator, Author)

I've updated my script to avoid out-of-memory errors by saving directly to disk rather than to RAM. As long as you update your disk size, it should run OK.
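
A hedged sketch of the kind of setting this can involve (the PR's actual mechanism may differ): pointing Ray Tune's results and checkpoints at a disk path and limiting how many checkpoints each trial keeps.

```python
# Sketch only -- illustrative tune.run arguments, not necessarily the PR's exact fix.
tune.run(
    train_multitask,
    config=search_space,
    num_samples=args.num_tuning_runs,
    local_dir="/home/ubuntu/ray_results",  # assumed disk path for results/checkpoints
    keep_checkpoints_num=1,                # keep only the latest checkpoint per trial
)
```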

Successfully merging this pull request may close these issues.

Implement Hyperparameter tuning