fixed typo in training rules #549

Merged
6 changes: 3 additions & 3 deletions training_rules.adoc
@@ -276,7 +276,7 @@ The MLPerf verifier scripts checks all hyperparameters except those with names m
|===
|Model |Optimizer |Name |Constraint |Definition |Reference Code |Latest version available

-|bert |lamb |global_batch_size |unconstrained |The glboal batch size for training. |--train_batch_size |v4.1
+|bert |lamb |global_batch_size |unconstrained |The global batch size for training. |--train_batch_size |v4.1
|bert |lamb |opt_base_learning_rate |unconstrained |The base learning rate. |--learning_rate |v4.1
|bert |lamb |opt_epsilon |unconstrained |adam epsilon |link:https://github.com/mlperf/training/blob/fb058e3849c25f6c718434e60906ea3b0cb0f67d/language_model/tensorflow/bert/optimization.py#L75[reference code] |v4.1
|bert |lamb |opt_learning_rate_training_steps |unconstrained |Step at which you reach the lowest learning rate |link:https://github.com/mlperf/training/blob/master/language_model/tensorflow/bert/run_pretraining.py#L64[reference code] |v4.1
@@ -319,7 +319,7 @@ The MLPerf verifier scripts checks all hyperparameters except those with names m
|llama2_70b_lora |adamw |opt_learning_rate_warmup_ratio | unconstrained |ratio of steps out of training for linear warmup during initial checkpoint generation. This only affects the learning rate curve in the benchmarking region. |See PR (From Habana, TODO Link) |v4.1
|llama2_70b_lora |adamw |opt_learning_rate_training_steps | unconstrained |Step when the end of cosine learning rate curve is reached. Learning rate cosine decay is in range (opt_learning_rate_warmup_steps + 1,opt_learning_rate_decay_steps]. |See PR (From Habana, TODO Link) |v4.1
|llama2_70b_lora |adamw |opt_base_learning_rate |unconstrained | base learning rate |See PR (From Habana, TODO Link) |v4.1
-|stable diffusion |adamw |global_batch_size |unconstrained |The glboal batch size for training |link:https://github.com/mlcommons/training/blob/master/stable_diffusion/main.py#L633[reference code] |v4.1
+|stable diffusion |adamw |global_batch_size |unconstrained |The global batch size for training |link:https://github.com/mlcommons/training/blob/master/stable_diffusion/main.py#L633[reference code] |v4.1
|stable diffusion |adamw |opt_adamw_beta_1 |0.9 |coefficients used for computing running averages of gradient and its square |link:https://github.com/mlcommons/training/blob/master/stable_diffusion/ldm/models/diffusion/ddpm.py#L1629[reference code] |v4.1
|stable diffusion |adamw |opt_adamw_beta_2 |0.999 |coefficients used for computing running averages of gradient and its square |link:https://github.com/mlcommons/training/blob/master/stable_diffusion/ldm/models/diffusion/ddpm.py#L1630[reference code] |v4.1
|stable diffusion |adamw |opt_adamw_epsilon |1e-08 |term added to the denominator to improve numerical stability |link:https://github.com/mlcommons/training/blob/master/stable_diffusion/ldm/models/diffusion/ddpm.py#L1631[reference code] |v4.1
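For readers mapping the constrained values in the rows above to code, here is a minimal sketch of how the fixed AdamW betas and epsilon could be passed to a PyTorch optimizer. The model and base learning rate below are placeholders, not the reference implementation's actual parameter groups or values.

[source,python]
----
import torch

# Placeholder module standing in for the real model; the reference code's
# actual parameter groups and learning rate schedule are not reproduced here.
model = torch.nn.Linear(16, 16)
base_lr = 1.0e-4  # opt_base_learning_rate is unconstrained; this value is illustrative

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=base_lr,
    betas=(0.9, 0.999),  # opt_adamw_beta_1 / opt_adamw_beta_2 (constrained)
    eps=1e-08,           # opt_adamw_epsilon (constrained)
)
----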
@@ -756,4 +756,4 @@ MLPerf recommends calculating _utilization_ as `model_tensor_flops / (peak_syste

Use of `hardware_tensor_flops` (defined as model_tensor_flops plus operations added due to activation recomputation), instead of `model_tensor_flops` is strongly discouraged because those are not useful flops for the model. If `hardware_tensor_flops` are used for calculating utilization, it is recommended to also provide an accompanying calculation with `model_tensor_flops`.
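As a worked illustration of the recommended calculation, here is a minimal sketch; every number below is a made-up placeholder rather than a measured or official value.

[source,python]
----
# Hypothetical inputs, chosen only to illustrate the arithmetic.
model_tensor_flops = 3.2e21                    # tensor FLOPs the model itself requires
peak_system_tensor_flops_per_second = 1.0e18   # advertised peak of the full system
runtime_in_seconds = 8000.0                    # end-to-end benchmark runtime

utilization = model_tensor_flops / (
    peak_system_tensor_flops_per_second * runtime_in_seconds
)
print(f"utilization = {utilization:.2%}")      # 40.00% with these placeholder numbers
----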

-Note _utilization_ is not an official MLPerf metric.
+Note _utilization_ is not an official MLPerf metric.