-
What is the motivation for this task?
When running Anomalib/EfficientAD, it seems to log only the training dataset metrics. I am using an AWS SageMaker Estimator in my pipeline and would like to connect its metric_definitions, which use regexes to search the logs. I can parse out the loss during training, but anomalib does not seem to print the validation-dataset metrics during validation. Additionally, is there a way to control the logging so that it prints only once per epoch, or perhaps once per configurable number of steps or percentage of progress per epoch? When training on a large dataset for many epochs over 24 hours, my cloud logs are very verbose because tqdm prints every step in the cloud, as opposed to local usage, where the progress bar updates in place.
Describe the solution you'd like
An option to print validation metrics and losses.
Additional context
No response
Replies: 7 comments
-
Validation metrics are printed next to the loss in the tqdm progress bar.
-
Which metrics in the progress bar during training are the validation metrics? Here's a snippet from my cloud logs for an EfficientAD model training:
-
They should be added at the end of validation.
-
Hmm, perhaps I am misunderstanding the logs and the validation logging. Can you point out which lines specify the validation loss in the following logs? Here's a snippet running from the end of one training epoch to the start of the next, including the validation quantiles calculation and the validation data loader lines.
-
I'm not sure. When I run EfficientAD with the default config I get the following:
As you can see, pixel_F1Score and pixel_AUROC are logged at the end of validation inside the tqdm progress bar.
-
Hmm, perhaps it's because I removed the image and pixel metrics from the config! I should probably add those back now that I have an end-to-end flow mostly working in AWS. Here's what my metrics looked like in my earlier logs:
I will share my entire config below. From what you are showing me above, are steps 210 to 215 validation? Are their step numbers part of the training steps? I still don't quite understand where the validation loss is. Is it just the "loss" being repurposed for those specific steps? Aside from loss, pixel_AUROC and pixel_F1Score, all the other metrics in your logs above have the train_ prefix. Are the "metrics" in the config just for validation? Can I include a metric that is the loss for the EfficientAD model? I am used to Keras flows, which typically have an explicit val_loss metric during validation.
-
Hello. Yes, you will need to have metrics in the config.
Yes.
Yes. It might just be my system, but the metrics are displayed in the training progress bar (the validation bar disappears at the end of validation).
Validation doesn't have a loss in this case.
At the start of validation, the anomaly map quantiles are calculated:
anomalib/src/anomalib/models/efficient_ad/lightning_model.py
Lines 255 to 260 in 1f50c95
The validation step then only does a forward pass:
anomalib/src/anomalib/models/efficient_ad/lightning_model.py
Lines 262 to 275 in 1f50c95
The metrics are then calculated at the end of the validation epoch:
anomalib/src/anomalib/models/components/base/anomaly_module.py
Lines 139 to 148 in 1f50c95
The loss is only calculated during training; metrics are calculated in validation and test.
This way of validating is mostly anomalib-specific. You could also include a validation loss, but I think that would require rewriting the validation step to calculate that loss as well.
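For the SageMaker side of the original question, here is a minimal sketch of how the metric regexes could be checked locally before passing them to the Estimator. It assumes the tqdm line carries the metrics as `name=value` pairs, as described in this thread. The sample log line, the metric names on the left-hand side (`train:loss` and so on), and the `extract_metrics` helper are made up for illustration; only the `Name`/`Regex` dictionary shape follows SageMaker's `metric_definitions` format.

```python
import re

# Hypothetical sample line, written to resemble a tqdm progress-bar line with
# metrics in the postfix. It is NOT copied from real anomalib output.
SAMPLE_LINE = (
    "Epoch 3: 100%|##########| 215/215 [01:02<00:00, 3.44it/s, "
    "loss=5.31, pixel_F1Score=0.612, pixel_AUROC=0.947]"
)

# Capture group for a plain or scientific-notation number.
NUMBER = r"([0-9.eE+-]+)"

# Same shape as the list passed to the Estimator's metric_definitions argument.
# The leading space in the loss regex keeps it from matching keys such as
# "train_loss=" that merely end in "loss".
metric_definitions = [
    {"Name": "train:loss", "Regex": r" loss=" + NUMBER},
    {"Name": "val:pixel_F1Score", "Regex": r"pixel_F1Score=" + NUMBER},
    {"Name": "val:pixel_AUROC", "Regex": r"pixel_AUROC=" + NUMBER},
]


def extract_metrics(line, definitions):
    """Apply each regex to one log line and return the values that matched."""
    found = {}
    for definition in definitions:
        match = re.search(definition["Regex"], line)
        if match:
            found[definition["Name"]] = float(match.group(1))
    return found


if __name__ == "__main__":
    print(extract_metrics(SAMPLE_LINE, metric_definitions))
```

Running the regexes against a few real lines copied from the cloud logs is a quick way to confirm that the validation metrics are actually present in the output once they are added back to the config.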