Evaluator

iluise edited this page Apr 17, 2024 · 1 revision

Evaluation

The model evaluation is run with the following command:

srun python atmorep/core/evaluate.py

Please note that evaluate.py is a wrapper for atmorep/core/evaluator.py, which contains a function for each supported evaluation option.
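To illustrate the wrapper structure, here is a minimal sketch of how a mode string can be dispatched to a per-mode function. The function names and dispatch table below are hypothetical stand-ins; the actual logic lives in atmorep/core/evaluator.py and may differ.

```python
# Hypothetical sketch of the evaluate.py -> evaluator.py dispatch.
# 'run_bert' and 'run_forecast' are illustrative stand-ins, not AtmoRep API.

def run_bert(options):
    # would run the BERT-style randomly-masked evaluation
    return ('BERT', options)

def run_forecast(options):
    # would run the forecasting evaluation
    return ('forecast', options)

# one entry per supported evaluation option
DISPATCH = {'BERT': run_bert, 'forecast': run_forecast}

def evaluate(mode, options):
    return DISPATCH[mode](options)

result = evaluate('BERT', {'years_test': [2021]})
```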

Supported evaluation options

BERT

The option BERT evaluates the model in the so-called BERT mode: some of the tokens within the loaded hypercube source are masked at random, with the random choices spanning both the space and time dimensions.
Example:

mode, options = 'BERT', {'years_test' : [2021], 'fields[0][2]' : [123], 'attention' : False}

This example runs the BERT mode for the year 2021, on model level 123, without storing the attention maps (which are generally very time- and memory-consuming).
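The random masking described above can be sketched as follows. This is an illustrative toy, not AtmoRep's masking code; the mask_ratio parameter and the helper name are assumptions for the example.

```python
import random

# Toy sketch of BERT-style masking: randomly select a fraction of the
# (time, lat, lon) token positions of the loaded hypercube to mask.
# 'mask_ratio' is an illustrative parameter, not an AtmoRep option name.

def bert_mask(num_tokens, mask_ratio, seed=0):
    """num_tokens: (time, lat, lon) token grid; returns the set of masked positions."""
    t, y, x = num_tokens
    positions = [(i, j, k) for i in range(t) for j in range(y) for k in range(x)]
    rng = random.Random(seed)
    n_mask = int(mask_ratio * len(positions))
    return set(rng.sample(positions, n_mask))

# mask 25% of a 12 x 6 x 12 token grid, across both space and time
masked = bert_mask((12, 6, 12), mask_ratio=0.25)
```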

Forecast

The option forecast evaluates the model on the forecasting task: the last N tokens in time are completely masked, with N defined by forecast_num_tokens.
Example:

mode, options = 'forecast', {'forecast_num_tokens' : 1} #, 'fields[0][2]' : [123, 137], 'attention' : False }

This example is for a 3h forecasting window (1 masked token, assuming a temporal token_size of 3); with the commented-out options enabled, it would additionally select model levels 123 and 137 and skip storing the attention maps. For longer time windows, increase the number of masked tokens, e.g. forecast_num_tokens = 4 for a 12h forecast.

Note: Please remember that these are masked tokens within the pre-loaded source hypercube, so forecast_num_tokens must be smaller than num_tokens[0]. The case forecast_num_tokens = num_tokens[0] runs, but is not meaningful, since the entire source cube would be masked.
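The masking rule and the token-to-hours conversion above can be sketched as below. This is a hedged illustration, not AtmoRep's implementation; the helper names are made up, and the 3-hour temporal token size is taken from the examples in this page.

```python
# Sketch (not the actual AtmoRep code): mask the last N tokens in time and
# convert forecast_num_tokens to a forecast window in hours, assuming a
# temporal token size of 3 hours as in the examples above.

def forecast_mask(num_tokens_time, forecast_num_tokens):
    """Boolean mask over the time axis: True = masked (to be predicted)."""
    assert forecast_num_tokens < num_tokens_time, \
        "masking the whole source cube is not meaningful"
    return [i >= num_tokens_time - forecast_num_tokens
            for i in range(num_tokens_time)]

def forecast_window_hours(forecast_num_tokens, token_size_time=3):
    return forecast_num_tokens * token_size_time

mask = forecast_mask(num_tokens_time=12, forecast_num_tokens=4)  # 12 h window
```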

Global Forecast

The option global_forecast runs the model in forecast mode, but tiles the globe so that the combined tokens form a global forecast (latitude -90 to 90, longitude 0 to 360).
Example:

mode, options = 'global_forecast', { 'fields[0][2]' : [123, 137],
                                     'dates' : [[2021, 2, 10, 12]],
                                     'token_overlap' : [0, 0],
                                     'forecast_num_tokens' : 1,
                                     'attention' : False }

This example runs a global 3h forecast ('forecast_num_tokens' : 1), with the lead times specified in dates and only for model levels 123 and 137. token_overlap specifies the spatial overlap between two adjacent tiles (tokens), which helps to capture, for example, fast-moving waves and thus increases the forecasting accuracy. Suggested values are [0, 0] (no overlap) or [2, 6], expressed in grid points.
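The tiling with overlap can be sketched as follows. This is an illustrative helper, not AtmoRep's tiling code; tile sizes and the grid resolution are assumptions for the example, and the overlap is expressed in grid points as above.

```python
# Illustrative sketch of tiling one axis of a global lat-lon grid into
# neighborhoods that share 'overlap' grid points, as token_overlap does.
# The actual AtmoRep tiling is implemented in the model code and may differ.

def tile_starts(n_points, tile_size, overlap):
    """Start indices of tiles of 'tile_size' grid points with 'overlap' shared points."""
    stride = tile_size - overlap
    starts = list(range(0, max(n_points - tile_size, 0) + 1, stride))
    # make sure the last tile reaches the end of the grid
    if starts[-1] + tile_size < n_points:
        starts.append(n_points - tile_size)
    return starts

# e.g. 720 longitude points, tiles of 36 points, overlap of 6 grid points
lon_starts = tile_starts(720, 36, overlap=6)
```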


Below we report less frequently used evaluation options, which might not work out of the box due to backward-compatibility issues:

Fixed location

The option fixed_location evaluates the model with the center of the tokens fixed to a specific location in space, while randomising over the time dimension.

Temporal Interpolation

The option temporal_interpolation evaluates the model on the temporal interpolation task: the middle token of the loaded hypercube is masked, as described on the "Trainer" page.
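The middle-token masking can be sketched as follows (a hypothetical helper for illustration, not AtmoRep's actual implementation):

```python
# Sketch of temporal-interpolation masking: mask only the middle token
# along the time axis of the loaded hypercube.

def middle_token_mask(num_tokens_time):
    """Boolean mask over the time axis: True only for the middle token."""
    mid = num_tokens_time // 2
    return [i == mid for i in range(num_tokens_time)]
```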

Global Forecast within a spatial sub-region

The option global_forecast_range evaluates the model in global forecast mode (see above), but for N consecutive steps starting from a single lead time defined by cur_date. The number of steps is defined by the num_steps parameter.
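Generating the consecutive lead times can be sketched as below. This is an assumption-laden illustration: the helper name is made up, and the 3-hour step mirrors the 1-masked-token examples above; the actual step depends on the configured forecast window.

```python
from datetime import datetime, timedelta

# Hypothetical sketch: build N consecutive lead times from a starting date,
# stepping by the forecast window (3 h per masked token as in the examples).
# 'cur_date' and 'num_steps' mirror the option names mentioned above.

def lead_times(cur_date, num_steps, step_hours=3):
    start = datetime(*cur_date)  # e.g. (2021, 2, 10, 12) as in the examples
    return [start + timedelta(hours=i * step_hours) for i in range(num_steps)]

steps = lead_times((2021, 2, 10, 12), num_steps=4)
```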

Output

The output of the evaluation step is a set of .zarr files, plus a JSON settings file. Example:

  • model_idc96xrbip.json = the model settings used in the evaluation phase
  • results_idc96xrbip_epoch00000_pred.zarr = file storing all predicted tokens
  • results_idc96xrbip_epoch00000_target.zarr = file storing the masked target tokens (i.e. the ground truth), in the same format as the predictions
  • results_idc96xrbip_epoch00000_source.zarr = file storing all the loaded tokens (masked tokens are stored as zeros)
  • results_idc96xrbip_epoch00000_ens.zarr = file storing the ensemble predictions
  • (optional) results_idc96xrbip_epoch00000_attention.zarr = file storing the attention scores; written only if attention = True at the evaluation stage
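The file-name pattern above can be reconstructed from the model id and epoch; the sketch below is inferred from the example listing, not taken from the AtmoRep naming code itself.

```python
# Sketch of the output-file naming pattern, inferred from the listing above.
# The exact naming code lives in the AtmoRep repository and may differ.

def result_filename(model_id, epoch, kind):
    """kind is one of 'pred', 'target', 'source', 'ens', 'attention'."""
    return f'results_id{model_id}_epoch{epoch:05d}_{kind}.zarr'

files = [result_filename('c96xrbip', 0, k)
         for k in ('pred', 'target', 'source', 'ens')]
```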