
Evaluation code for the univariate forecasting task may not be right: there is information leakage. #35

Open
wiwi opened this issue May 26, 2023 · 4 comments

Comments

@wiwi

wiwi commented May 26, 2023

As the title says, the univariate forecasting results provided by this repository and the related paper may not be right, because the evaluation code for univariate forecasting leads to information leakage. The following figure shows the evaluation code for univariate forecasting in this repository.

[Figure: screenshot of the univariate forecasting evaluation code]

The main problem is that eval_forecasting() encodes the data with a sliding window (sliding step = 1) and only then splits the result into training/validation/test sets. Hence, it can lead to information leakage. The general way to do this is to split the data first and then encode each split separately.

This bug was found in another repository that inherits the evaluation code of TS2Vec, where we found the unfair evaluation contributed about a 10% improvement on the Electricity dataset. We then checked the evaluation code for univariate forecasting in this repository and found the same operation.

So, I hope the authors check the evaluation code for univariate forecasting, and especially the results presented in the TS2Vec paper, because those results may be wrong. If this problem is confirmed, I hope the authors will revise the results in the paper to avoid unfair comparisons.
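For concreteness, here is a minimal sketch of the two pipelines I am contrasting; encode() here is just an illustrative stand-in, not this repository's actual code:

# Illustrative sketch only: encode() is a dummy stand-in for the model.
import numpy as np

def encode(x):
    # Dummy per-timestamp "representation": cumulative sum along time.
    return np.cumsum(x, axis=1)

data = np.random.randn(1, 100, 1)            # (instances, timestamps, features)
train_slice, test_slice = slice(0, 60), slice(60, 100)

# Pipeline A (this repository): encode the whole series once,
# then split the per-timestamp representations.
all_repr = encode(data)
train_repr_a = all_repr[:, train_slice]
test_repr_a = all_repr[:, test_slice]

# Pipeline B (what I expected): split the raw data first,
# then encode each split independently.
train_repr_b = encode(data[:, train_slice])
test_repr_b = encode(data[:, test_slice])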

@zhihanyue
Owner

zhihanyue commented May 26, 2023

The figure you provided is wrong. We perform causal inference: for timestamp t, the sliding window spans from t-T+1 to t. Therefore, there is no information leakage. The following is the corrected figure.
[Figure: corrected diagram of the causal sliding-window encoding]
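To make the windowing concrete, here is a toy sketch (not our encoder, just the indexing) showing that the window producing the representation at timestamp t covers [t-T+1, t] only:

# Toy sketch of the causal sliding window (not the repository's encoder).
import numpy as np

def causal_windows(x, T):
    # Row t holds x[t-T+1 .. t], left-padded with zeros near the start,
    # so no row ever contains a value from after its own timestamp.
    padded = np.concatenate([np.zeros(T - 1), x])
    return np.stack([padded[t : t + T] for t in range(len(x))])

x = np.arange(10, dtype=float)
print(causal_windows(x, T=4)[5])   # [2. 3. 4. 5.] -> values at or before t=5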

@zhihanyue
Owner

zhihanyue commented May 26, 2023

I have checked your other issue xingyu617/SimTS_Representation_Learning#5 and I think I understand your misunderstanding. The code you provided there is inappropriate.

# original code
# all_repr = model.encode(
#     data[:, :, n_covariate_cols:],
#     casual=True,
#     sliding_length=1,
#     sliding_padding=padding,
#     batch_size=128
# )
# print("all_repr",data.shape, all_repr.shape)
# train_repr = all_repr[:, train_slice]
# valid_repr = all_repr[:, valid_slice]
# test_repr = all_repr[:, test_slice]

# your code
train_data = data[:, train_slice, n_covariate_cols:]
valid_data = data[:, valid_slice, n_covariate_cols:]
test_data = data[:, test_slice, n_covariate_cols:]
print("data shape:", train_data.shape, valid_data.shape, test_data.shape)

train_repr = model.encode(
    train_data,
    casual=True,
    sliding_length=1,
    sliding_padding=padding,
    batch_size=128
)
valid_repr = model.encode(
    valid_data,
    casual=True,
    sliding_length=1,
    sliding_padding=padding,
    batch_size=128
)
test_repr = model.encode(
    test_data,
    casual=True,
    sliding_length=1,
    sliding_padding=padding,
    batch_size=128
)

For example, assume t is the first timestamp in the test set. For the first sample in the test set, the original input is [t-padding, t]. However, your input is [t, t], which feeds only one timestamp as input, resulting in poor performance and a biased distribution.
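To put concrete (hypothetical) numbers on it:

# Hypothetical numbers: padding = 200, test split starts at t0 = 1000.
padding, t0 = 200, 1000

# Encoding the full series (original code): the first test representation
# is computed from the window [t0 - padding, t0] -- past observations only,
# so no label information leaks.
print("full-series window:", (t0 - padding, t0))   # (800, 1000)

# Encoding the test split alone (your code): nothing before t0 exists in
# the input, so the window collapses to a single timestamp.
print("split-only window:", (t0, t0))              # (1000, 1000)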

@wiwi
Author

wiwi commented May 26, 2023

For example, assume t is the first timestamp in the test set. For the first sample in the test set, the original input is [t-padding, t]. However, your input is [t, t], which feeds only one timestamp as input, resulting in poor performance and a biased distribution.

You mean that you use padding to separate the training set and the test set. But I don't think I misunderstood, because the whole sequence is fed into eval_forecasting().
In train.py:

elif task_type == 'forecasting':
    out, eval_res = tasks.eval_forecasting(model, data, train_slice, valid_slice, test_slice, scaler, pred_lens, n_covariate_cols)

Here, 'data' comes from:

elif args.loader == 'forecast_csv_univar':
    task_type = 'forecasting'
    data, train_slice, valid_slice, test_slice, scaler, pred_lens, n_covariate_cols = datautils.load_forecast_csv(args.dataset, univar=True)
    train_data = data[:, train_slice]

If you still think I misunderstand, please point me to the code that ensures the training set and the test set are independent. Thanks.

@zhihanyue
Owner

zhihanyue commented May 26, 2023

1. Please read the code of model.encode() carefully; you can confirm that the real behavior matches our corrected figure.
2. I have already explained why your code is inappropriate. The following is what I think you misunderstood:

For example, assume t is the first timestamp in the test set. For the first sample in the test set, the original input is [t-padding, t]. However, your input is [t, t], which feeds only one timestamp as input, resulting in poor performance and a biased distribution.

We confirm that our code ensures the following behavior:

  • Behavior 1: In our implementation, assuming t denotes the first timestamp in the test set, the input of the first sample in the test set is [t-padding, t], and the label of the last sample in the validation set is [t-pred_len, t-1]. Only the information in "the input" is used to generate the predicted values.

From Behavior 1, we conclude that there is no data leakage. If you think Behavior 1 does not hold, please show evidence rather than asking for a code walkthrough. I'm sorry, I don't have that much time.

Note that we stand behind Behavior 1. This means that if you find a specific case where Behavior 1 does not hold, please reopen the issue. Thanks.
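If you want to check Behavior 1 empirically, a sketch along these lines should work inside eval_forecasting(), where model, data, test_slice, padding, and n_covariate_cols are already in scope (assuming test_slice is a Python slice and encode() returns a NumPy array):

# Sketch of an empirical causality check: corrupt everything at and after
# the first test timestamp; representations before it must not change.
import numpy as np

t = test_slice.start
corrupted = data.copy()
corrupted[:, t:] = np.random.randn(*corrupted[:, t:].shape)

kwargs = dict(casual=True, sliding_length=1, sliding_padding=padding, batch_size=128)
repr_orig = model.encode(data[:, :, n_covariate_cols:], **kwargs)
repr_corr = model.encode(corrupted[:, :, n_covariate_cols:], **kwargs)

# If no future information leaks into past representations, this holds.
assert np.allclose(repr_orig[:, :t], repr_corr[:, :t], atol=1e-5)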
