---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
dev = "svg",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# tscv
<!-- badges: start -->
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://www.tidyverse.org/lifecycle/#experimental)
[![Licence](https://img.shields.io/badge/licence-GPL--3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0.en.html)
<!-- badges: end -->
The `tscv` package provides a collection of functions and tools for time series analysis, forecasting and time series cross-validation. It is mainly a set of wrapper and helper functions, along with some extensions for the packages `tsibble`, `fable` and `fabletools`, that I find useful for research in the area of time series forecasting.
***Disclaimer:*** The `tscv` package is highly experimental and it is very likely that there will be (substantial) changes in the near future.
## Installation
You can install the development version from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("ahaeusser/tscv")
```
## Example
```{r packages, message = FALSE, warning = FALSE}
# Load relevant packages
library(tscv)
library(tidyverse)
library(tsibble)
library(fable)
library(feasts)
```
```{r abbreviations, echo=FALSE, warning=FALSE, message=FALSE, results='hide'}
Sys.setlocale("LC_TIME", "C")
```
## Data preparation
The data set `elec_price` is a `tibble` with day-ahead electricity spot prices in [EUR/MWh] from the ENTSO-E Transparency Platform. The data set contains hourly time series data from 2019-01-01 to 2020-12-31 for 8 European bidding zones (BZN):
* DE: Germany (including Luxembourg)
* DK: Denmark
* ES: Spain
* FI: Finland
* FR: France
* NL: Netherlands
* NO1: Norway 1 (Oslo)
* SE1: Sweden 1 (Lulea)
In this vignette, we will use only four time series to demonstrate the functionality of the package (the data set is filtered to the bidding zones Germany, France, Norway and Sweden). You can use the function `plot_line()` to visualize the four time series. The function `summarise_data()` is used to explore the structure (start date, end date, number of observations and the number of missing and zero values). The function `summarise_stats()` calculates descriptive statistics for each time series.
```{r clean_data}
series_id <- "bidding_zone"
value_id <- "value"
index_id <- "time"

context <- list(
  series_id = series_id,
  value_id = value_id,
  index_id = index_id
)
# Prepare data set
main_frame <- elec_price %>%
filter(bidding_zone %in% c("DE", "FR", "NO1", "SE1"))
main_frame
main_frame %>%
plot_line(
x = time,
y = value,
color = bidding_zone,
facet_var = bidding_zone,
title = "Day-ahead Electricity Spot Price",
subtitle = "2019-01-01 to 2020-12-31",
xlab = "Time",
ylab = "[EUR/MWh]",
caption = "Data: ENTSO-E Transparency"
)
summarise_data(
.data = main_frame,
context = context
)
summarise_stats(
.data = main_frame,
context = context
)
```
## Split data into training and testing
To prepare the data set for time series cross-validation (TSCV), you can use the function `make_split()`. This function splits the data into several slices for training and testing (i.e. partitioning into time slices). You can choose between the modes `stretch` and `slide`. The former is an expanding window approach, while the latter is a fixed window approach. Furthermore, we define the (initial) training window size via `value`, the testing window size (i.e. the forecast horizon) via `n_ahead`, and the step size for increments via `n_skip`. Further options for splitting the data are available via `type` (see the function reference for more details).
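To make the two modes concrete, here is a small base-R sketch (my own illustration with made-up index vectors, not `tscv` code) of which training observations each split would cover:

``` r
# Illustrative only (not tscv code): training indices per split for
# an initial window of 5 observations and a step size of 1
n_init <- 5

# mode = "slide": fixed-size window, the start moves forward each split
slide_splits <- lapply(0:2, function(i) seq(1 + i, n_init + i))
# split 1: 1..5, split 2: 2..6, split 3: 3..7

# mode = "stretch": expanding window, the start stays at observation 1
stretch_splits <- lapply(0:2, function(i) seq(1, n_init + i))
# split 1: 1..5, split 2: 1..6, split 3: 1..7
```

In both modes the test window of length `n_ahead` follows directly after each training window.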
```{r split_data}
# Setup for time series cross validation
type <- "first"
value <- 2400    # size of the training window
n_ahead <- 24    # size of the testing window (= forecast horizon)
n_skip <- 23     # skip 23 observations between splits
n_lag <- 0       # no lag
mode <- "slide"  # fixed window approach
exceed <- FALSE  # pseudo out-of-sample forecasts only
split_frame <- make_split(
main_frame = main_frame,
context = context,
type = type,
value = value,
n_ahead = n_ahead,
n_skip = n_skip,
n_lag = n_lag,
mode = mode,
exceed = exceed
)
# For illustration, only the first 50 splits are used
split_frame <- split_frame %>%
filter(split %in% c(1:50))
split_frame
```
## Training and forecasting
The training and test splits are prepared within `split_frame`, so we are ready for forecasting. The function `slice_train()` slices the data `main_frame` according to the splits within `split_frame`. As we are using forecasting methods from the packages `fable` and `fabletools`, we have to convert the data set from a `tibble` to a `tsibble`. Due to the sample size and computation time, only very simple benchmark methods are used:
* `SNAIVE`: Seasonal naive model with weekly seasonality (from package `fable`)
* `STL-NAIVE`: STL decomposition model with a naive forecast. The series is decomposed via STL, the seasonally adjusted series is predicted via the naive approach, and the seasonal component is then added back to the forecasts (from packages `fable` and `feasts`)
* `SNAIVE2`: Variation of the seasonal naive approach. Mondays, Saturdays and Sundays are forecast with a weekly lag; Tuesdays, Wednesdays, Thursdays and Fridays with a daily lag.
* `SMEDIAN`: Seasonal median model
The functions `SMEDIAN()` and `SNAIVE2()` are extensions to the `fable` package.
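As a rough sketch of the `SNAIVE2` idea (my own illustration of the weekday rule stated above, not the actual model implementation):

``` r
# Illustrative only: choose the lag (in hours) used for an hourly series,
# depending on the weekday of the observation to be forecast
snaive2_lag <- function(weekday) {
  if (weekday %in% c("Monday", "Saturday", "Sunday")) {
    168  # weekly lag (7 * 24 hours)
  } else {
    24   # daily lag
  }
}

snaive2_lag("Monday")     # 168
snaive2_lag("Wednesday")  # 24
```

The daily lag works for midweek days because Tuesday through Friday resemble the preceding weekday, while Mondays and weekends need the value from one week earlier.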
```{r train_models}
# Slice training data from main_frame according to split_frame
train_frame <- slice_train(
main_frame = main_frame,
split_frame = split_frame,
context = context
)
train_frame
# Convert tibble to tsibble
train_frame <- train_frame %>%
as_tsibble(
index = time,
key = c(bidding_zone, split)
)
train_frame
# Model training via fabletools::model()
model_frame <- train_frame %>%
model(
"SNAIVE" = SNAIVE(value ~ lag("week")),
"STL-NAIVE" = decomposition_model(STL(value), NAIVE(season_adjust)),
"SNAIVE2" = SNAIVE2(value),
"SMEDIAN" = SMEDIAN(value ~ lag("week"))
)
model_frame
# Forecasting via fabletools::forecast()
fable_frame <- model_frame %>%
forecast(h = n_ahead)
fable_frame
# Convert fable_frame (fable) to future_frame (tibble)
future_frame <- make_future(
fable = fable_frame,
context = context
)
future_frame
```
## Evaluation of forecast accuracy
To evaluate the forecast accuracy, the function `make_accuracy()` is used. You can define whether to evaluate the accuracy by `horizon` or by `split`. Several accuracy metrics are available:
* `ME`: mean error
* `MAE`: mean absolute error
* `MSE`: mean squared error
* `RMSE`: root mean squared error
* `MAPE`: mean absolute percentage error
* `sMAPE`: symmetric mean absolute percentage error
* `MPE`: mean percentage error
* `rMAE`: relative mean absolute error (relative to some user-defined benchmark method)
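For reference, most of these metrics reduce to simple formulas on the forecast errors. A hand-rolled base-R sketch (illustrative only, not the internals of `make_accuracy()`):

``` r
# Toy example: four actual values and their forecasts
actual   <- c(40, 42, 45, 43)
forecast <- c(41, 40, 46, 44)
e <- actual - forecast  # forecast errors: -1, 2, -1, -1

mean(e)          # ME:   -0.25
mean(abs(e))     # MAE:   1.25
mean(e^2)        # MSE:   1.75
sqrt(mean(e^2))  # RMSE:  ~1.32
mean(e / actual) * 100        # MPE, in percent
mean(abs(e) / actual) * 100   # MAPE, in percent
```

`rMAE` then divides a model's MAE by the MAE of a user-defined benchmark method on the same splits.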
### Forecast accuracy by forecast horizon
```{r accuracy_horizon}
# Estimate accuracy metrics by forecast horizon
accuracy_horizon <- make_accuracy(
future_frame = future_frame,
main_frame = main_frame,
context = context,
dimension = "horizon"
)
accuracy_horizon
# Visualize results
accuracy_horizon %>%
filter(metric == "MAE") %>%
plot_line(
x = n,
y = value,
facet_var = bidding_zone,
facet_nrow = 1,
color = model,
title = "Evaluation of forecast accuracy by forecast horizon",
subtitle = "Mean absolute error (MAE)",
xlab = "Forecast horizon (n-step-ahead)",
caption = "Data: ENTSO-E Transparency, own calculation"
)
```
### Forecast accuracy by split
```{r accuracy_split}
# Estimate accuracy metrics by split
accuracy_split <- make_accuracy(
future_frame = future_frame,
main_frame = main_frame,
context = context,
dimension = "split"
)
accuracy_split
# Visualize results
accuracy_split %>%
filter(metric == "MAE") %>%
plot_line(
x = n,
y = value,
facet_var = bidding_zone,
facet_nrow = 1,
color = model,
title = "Evaluation of forecast accuracy by split",
subtitle = "Mean absolute error (MAE)",
xlab = "Split",
caption = "Data: ENTSO-E Transparency, own calculation"
)
```