-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path13_time_series.Rmd
799 lines (547 loc) · 36.2 KB
/
13_time_series.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
# Time series
```{julia chapter_13_libraries, cache = TRUE , results = FALSE, echo=FALSE}
cd("./13_time_series/")
import Pkg
Pkg.activate(".")
using Markdown
using InteractiveUtils
using Plots
using Optim
```
## Predicting the future
Let's imagine this situation for a moment. We are sitting quietly in our house thinking about the beauty of data analysis, when an old friend calls us: Terry. We haven't seen him for a long time, but he tells us that he is forming an important investment fund for which he needs to get some experts in statistics.
The idea is simple, he needs to generate predictive models of the supply and demand of certain commodities so that later, with this precious information, he can determine if the prices of those products are going to go down or up. Then you simply buy long or short positions (you don't need to know much about that, basically you are betting on that prices are going to go up or down, respectively) on the stock market, hoping to make a lot of money.
With respect to the data, what you have available are long series with all the values that were taken by the supply and demand over time. In other words, we have time series.
This type of data has the particularity that its values are correlated since they are values of the same variable that changes over time. For example, the total amount of energy demanded by a nation today is closely related to yesterday's demand, and so on.
For a moment we are paralyzed and realize that we never encounter this type of problem. We take a breath for a few seconds and remember everything we learned about Bayesianism, about neural networks, about dynamic systems. It may be complex, but we will certainly succeed.
We tell Terry we're in. He smiles and gives us the first series: Peanut Chocolate.
```{julia chap_13_plot_1}
begin
chocolate_sales = [445.36,453.20,454.41,422.38,456.04,440.39,425.19,486.21,500.43,521.28,508.95,488.89,509.87,456.72,473.82,525.95,549.83,542.34]
plot(chocolate_sales, title="Daily peanut chocolate sales", legend=false, ylabel="units", xlabel="days")
end
```
As we just said: Notice that the variables they are asking us to work with are values that are evolving over time, so they take the name of **Time Series**. This kind of variables are the ones we are going to deal with in this chapter and have the particularity that, as they evolve in time, the values they take are related to the previous ones. So can you think of something to solve the problem of predicting the next value the series is going to have?
Many ideas must be coming to your mind. Surely some of them thought of taking as a forecast value the average of all the previous ones and some others have thought of taking directly the last value of the series, justifying that the very old values do not affect the next ones. That is to say:
$y_{T+1|T}^- =\frac{1}{T} * \sum_{t=1}^{T}y_t$
Where T is the number of periods for which we have data. Or:
$y_{T+1|T}^- = y_T$
Where we would always take the last value in the series to predict the next.
If we observe this carefully, we might realize that these are two extreme cases of the same methodology: assigning "weights" to the previous values to predict the next one. Basically this is the way we have to indicate how much we are interested in the old observations and how much in the new ones.
In the case of simple averaging, we would be saying that we care exactly the same about all the observations, since from the first to the last they are all multiplied by the same weights:
$y_{T+1|T}^- =\frac{1}{T} * y_1 + \frac{1}{T} * y_2 + ... + \frac{1}{T} * y_T$
$\sum_{t=1}^{T}\frac{1}{T} = 1$
The second equation is only telling us that the sum of all the weights is equal to 1.
Which also happens in the other case, where we assign the weight of the last value equal to 1 and leave all the others at zero:
$y_{T+1|T}^- =0 * y_1 + 0 * y_2 + ... + 1 * y_T = y_T$
But like many things in life, the optimal solution is usually somewhere in between the extremes.
So, a better solution would be one that allows us to take into account all the values of the series, but that the importance we are going to assign to it will decrease as we move away from older values. But how do we do this?
## Exponential Smoothing
A very interesting idea to implement our idea of finding a method that allows us to take into account the most distant values, but assigning them a lesser weight, is that of the **Exponential Smoothing**.
The name may sound very complicated or crazy to you. The reality is that it is very simple. Basically, what we propose is to assign weights that are decreasing exponentially as the observations are getting older, getting to give a preponderant value to the closest values, but without resigning the valuable information that the previous values offer us:
$y_{T+1|T}^- = α*y_T + α(1-α)*y_{T-1} + α(1-α)^2*y_{T-2} + ... + α(1-α)^{T-1}*y_1$
That is:
$y_{T+1|T}^- = \sum_{i=0}^{T-1}α(1-α)^j * y_{T-i}$
Watch these two formulas for a while until you make sure you understand that they are the same :)
This way of writing the method is especially useful because it allows us to regulate how much weight we want to assign to the past values. What does this mean? That we can control how quickly the value of the weights will decline over time. As a general rule, with alpha values close to 1, a lot of weight will be given to the close values and the weights will decay very quickly, and with alpha getting close to 0 the decay will be smoother. Let´s see it:
```{julia}
begin
alphas = [0.2, 0.5, 0.8]
results = []
for alpha in alphas
weights = []
for i in 0:10
weight = alpha*(1-alpha)^i
push!(weights, weight)
end
push!(results, weights)
end
results
end
```
```{julia chap_13_plot_2}
begin
scatter(results[1], label="α=0.2", title="Weight evolution with different values of alpha")
plot!(results[1], label=false, color="blue")
scatter!(results[2], label="α=0.5")
plot!(results[2], label=false, color="green")
scatter!(results[3], label="α=0.8", color="red")
plot!(results[3], label=false, color="red", xlabel="Time periods", ylabel="Weight")
end
```
As we said, as alpha gets closer to one, the importance of the closer values is greater and greater and when it gets closer to zero the opposite happens, that is, the importance is "more distributed" among all the observations. For example, in the case of choosing alpha = 0.5 we would have:
$y_{T+1|T}^- = 0.5*y_T + 0.25*y_{T-1} + 0.125*y_{T-2} + 0.0625*y_{T-3} + ...$
Well, then it seems we have a method that is good for estimating this kind of time series. But you're probably wondering... How do we choose the optimal alpha?
The basic strategy is to go through the entire time serie predicting each of the values it has with the previous ones and then look for the alpha that minimizes the difference between them. Let's see it:
For this, we have to introduce two new ways of writing the same method.
### Weighted average and Component form
Another way to write the same method and that will help us later with the choice of the alpha that best describes our data series is the **Weighted average form**. It simply proposes that the next value is a weighted average between the last value in the time series and the last prediction made:
$y_{t+1|t}^- = αy_t + (1-α)*y_{t|t-1}^-$
Notice that now the sub-index is changed from T to t, denoting that we are referring to any point of the time series and that, to calculate the prediction we are going to use the previous points of the series.
This is really very useful, since we can in this way go through all the time series and generate predictions for each point of it:
First, defining the prediction for the first value of the series as
$y_{1|0}^- = lo$
$y_{2|1}^- = αy_1 + (1-α)*lo$
$y_{3|2}^- = αy_2 + (1-α)*y_{2|1}^-$
$...$
$y_{T+1|T}^- = αy_T + (1-α)*y_{T|T-1}^-$
And if we substitute each equation within the other, we get the equation for prediction:
$y_{3|2}^- = αy_1 + (1-α)*y_{2|1}^-$
$y_{3|2}^- = αy_1 + (1-α)*(αy_1 + (1-α)*lo)$
$y_{3|2}^- = αy_1 + (1-α)*αy_1 + (1-α)^2*lo$
$...$
$y_{T+1|T}^- = \sum_{i=0}^{T-1}α(1-α)^j * y_{T-i} + (1-α)^T*lo$
And as $(1-\alpha)^T$ decays exponentially, for not very high values of T this term already becomes zero. So we obtain the same predicting formula as before.
Well, this is very good! We already have a way to go through the whole time series and make the predictions. This will be very useful since we can make these predictions for different alpha values and observe which one best approximates the whole series. Once this optimal alpha is obtained in this fitting process, we will be able to make the prediction for the future period.
Finally, there is one last way to define our models called **Component form**
$y_{t+h|t}^- = l_t$
$l_t = α*y_t + (1 - α)*l_{t-1}$
In the case of the **Simple Exponential Smoothing** it is identical to the **Weighted average form**, but it will make our work easier later when we want to make the analysis more complex.
### Optimization (or Fitting) Process
Let's start by seeing how using different alpha values we obtain different predicted value curves, defining a function that will make us the predictions of each value in the time series given a starting point (l0) and an alpha:
```{julia}
function SES_weight(α, l0, time_serie)
N = length(time_serie)
y_pred = 0
pred = []
for i in 1:(N)
if i == 1
y_pred = l0
else
y_pred = time_serie[i - 1] * α + y_pred * (1 - α)
end
push!(pred, y_pred)
end
return pred
end
```
Let's read the above algorithm together to make sure we understand that you are applying the formula explained above.
The function SES_weight receives three parameters: The alpha to make the calculation, the value of the first lo prediction, and the time series in question.
The algorithm begins by obtaining the number of points that the time series has and defining a vector in which we will be depositing all the predicted values of each point of it. Then it begins to iterate on the time series applying the formula mentioned above:
For the first predicted value $y_{1|0}^- = y_{pred} = lo$ and for the following ones we apply the formula $y_{t+1|t}^- = α*y_t + (1-α)*y_{t|t-1}^-$. Makes sense, right?
Then let's get to work and see how our prediction curve fits the real data.
```{julia chap_13_predictions , cache = TRUE, results = FALSE}
begin
pred_1 = SES_weight(0.2, 445.36, chocolate_sales)
pred_2 = SES_weight(0.5, 445.36, chocolate_sales)
pred_3 = SES_weight(0.8, 445.36, chocolate_sales)
end
```
```{julia chap_13_plot_3}
begin
plot(chocolate_sales, label="Data", legend=:topleft)
p1 = plot!(pred_1, label="α=0.2 fit")
plot(chocolate_sales, label="Data", legend=:topleft)
p2 = plot!(pred_2, label="α=0.5 fit")
plot(chocolate_sales, label="Data", legend=:topleft)
p3 = plot!(pred_3, label="α=0.8 fit")
plot(p1, p2, p3)
end
```
This is a very cool graphic to look at. It illustrates very clearly how small alphas look mostly like an average of the time series values and, as it starts to get closer to 1, it looks more like taking the last value in the series as a future prediction. As we said at the beginning of the chapter, this method allows us to find intermediate points between the two extremes.
It's very nice to see how the graphs change as the alpha does... but how do we find the best alpha and l0 so that the fit is the best?
#### Loss functions
Somehow we have to be able to quantify how close the predicted curve is to the actual data curve. A very elegant way to do it is defining an error function or loss function which will return, for a certain value of the parameters to be optimized, a global number that tells us just how similar both curves are.
For example, one could go point by point looking at the real value of the time series and subtract the predicted value for that same point: $y_{t} - y_{t|t-1}^-$. This subtraction is usually called a residual or error, so:
$e = y_{t} - y_{t|t-1}^-$
Then you add up all those differences and get the overall value of difference between the two series, right?
Well, actually there is one more step to go. As it can happen that sometimes the predicted value is greater than the value of the time series, the subtraction of these would be negative and in the same way, if the predicted value is lesser the subtraction will take a positive value. This is a complication when adding up these errors since they could cancel each other out.
To solve this, we usually take the absolute value of the subtraction, or the square values. Now the sum of the residuals will not be cancelled, no matter if it is positive or negative:
$SSE = \sum_{i=1}^{T}(y_{t} - y_{t|t-1}^-)^2 = \sum_{i=1}^{T}e^2$
$SAE = \sum_{i=1}^{T}|y_{t} - y_{t|t-1}^-| = \sum_{i=1}^{T}|e|$
Great! Now that we have a logical and quantitative methodology to determine how well our model is fitting the data, all that remains is to implement it. Let's go for it!
```{julia}
function SES_weight_loss(α, l0, time_serie)
loss = 0
N = length(time_serie)
y_pred = 0
for i in 1:(N)
if i == 1
y_pred = l0
else
y_pred = time_serie[i - 1] * α + y_pred * (1 - α)
end
loss += (time_serie[i] - y_pred)^2
end
return loss
end
```
As you can see in the code we take the sum of errors squared as an error function. This algorithm is very similar to the previous one (with which we obtained each predicted point) only that now instead of saving the value, we compute its residual with the real data and we add it to the variable "loss".
As a result we have a function that has as a parameter a time series, alpha and lo; and returns a number that is telling us the general error. Obviously that is the number we want to minimize.
Let's see how the function behaves if we leave fixed lo (actually it is a value that we have to find with the minimization, but we know that it has to be near to the first value of the series) and we change alpha.
```{julia chap_13_plot_4}
begin
aplhas = collect(0:0.01:1)
a = Array{Float64}(undef, length(aplhas))
for i in 1:length(aplhas)
a[i] = SES_weight_loss(aplhas[i], 445.36, chocolate_sales)
end
plot(0:0.01:1, a, xlabel="Alpha", ylabel="Loss", legend=false, title="Loss function for different values of alpha")
end
```
It is really very subtle, but the error function is not strictly decreasing. In fact, somewhere between 0.8 and 0.9 the function starts to grow again, so the minimum is there.
For this kind of problems Julia has a package to optimize functions:
```{julia chap_13_loss,cache = TRUE}
function SES_loss_(params, time_serie=chocolate_sales)
return SES_weight_loss(params[1], params[2], time_serie)
end
```
```{julia}
begin
lower = [0., 400.]
upper = [1., 500.]
initial_x = [0.6, 450.]
res = optimize(SES_loss_,lower, upper, initial_x)
end
```
To use this function more efficiently it is necessary to define a range for the parameters in which the algorithm will perform the search and also a starting point (obviously within that range).
Also, one trick to keep in mind is that this package accepts "univariate" functions, that is, the function you enter only has to have one parameter to optimize. This is not entirely true since, although only one parameter has to be passed, it can be a vector, so that several parameters can be optimized. This is why we define a wrapper function `SES\_loss\_` that facilitates the calculation.
With everything ready, let's look for the values of alpha and lo that minimize our error function:
```{julia}
optim = Optim.minimizer(res)
```
And this is how we came to fit our model, obtaining the best parameters to try to predict the next sales of our beloved peanut chocolate
```{julia}
function SES_weight_forecast(α, l0, time_serie, n_pred)
N = length(time_serie)
y_pred = 0
pred = []
for i in 1:(N)
if i == 1
y_pred = l0
else
y_pred = time_serie[i - 1] * α + y_pred * (1 - α)
end
push!(pred, y_pred)
end
y_pred = time_serie[N] * α + y_pred * (1 - α)
for j in 1:n_pred
push!(pred, y_pred)
end
return pred
end
```
As you can see, the simple exponential smoothing method only gives us a forward value. In other words, it predicts a constant value into the future. This happens because these types of time series have no latent variables defined, such as trend or seasonality. These are variables that add information to the model and allow us to make different predictions for each time we want to predict.
But do not worry about that for now, we will study it well in a short time. For now, let's see how the prediction would look like.
```{julia ,results = FALSE}
forecast = SES_weight_forecast(0.833784, 446.573, chocolate_sales, 5)
```
```{julia chap_13_plot_5}
begin
plot(1:18, chocolate_sales, label="Data", legend=:topleft)
plot!(1:18, forecast[1:18], label="Fitted")
plot!(18:23, forecast[18:end], label="Forecast")
end
```
Perfect! We already have an initial model to attack problems where there is a lot of variability. But this is not always the case.
That same night we talked to Terry, showed him the progress and he loves the direction we are going. He tells us that he has another time series to analyze and that he has a different behavior than the previous one, apparently this one shows a trend in the values...
Let's see:
### Trend Methods
Now that we have built up an intuition of how Simple Exponential Smoothing logic works, wouldn't it be good to have some additional method that allows us to learn if our time series has a certain tendency?
But what does this mean in the first place? It means that there are some processes that inherently, by their nature, follow a marked tendency beyond random noise
For example, if we take the amount of AirPassenger in Australia from 1990 to 2016, we find this graph:
```{julia , results = FALSE}
begin
time_ = collect(1990:2016)
data = [17.55,21.86,23.89,26.93,26.89,28.83,30.08,30.95,30.19,31.58,32.58,33.48,39.02,41.39,41.60, 44.66,46.95,48.73,51.49,50.03,60.64,63.36,66.36,68.20,68.12,69.78,72.60]
end
```
```{julia chap_13_plot_6}
plot(time_, data, label=false)
```
In this type of problem, it would not make any sense for all the predictions to be constant. Here is more latent information that we can get from the data: The trend.
This will be key since, once obtained, it will allow us to generate new values as we want to make more distant forecasts in time.
But how do we include this in our exponential smoothing model?
#### Holt’s linear trend method
This method for making predictions with time series that have a trend consists of a prediction equation and two smoothing equations, one to determine the level and another for the slope (or trend):
$y_{T+h|T}^- = l_t + hb_t$
$l_t = αy_t + (1 - α)(l_{t-1} + b_{t-1})$
$b_t = β*(l_t - l_{t-1}) + (1 - β)b_{t-1}$
In this way, the equation for making predictions is simply to take the last predicted level (up to here it is equal to the Simple Exponential Smoothing) and add as many "slopes" as periods ahead we want to predict. The value of the slope to make the forecast is also the last predicted one.
As you can see, the values of alpha and beta are going to weigh each of the Smoothing equations.
On the one hand, alpha will weight the values of the previous real observation $y_{t-1}$ and the predicted one using the prediction equation with the values of l and b above, to obtain the actual value of the level $l_{t}$
On the other hand, the beta value will be telling us how much we are going to let the value of the slope be modified. This value has the function of weighting between the current slope found $l_t - l_{t-1}$ against the estimated slope in the previous point, to calculate the estimation of the slope in the current period.
In this way, small beta values indicate that the slope is unlikely to change over time, and high values allow the slope to change freely (the value of the "current" slope $l_t - l_{t-1}$ becomes preponderant in the estimation).
With this method, then, forecasts stop being flat to become trended. With this idea in mind, let's translate the math into code again:
```{julia HLT ,cache = TRUE}
function HLT_loss(time_serie, α, β, l0, b0)
N = length(time_serie)
l_t = 0
b_t = 0
l_t_ = 0 #Variable to save l(t-1)
loss = 0
for i in 1:(N)
if i == 1
l_t = l0
b_t = b0
else
l_t = time_serie[i - 1] * α + (l_t + b_t) * (1 - α) #b_t is taking b(t-1) value
b_t = β * (l_t - l_t_) + (1 - β) * b_t
end
l_t_ = l_t
y_pred = l_t + b_t
loss += (time_serie[i] - y_pred)^2
end
return loss
end
```
This function is doing exactly the same as `SES\_weight\_loss`, which we defined earlier. It is good to clarify that, like it, the method needs an initial slope to make the estimation of the first value of the time series. Let's see which parameters optimize the model with the data we have!
```{julia}
function HLT_loss_(params, time_serie=data)
return HLT_loss(time_serie, params[1], params[2], params[3], params[4])
end
```
```{julia}
begin
lower_ = [0., 0., 10., 1.]
upper_ = [1., 1.,30., 3]
initial_x_ = [0.5, 0.5, 15., 2.]
res1 = optimize(HLT_loss_, lower_, upper_, initial_x_)
end
```
As with the "SES" we define a wrapper function to be able to perform the optimization. The minimum and maximum values of lo are obtained with the same criterion: The optimal value has to be close to the first value of the time series, since it is going to be its estimation. With respect to the minimum and maximum value of the slope the criterion is again looking into the data: If you see the graph of the time series you can clearly see that the slope has to be between 1 and 3 (you can see that every 5 years that pass, the number of passengers increases by 10, more or less).
Let´s see the optimal values for the parameters:
```{julia}
optim1 = Optim.minimizer(res1)
```
Perfect! Now that we have the optimal parameters to perform the forecast, we just need to define a function that performs it. For example:
```{julia}
function HLT_forecast(time_serie, α, β, l0, b0, n_pred)
N = length(time_serie)
l_t = 0
b_t = 0
l_t_ = 0
pred = []
for i in 1:(N)
if i == 1
l_t = l0
b_t = b0
else
l_t = time_serie[i - 1] * α + (l_t + b_t) * (1 - α) #b_t "is" b(t-1)
b_t = β * (l_t - l_t_) + (1 - β) * b_t
end
l_t_ = l_t
end
l_t = time_serie[end] * α + (l_t + b_t) * (1 - α)
b_t = β * (l_t - l_t_) + (1 - β) * b_t
for i in 1:n_pred
y_pred = l_t + b_t * i
push!(pred, y_pred)
end
return vcat(time_serie, pred)
end
```
As you can see in the function, the first part of it (the first for) goes through the entire time series using the parameters already optimized and making the best predictions for each point.
Then, when we reach the end of the time series, the second "for" begins (it will iterate the amount of periods we want to predict, value that we enter as "$n\_pred$") to make now forecasts of periods that have not yet happened. To do this, it simply uses the last "level" that was estimated for the last value of the time series, and adds up as many slopes as periods we want: $y_{pred} = l_t + b_t * i$
Finally, it returns a concatenation of the time series plus the values we ask it to predict.
```{julia , results = FALSE}
data_forecasted = HLT_forecast(data, 0.8321, 0.0001, 15.57, 2.102, 15)
```
```{julia chap_13_plot_7}
begin
plot(time_, data_forecasted[1:length(data)], label="Data", legend=:topleft)
plot!(time_[end]:(time_[end]+15), data_forecasted[length(data):end], label="Forecast", title="Forecast for 15 periods ahead")
end
```
And so it is. We already built a tool that allows us to make predictions for variables that show a trend.
But surely you are thinking that assuming that the trend is going to be maintained during all the years that we are forecasting is a bit excessive? And it's true that it is.
It is known that this type of methods usually overestimate the values of the variable to predict, exactly because they suppose that the tendency continues.
A improvement of this method that helps to deal with this problem it is the **Damped trend methods**. Basically, what it does is add a coefficient that flattens the curve as we want to make more distant predictions in time. This improvement makes better predictions than the common trend methods, leaving the formulas as:
$y_{T+h|T}^- = l_t + (ϕ + ϕ^2 + ... + ϕ^h)b_t$
$l_t = αy_t + (1 - α)(l_{t-1} + ϕ*b_{t-1})$
$b_t = β*(l_t - l_{t-1}) + (1 - β)*ϕ*b_{t-1}$
In this way, at the time of making the predictions and as we want to make them over more distant periods, instead of adding a unit of b_t to each one, we add a smaller fraction each time until the sum is practically constant.
For example, before, with the simple trend method, if we want to estimate the value of the next period the count to be done was (assuming, for example, that $l_t$ and $b_t$ in the last period of the time series were worth 300 and 1.5):
$y_{T+1|T}^- = 300 + 1.5 = 301.5$
and if we wanted to make a forecast for the next period to the one just calculated, it would be:
$y_{T+2|T}^- = 300 + 1.5 + 1.5 = 300 + 1.5*2 = 303$
That is, a constant unit of $b_t$ is added for each new period that is predicted.
On the other hand, when we use the "damped method" the fraction of $b_t$ that we are adding falls exponentially, since we are multiplying it by a number less than 1 and that is rising to an increasing power. Following the same example, and adding a damping parameter equal to 0.9, we would obtain:
$y_{T+1|T}^- = 300 + 0.9*1.5 = 300 + 1.35 = 301.35$
$y_{T+2|T}^- = 300 + 0.9*1.5 + 0.9^2 * 1.5 = 300 + 0.9*1.5 + 0.81 * 1.5 = 302.565$
The process continues in this way until the value of $ϕ^t$ is zero, that is to say that although it continues going further in the periods, to the result of the forecast no longer is added any significant term. This is why it is said that the damped method tends to make flat forecasts in the long term.
Finally, let's note that the damping parameter takes values between 1 and 0. being completely identical to the simple trend method for a value of 1 and completely flat for a value of 0. Let´s see:
```{julia}
function Damped_HLT_forecast(time_serie, α, β, l0, b0, ϕ, n_pred)
N = length(time_serie)
l_t = 0
b_t = 0
l_t_ = 0
pred = []
for i in 1:(N)
if i == 1
l_t = l0
b_t = b0
else
l_t = time_serie[i - 1] * α + (l_t + b_t) * (1 - α) #b_t "is" b(t-1)
b_t = β * (l_t - l_t_) + (1 - β) * b_t
end
l_t_ = l_t
y_pred = l_t + b_t
end
l_t = time_serie[end] * α + (l_t + b_t) * (1 - α)
b_t = β * (l_t - l_t_) + (1 - β) * b_t
phi = 0
for i in 1:n_pred
phi += ϕ^i
y_pred = l_t + b_t * phi
push!(pred, y_pred)
end
return vcat(time_serie, pred)
end
```
```{julia damped ,cache = TRUE, results = FALSE}
damped_forecast = Damped_HLT_forecast(data, 0.8321, 0.0001, 15.57, 2.102, 0.96 , 15)
```
```{julia chap_13_plot_8}
begin
plot(time_, damped_forecast[1:length(data)], label="Data", legend=:topleft)
plot!(time_[end]:(time_[end]+15), data_forecasted[length(data):end], label="Holt´s Forecast")
plot!(time_[end]:(time_[end]+15), damped_forecast[length(data):end], label="Damped Forecast", title="Damped forecast with ϕ=0.96")
end
```
Incredible! Terry calls us and tells us that using the "damped" model was key to a trade he made. He is very excited about our work and gives us one last challenge.
He tells us that he is trying to enter more complex markets, particularly tourism, but that he doesn't understand how to approach this type of series since they show a lot of variability. For example, we start analysing the visitor nights in Australia spent by international tourists:
```{julia , results = FALSE}
begin
australia_tourist = [42.20566, 24.64917, 32.66734, 37.25735, 45.24246, 29.35048, 36.34421, 41.78208, 49.27660, 31.27540, 37.85063, 38.83704, 51.23690, 31.83855, 41.32342, 42.79900, 55.70836, 33.40714, 42.31664, 45.15712, 59.57608, 34.83733, 44.84168, 46.97125, 60.01903, 38.37118, 46.97586, 50.73380, 61.64687, 39.29957, 52.67121, 54.33232, 66.83436, 40.87119, 51.82854, 57.49191, 65.25147, 43.06121, 54.76076, 59.83447, 73.25703, 47.69662, 61.09777, 66.05576]
time = collect(2005:0.25:2015.75)
end
```
```{julia chap_13_plot_9}
plot(time, australia_tourist, xlabel="Year", ylabel="""Nigth visitors (millions)""", legend=false)
```
As you can see, the series shows a lot of variability. The values go up and down constantly.
After being a long time with Terry looking for ideas to address this type of data we realize that these ups and downs are not random, indeed, the form is repeated year after year! We realize that we are facing a problem with seasonality.
### Seasonality Methods
For this type of model, in addition to the level and trend parameters, it is necessary to add another component that captures the time of year (actually it can be any other period in which seasonality occurs) in which we are and somehow influences the predicted outcome.
In this way, the level and trend parameters will behave as we have already studied them, but the seasonality parameter will be adjusting (adding and subtracting or multiplying and dividing, depending on the method) the predicted level.
In the same line with the Exponential smoothing that we have been studying, the model can be made more complex to add an additive or multiplicative seasonality. Here we will stop at the additive:
#### Holt-Winters’ seasonal additive method
For this method, we will need to add a smoothing equation for the seasonality parameters. In addition, we will have to indicate the frequency with which the seasonality occurs in the analyzed period and denote it with the letter $m$. For example, for a monthly seasonality $m = 12$ and for a semester one $m = 2$.
So, the model becomes like this:
$y_{T+h|T}^- = l_t + hb_t + s_{t+h-m(k+1)}$
$l_t = α(y_t - s_{t-m}) + (1 - α)(l_{t-1} + b_{t-1})$
$b_t = β*(l_t - l_{t-1}) + (1 - β)b_{t-1}$
$s_t = γ(y_t - l_{t-1} - b_{t-1}) + (1 - γ)s_{t-m}$
As we had already anticipated, the seasonality term is added to the forecast equation. The term $k$ is the integer part of $(h-1)/m$ and its function is to ensure that we always make predictions with the values of the parameters in the last year of the time series we have as data.
The level equation is still a weighted average, only now the seasonal component is added to the observation $(y_t - s_{t-m})$. The other part of the average is the non-seasonal forecast $(l_{t-1} + b_{t-1})$.
The trend equation remains the same, and the seasonality equation also represents a weighted average between the current $(y_t - l_{t-1} - b_{t-1})$ and previous year's index for the same season $s_{t-m}$. This average is weighted by the γ parameter.
Now so, let's put these weird equations into code:
```{julia}
function HW_Seasonal(time_serie, α, β, γ, l0, b0, s0, m)
N = length(time_serie)
l_t = 0
b_t = 0
l_t_ = 0 #Variable to save l(t-1)
b_t_ = 0 #Variable to save b(t-1)
s_ = 0
s = s0
pred = []
for i in 0:(N - 1)
if i == 0
l_t = l0
b_t = b0
else
l_t = (time_serie[i] - s_) * α + (l_t_ + b_t_) * (1 - α)
b_t = β * (l_t - l_t_) + (1 - β) * b_t_
end
l_t_ = l_t
b_t_ = b_t
s_ = s[i%m + 1]
y_pred = l_t + b_t + s[i%m + 1]
push!(pred, y_pred)
s[i%m + 1] = γ * (time_serie[i + 1] - l_t_ - b_t_) + (1 - γ) * s[i%m + 1]
end
return pred
end
```
What this code does is, given the optimal values of each parameter, it makes the prediction of the next point (h = 1) and stores them in the "pred" array. Staying a while analyzing how the algorithm is working is an excellent way to ensure full understanding of the method :)
To obtain these parameters it is necessary to write the loss function as we have been doing in the previous ones. As a challenge for the reader, we propose to write this loss function using this one as a basis. To help you, you can look at the intimate relationship between the functions that are storing the predictions and the loss functions already written in the previous methods.
In this particular case, in which our data is quarterly, m = 4. Doing the same procedure as always to optimize the function is obtained:
```{julia,results = FALSE}
begin
α = 0.262198
β = 2.45705e-15
γ = 0.454665
l0 = 32.4906
b0 = 0.701097
s0 = [9.20323, -9.18886, -2.13981, 1.35917]
end;
```
It is interesting to stop and look at these values that we obtained.
First of all it is remarkable how, for the first time, the parameter $α$ is taking a relatively low. This makes perfect sense, as the values are now much more connected to values further away than the immediate previous one, for example. They are connected precisely because of their seasonality.
It is also interesting to note how for the initial values of the seasonality not only one value is needed, but also 4. As a general case, as much as m, the frequency of the seasonality, will be needed. This can be seen as we now need to have a whole "year 0" to make the estimates for the first year of the time series.
Now, let's see the function in action:
```{julia , results = FALSE}
season_fitted = HW_Seasonal(australia_tourist, 0.262198, 2.45705e-15, 0.454665, 32.4906, 0.701097, [9.20323, -9.18886, -2.13981, 1.35917], 4)
```
```{julia chap_13_plot_10}
begin
plot(time, season_fitted, label = "fitted")
plot!(time, australia_tourist, label = "data", legend=:topleft)
end
```
As you can see, the fit is very good. It is also interesting how you can appreciate the exponential smooth in the step between the valley and the peak: It starts without having it and therefore the adjusted series neither, but as it appears, the model learns it and also starts to forecast it.
Excellent! Now that we have our model, it's time to use it and call Terry to tell him what actions to take in his trading strategy:
```{julia}
function HW_Seasonal_forecast(time_serie, α, β, γ, l0, b0, s0, m, n_pred)
N = length(time_serie)
l_t = 0
b_t = 0
l_t_ = 0 #Variable to save l(t-1)
b_t_ = 0 #Variable to save b(t-1)
s_ = 0
s = s0
pred = []
for i in 0:(N - 1)
if i == 0
l_t = l0
b_t = b0
else
l_t = (time_serie[i] - s_) * α + (l_t_ + b_t_) * (1 - α)
b_t = β * (l_t - l_t_) + (1 - β) * b_t_
end
l_t_ = l_t
b_t_ = b_t
s_ = s[i%m + 1]
y_pred = l_t + b_t + s[i%m + 1]
push!(pred, y_pred)
s[i%m + 1] = γ * (time_serie[i + 1] - l_t_ - b_t_) + (1 - γ) * s[i%m + 1]
end
l_t = (time_serie[end] - s_) * α + (l_t + b_t) * (1 - α)
b_t = β * (l_t - l_t_) + (1 - β) * b_t_
for i in N:(N+n_pred - 1) #sino hace una pred de mas
y_pred = l_t + b_t*(i-N+1) + s[i%m + 1]
#The trend has to be added as many times as periods we want to forecast.
push!(pred, y_pred)
end
return pred
end
```
```{julia seasonal, cache = TRUE, results = FALSE}
season_forecast = HW_Seasonal_forecast(australia_tourist, 0.262198, 2.45705e-15, 0.454665, 32.4906, 0.701097, [9.20323, -9.18886, -2.13981, 1.35917], 4, 8)
```
```{julia chap_13_plot_11}
begin
plot(time, australia_tourist, label="Data", legend=:topleft)
plot!(time, season_forecast[1:44], label="Fitted")
plot!(time[end]:0.25:time[end]+2, season_forecast[44:end], label="Forecast")
end
```
Well, good! We started this chapter not knowing how to tackle time series forecasting problems and ended up building a wide variety of models for different types of data, all while making our friend Terry a lot of money!
As a final summary, when dealing with a time series it is very important to be able to define if it has any latent variables such as trend or seasonality. Once we can find that underlying information, we will be able to generate forecasts with confidence. We just need to look deeper.
## Summary
In this chapter, we learned the basic foundations of time series analysis.
We defined what a time series is and delved into a particular method, the exponential smoothing, that allows us to take into account the most distant values of our data.
Finally, we explained more complex versions of the method and used them to make predictions in different kinds of scenarios.
When the processes followed a marked tendency, we used the trend method and the damped trend method to make long term predictions.
When the process exhibited seasonal trends, we utilized the Holt-Winters’ seasonal method.
## References
- [Forecasting: Principles and Practice, Chap 7](https://otexts.com/fpp2/expsmooth.html)
## Give us feedback
This book is currently in a beta version.
We are looking forward to getting feedback and criticism:
- Submit a GitHub issue [**here**](https://github.com/unbalancedparentheses/data_science_in_julia_for_hackers/issues).
- Mail us to [**martina.cantaro\@lambdaclass.com**](mailto:martina.cantaro@lambdaclass.com)
Thank you!