# Time Series Analysis
##### [[\[book\]]{.underline}](pdf/book13.pdf){target="_blank"}
**CHAPTER OBJECTIVES**
In this chapter, we study time series, that is, data observed over time,
and introduce the following topics:
\-\-- What is time series analysis and what are types of time series
models?\
\-\-- How to smooth a time series.\
\-\-- How to transform a time series.\
\-\-- Prediction method using regression model.\
\-\-- Prediction method using exponential smoothing model.\
\-\-- Prediction method for seasonal time series.
We focus mainly on descriptive methods and simple models; the
Box-Jenkins model and other theoretical models are not discussed.
:::
:::
## What is Time Series Analysis?
::: mainTable
**Time series** refers to data recorded over time. In general,
observations are made at regular time intervals such as a year, season,
month, or day, and such a series is called a **discrete time series**.
Some time series are observed continuously, but this book deals only
with the analysis of discrete time series.
An example of a discrete time series is the population of Korea as shown
in [Table 13.1.1]{.table-ref}. This data is from the census conducted every five
years in Korea from 1925 to 2020 (except for 1944 and 1949).
::: textLeft
[Table 13.1.1]{.table-ref} Population of Korea
:::
-----------------------------------------------------------------------
Year                                Population
----------------------------------- -----------------------------------
1925\                               19020030\
1930\                               20438108\
1935\                               22208102\
1940\                               23547465\
1944\                               25120174\
1949\                               20166756\
1955\                               21502386\
1960\                               24989241\
1966\                               29159640\
1970\                               31435252\
1975\                               34678972\
1980\                               37406815\
1985\                               40419652\
1990\                               43390374\
1995\                               44553710\
2000\                               45985289\
2005\                               47041434\
2010\                               47990761\
2015\                               51069375\
2020\                               51829136\
-----------------------------------------------------------------------
As shown in the table above, it is not easy to understand the overall
shape of the time series displayed in numbers. The first step in time
series analysis is to observe the time series by drawing a time series
plot with the X axis as time and the Y axis as time series values. For
example, the time series plot of the total population in Korea is shown
in [Figure 13.1.1]{.figure-ref}.
![](Figure/Fig130101.png){.imgFig600400}

::: figText
[Figure 13.1.1]{.figure-ref} Time Series of Korea Population
:::
This figure shows that Korea's population has an overall increasing
trend, but that it decreased sharply between 1944 and 1949 due to World
War II. The population expanded rapidly after the Korean War ended in
1953, growth has slowed since 1990, and it has slowed further in the
last 10 years. Observing a time series in this way reveals trends,
change points, and outliers, which helps in selecting an analysis model
or method suitable for the data.

Time series that we frequently encounter include monthly sales of
department stores and companies, the daily composite stock index, annual
crop production, yearly exports and imports, and yearly national income
and economic growth rate.
[Table 13.1.2]{.table-ref} shows the percent increase in monthly sales of the US
toy/game industry for the six years from 2016 to 2021, and
[Figure 13.1.2]{.figure-ref} is a plot of this time series. Because each value is
the rate of change from the previous month, the series moves up and down
around 0 and is clearly seasonal, with a large increase in November and
December every year. May 2020, however, is an extreme value with an
increase of 211.1%, unlike other years. Converting a raw time series
into rates of change often makes the characteristics of the data easier
to examine.
::: textLeft
[Table 13.1.2]{.table-ref} Percent Increase, Monthly Sales of Toy/Game in US(%)
(Source: Bureau of Census, US)
:::
-----------------------------------------------------------------------
Year.month                          Percent Increase
----------------------------------- -----------------------------------
2016.01\                            -66.7\
2016.02\                            2.5\
2016.03\                            12.5\
2016.04\                            -9.0\
2016.05\                            -0.6\
2016.06\                            -4.4\
2016.07\                            4.3\
2016.08\                            0.0\
2016.09\                            6.1\
2016.10\                            8.6\
2016.11\                            56.4\
2016.12\                            53.6\
2017.01\                            -65.6\
2017.02\                            -0.1\
2017.03\                            14.7\
2017.04\                            -5.7\
2017.05\                            -2.4\
2017.06\                            -5.5\
2017.07\                            1.3\
2017.08\                            4.2\
2017.09\                            8.4\
2017.10\                            7.2\
2017.11\                            54.9\
2017.12\                            45.5\
2018.01\                            -63.6\
2018.02\                            3.6\
2018.03\                            39.8\
2018.04\                            -21.0\
2018.05\                            5.9\
2018.06\                            -12.4\
2018.07\                            -16.9\
2018.08\                            5.2\
2018.09\                            7.5\
2018.10\                            8.5\
2018.11\                            54.9\
2018.12\                            5.8\
2019.01\                            -46.2\
2019.02\                            -3.8\
2019.03\                            16.3\
2019.04\                            -8.4\
2019.05\                            6.6\
2019.06\                            -5.3\
2019.07\                            0.8\
2019.08\                            7.7\
2019.09\                            -1.2\
2019.10\                            12.2\
2019.11\                            46.7\
2019.12\                            11.7\
2020.01\                            -49.1\
2020.02\                            2.2\
2020.03\                            -28.2\
2020.04\                            -58.2\
2020.05\                            211.1\
2020.06\                            26.8\
2020.07\                            -0.8\
2020.08\                            7.0\
2020.09\                            4.9\
2020.10\                            5.8\
2020.11\                            44.1\
2020.12\                            8.5\
2021.01\                            -37.1\
2021.02\                            -12.2\
2021.03\                            37.0\
2021.04\                            -10.3\
2021.05\                            -0.5\
2021.06\                            -2.0\
2021.07\                            4.6\
2021.08\                            1.8\
2021.09\                            5.2\
2021.10\                            6.4\
2021.11\                            40.0\
2021.12\                            10.6\
-----------------------------------------------------------------------
![](Figure/Fig130102.png){.imgFig600400}

::: figText
[Figure 13.1.2]{.figure-ref} Percent Increase, Monthly Sales of Toy/Game in US(%)
:::
Most time series have four components: trend, seasonal, cycle, and
irregular factors. **Trend** is the long-term tendency of a time series
to follow a certain shape, such as a line or a curve, as time elapses,
and there are various types of trends. Trends reflect factors such as
consumption behavior, population change, and inflation that act on a
time series over a long period of time. **Seasonal** factors are
short-term, regular fluctuations that recur quarterly, monthly, or by
day of the week. Time series such as monthly rainfall, average
temperature, and ice cream sales have seasonal factors. Seasonal factors
generally have a short period; fluctuations that recur over a longer
period and are not due to the season are called a **cycle** factor. By
observing these cyclical factors, it is possible to predict booms and
recessions in a periodic economy. [Figure 13.1.3]{.figure-ref} shows the US
S&P 500 Index from 1997 to 2016, and a six-year cycle can be observed.
![](Figure/Fig130103.png){.imgFig600400}

::: figText
[Figure 13.1.3]{.figure-ref} US S&P 500 Index (1997-2016)
:::
Factors that cannot be explained by trend, seasonal, or cyclical
factors are called **irregular** or **random** factors; they are
fluctuations that arise from random causes, unrelated to any regular
movement over time.
:::
### Time Series Model
By observing a time series and building a time series model that fits
the probabilistic characteristics of the data, you can predict how the
series will change in the future. Because time series observed in
practice take very diverse forms, time series models are also diverse,
ranging from simple to very complex. In general, time series models for
a single variable can be divided into the following four categories.
##### A. Regression Model
The most intuitive and easily understood model explains the data or
predicts the future by expressing the time series as a function of time.
That is, when a time series is an observation of the random variables
$Y_1 , Y_2 , ... , Y_n$, it is expressed as the following model: $$
Y_t \;=\; f(t) \;+\; \epsilon _ t , \,\, t=1,2, ... , n
$$ Here $\epsilon_t$ is the error of the time series that cannot
be explained by the function $f(t)$. In general, the errors $\epsilon_t$
are assumed to be independent with $E(\epsilon_t ) = 0$ and
$Var(\epsilon_t ) = \sigma^2$; such a sequence is called white noise.
For example, the following models can be applied to a time series in
which the data is horizontal or has a linear trend.
$\qquad \text{Horizontal:} \qquad Y_t \;=\; \mu \;+\; \epsilon _ t$\
$\qquad \text{Linear Trend:}\quad Y_t \;=\; a \;+\; b\, t \;+\; \epsilon _ t$
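
As an illustration of the linear trend model, the following minimal sketch estimates $a$ and $b$ by ordinary least squares using only the Python standard library. The function name `fit_linear_trend` and the noise-free test series are assumptions made for this example and are not part of the chapter's data; for the horizontal model, the corresponding estimate of $\mu$ would simply be the sample mean.

```python
def fit_linear_trend(y):
    """Least-squares estimates (a, b) for Y_t = a + b*t + e_t, t = 1..n."""
    n = len(y)
    t = range(1, n + 1)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    b = sxy / sxx
    a = y_mean - b * t_mean
    return a, b


# Hypothetical noise-free series y_t = 2 + 3t, used only to check the fit.
series = [2 + 3 * t for t in range(1, 11)]
a, b = fit_linear_trend(series)

# One-step-ahead prediction at t = n + 1.
forecast_next = a + b * (len(series) + 1)
print(a, b, forecast_next)
```

With real data the errors $\epsilon_t$ are not zero, so the fitted line will not pass through every observation.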
##### B. Decomposition Model
The decomposition model, which decomposes the time series into four
factors, i.e., trend ($T_t$), cycle ($C_t$), seasonal ($S_t$), and
irregular ($I_t$), is an analysis method that has long been used on
empirical grounds. It takes the form of either an additive model or a
multiplicative model.
$\qquad \text{Additive Model:} \qquad \qquad Y_t \;=\; T_t \;+\; C_t \;+\; S_t \;+\; I_t$\
$\qquad \text{Multiplicative Model:}\qquad Y_t \;=\; T_t \;\times\; C_t \;\times\; S_t \;\times \; I_t$
Here $T_t$, $C_t$, and $S_t$ are deterministic functions and $I_t$ is a
random variable. Taking the logarithm of a multiplicative model turns it
into an additive model. If there are not enough data, the cycle factor
can be omitted from the model.
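
The remark about logarithms can be written out explicitly. Assuming every component is positive, so that the logarithms exist, taking logs of both sides of the multiplicative model gives an additive model in the logged components: $$
\log Y_t \;=\; \log T_t \;+\; \log C_t \;+\; \log S_t \;+\; \log I_t
$$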
##### C. Exponential Smoothing Model
In time series data, future values are often more closely related to
recent data than to data from the distant past. The above two types of
models do not take into account the relationship between the past time
series data and the recent time series data. Models using moving
averages and exponential smoothing are often used to explain and predict
data by giving more weight to recent data.
##### D. Box-Jenkins ARIMA Model
The above models cannot be applied to all types of time series, so the
analyst selects and applies them according to the type of data. Box and
Jenkins presented the following general ARIMA model, which can be
applied to both stationary and nonstationary time series: $$
Y _{t} \,=\, \mu \,+\, \phi_{1} \, Y _{t-1} \,+\, \phi _{2} \, Y_{t-2} \,+\, \cdots \,+\, \epsilon _{t} \,+\, \theta _{1} \, \epsilon _{t-1} \,+\, \theta _{2} \, \epsilon _{t-2} \,+\, \cdots
$$ The ARIMA model considers the observed time series as a sample
extracted from a population time series, studies the probabilistic
properties of each model, and establishes an appropriate time series
model through parameter estimation and testing. For the ARIMA model,
autocorrelation coefficients between time lags are used to identify a
model. The ARIMA model is beyond the scope of this book, so interested
readers are encouraged to consult the bibliography.
Among the above time series models, the regression model and the ARIMA
model are systematic models based on statistical theory, whereas the
decomposition model and the exponential smoothing model are methods
based on experience and intuition. In general, regression models using
mathematical functions and decomposition models are known to be suitable
for predicting slowly changing time series, whereas exponential
smoothing and ARIMA models are known to be effective in predicting very
rapidly changing time series.

No time series model can predict sudden changes. And because time series
take so many different forms, it cannot be said that one time series
model is always superior to another. Therefore, rather than applying
only one model to a time series, it is advisable to build and compare
several models, combine different models, or determine the final model
by incorporating the opinions of experts familiar with the time series.
### Evaluation of Time Series Model
Let the time series be the observed values of the random variables
$Y_1 , Y_2 , ... , Y_n$ and let $\hat Y_1 , \hat Y_2 , ... , \hat Y_n$
be the values predicted by the model. If the model fits exactly, the
observed and predicted values are the same, and the model error
$\epsilon_t$ is zero. In general, the errors $\epsilon_t$ of a time
series model are assumed to be independent random variables that follow
the same normal distribution with mean 0 and variance $\sigma^2$. The
accuracy of a time series model can be evaluated using the residual
$Y_t \,-\, {\hat Y}_t$, obtained by subtracting the predicted value from
the observed value. The following mean squared error (MSE) is commonly
used to measure the accuracy of a model; the smaller the MSE, the better
the model is judged to fit. $$
{MSE} \,=\, \frac{ \sum_{t=1}^n \, ( Y_t\,-{\hat Y}_t \,)^{2} } {n}
$$ The mean squared error is used as an estimator of the variance
$\sigma^2$ of the error $\epsilon_t$. Since the MSE is measured in
squared units and can take large values, the root mean squared error
(RMSE) is often used. $$
{RMSE} \,=\, \sqrt{MSE }
$$
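
The two accuracy measures can be sketched in a few lines of Python. The function names `mse` and `rmse` and the small observed/predicted lists below are hypothetical and serve only to illustrate the formulas above.

```python
import math


def mse(observed, predicted):
    """Mean squared error: average of the squared residuals Y_t - Yhat_t."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)]
    return sum(squared) / len(observed)


def rmse(observed, predicted):
    """Root mean squared error, in the same units as the data."""
    return math.sqrt(mse(observed, predicted))


# Hypothetical values: residuals are 1, 0, -1, -2.
observed = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 11.0]
print(mse(observed, predicted), rmse(observed, predicted))
```

When several candidate models are fitted to the same series, the one with the smaller MSE (or RMSE) would be preferred under this criterion.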
:::
:::
## Smoothing of Time Series
::: mainTable
The original time series data can be used directly to observe trends and
build a time series model, but in many cases the series is easier to
understand after smoothing. In a time series such as a stock price, it
is often difficult to find a trend because of temporary or short-term
fluctuations due to chance or cyclical factors. In this case, smoothing
techniques are used to grasp the overall long-term trend effectively by
removing temporary or short-term fluctuations. The centered moving
average method and the exponential smoothing method are widely used.
:::
### Centered Moving Average
::: mainTable
The time series in [Table 13.2.1]{.table-ref} is the world crude oil price,
based on the closing price of each year from 1987 to 2022.
[Figure 13.2.1]{.figure-ref} shows that the short-term fluctuations in the time
series are large. However, causes such as oil shocks are short-term and
not continuous, so if we are interested in the long-term trend, it is
more effective to look at the series after removing the fluctuations
due to short-term causes.
::: textLeft
[Table 13.2.1]{.table-ref} Price of Crude Oil (End of Year Price, US\$) and
5-point Centered Moving Average
:::
-----------------------------------------------------------------------
Year                    Price of Oil            5-point Centered Moving
                                                Average
----------------------- ----------------------- -----------------------
1987\                   16.74\                  \
1988\                   17.12\                  \
1989\                   21.84\                  20.666\
1990\                   28.48\                  21.216\
1991\                   19.15\                  20.630\
1992\                   19.49\                  19.816\
1993\                   14.19\                  18.028\
1994\                   17.77\                  19.378\
1995\                   19.54\                  19.010\
1996\                   25.90\                  18.600\
1997\                   17.65\                  20.198\
1998\                   12.14\                  21.634\
1999\                   25.76\                  20.446\
2000\                   26.72\                  23.158\
2001\                   19.96\                  27.232\
2002\                   31.21\                  30.752\
2003\                   32.51\                  37.620\
2004\                   43.36\                  45.798\
2005\                   61.06\                  58.746\
2006\                   60.85\                  61.164\
2007\                   95.95\                  68.370\
2008\                   44.60\                  74.434\
2009\                   79.39\                  82.030\
2010\                   91.38\                  81.206\
2011\                   98.83\                  91.920\
2012\                   91.83\                  86.732\
2013\                   98.17\                  75.882\
2014\                   53.45\                  66.866\
2015\                   37.13\                  60.592\
2016\                   53.75\                  49.988\
2017\                   60.46\                  51.526\
2018\                   45.15\                  53.804\
2019\                   61.14\                  58.096\
2020\                   48.52\                  67.394\
2021\                   75.21\                  \
2022\                   106.95\                 \
-----------------------------------------------------------------------
<div>
<div>
<input class="qrBtn" onclick="window.open(addrStr[134])" src="QR/eStatU330_TimeseriesSmoothing.svg" type="image"/>
</div>
<div>
![](Figure/Fig130201.png){.imgFig600400}
::: figText
[Figure 13.2.1]{.figure-ref} Price of Crude Oil and 5-point Moving Average
:::
</div>
</div>
The N-point **centered moving average** of a time series is the average
of the N observations centered on a given point in time. For example, in
the crude oil price data, the five-point moving average for a specific
year is the average of the data for the two years before that year, that
year itself, and the two years after it. As a formula, if $M_t$ is the
moving average at time $t$, the 5-point centered moving average is as
follows: $$
M_t = \frac{Y_{t-2} \,+\, Y_{t-1} \,+\, Y_{t} \,+\, Y_{t+1} \,+\, Y_{t+2} } {5 }
$$ For example, the 5-point centered moving average for 1989 is as
follows.
$\qquad M_{1989} \,=\, \frac {Y_{1987} + Y_{1988} +Y_{1989} + Y_{1990} + Y_{1991} } {5 }$\
$\qquad \qquad \quad =\, \frac {16.74 + 17.12 + 21.84 + 28.48 + 19.15} {5} \,=\, 20.6660$
[Table 13.2.1]{.table-ref} shows all the 5-point centered moving averages
obtained in this way, and [Figure 13.2.1]{.figure-ref} is the graph of the
5-point moving average. Note that the moving averages for the first two
years and the last two years cannot be obtained. The graph of the moving
average is better for grasping the long-term trend than the graph of the
original data because short-term fluctuations are removed.
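
The calculation can be sketched in Python as follows. The function name `centered_moving_average` is an assumption for this example; the data are the 1987 to 1993 prices from [Table 13.2.1]{.table-ref}, and positions where the window does not fit are returned as `None`.

```python
def centered_moving_average(y, n_points):
    """N-point centered moving average for odd N; None where undefined."""
    if n_points < 1 or n_points % 2 == 0:
        raise ValueError("n_points must be a positive odd number")
    half = n_points // 2
    result = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half : t + half + 1]
        result[t] = sum(window) / n_points
    return result


# Crude oil prices for 1987-1993 (first rows of Table 13.2.1).
years = [1987, 1988, 1989, 1990, 1991, 1992, 1993]
oil_price = [16.74, 17.12, 21.84, 28.48, 19.15, 19.49, 14.19]

ma5 = centered_moving_average(oil_price, 5)
for year, value in zip(years, ma5):
    print(year, "n/a" if value is None else round(value, 3))
```

Because only seven years are used here, the values for 1992 and 1993 are undefined in this sketch even though the full table has them.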
The choice of N for the N-point moving average is important. A large
value of N gives a smoother moving average, but it loses more points at
both ends and is less sensitive to important changes in trend. A small
value of N loses fewer points at both ends, but it may not smooth the
series enough because short-term fluctuations are not sufficiently
eliminated. In practice, try a few values of N and choose one that
achieves a smoothing effect, still reflects the important changes that
should not be missed, and does not lose too many points at both ends.
If N is an even number, it is difficult to obtain a centered moving
average with the same number of data on both sides of the base year. For
example, the center of the four-point moving average from 1987 to 1990
lies between 1988 and 1989. If you denote this as $M_{1988.5}$, it can
be calculated as follows:
$\qquad M_{1988.5} \,=\, \frac {Y_{1987} + Y_{1988} +Y_{1989} + Y_{1990} } {4 }$\
$\qquad \qquad \quad \,=\, \frac {16.74 + 17.12 + 21.84 + 28.48 } {4} \,=\, 21.045$
The 4-point moving average obtained in this way is called a non-centered
4-point moving average. When N is even, the non-centered moving average
does not coincide with an observation year of the original data, which
is inconvenient. The centered moving average is therefore calculated as
the average of the two adjacent non-centered moving averages. In other
words, the centered four-point moving average for 1989 is the average of
$M_{1988.5}$ and $M_{1989.5}$ as follows:
$\qquad M_{1989} \,=\, \frac {M_{1988.5} \,+\, M_{1989.5} } {2 }$\
$\qquad \qquad \quad =\, \frac {21.0450 \,+\, 21.6475 } {2} \,=\, 21.3463$
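
The even-N case can be sketched the same way: average the two adjacent non-centered moving averages that straddle each observation. The function name `centered_moving_average_even` is hypothetical; the data are the 1987 to 1991 prices from [Table 13.2.1]{.table-ref}.

```python
def centered_moving_average_even(y, n_points):
    """Centered moving average for even N, as the mean of two adjacent
    non-centered N-point averages; None where undefined."""
    if n_points < 2 or n_points % 2 != 0:
        raise ValueError("n_points must be a positive even number")
    half = n_points // 2
    result = [None] * len(y)
    for t in range(half, len(y) - half):
        left = sum(y[t - half : t + half]) / n_points           # centered at t - 0.5
        right = sum(y[t - half + 1 : t + half + 1]) / n_points  # centered at t + 0.5
        result[t] = (left + right) / 2
    return result


# Crude oil prices for 1987-1991 (first rows of Table 13.2.1).
oil_price = [16.74, 17.12, 21.84, 28.48, 19.15]
ma4 = centered_moving_average_even(oil_price, 4)
print(ma4[2])  # value for 1989, the only year with a full window here
```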
If the time series is quarterly or monthly, a 4-point or 12-point
centered moving average is an average over one year, so it is often used
to observe the data with the seasonality averaged out.
:::
### Exponential Smoothing
::: mainTable
A 3-point moving average can be considered a weighted average of three
data values, each with weight $\frac{1}{3}$, as follows: $$
M_t \,=\, \frac{Y_{t-1} \,+\, Y_{t} \,+\, Y_{t+1} } {3 } \,=\, \frac{1}{3}Y_{t-1} \,+\, \frac{1}{3}Y_{t} \,+\, \frac{1}{3}Y_{t+1}
$$ When the weights are $w_1 , w_2 , ... , w_n$, the weighted
moving average $M_t$ of the time series is defined as follows: $$
M_t \,=\, \sum_{i=1}^{n} w_i Y_{i}
$$ where $n$ is the number of data, $w_i \ge 0$ and
$\sum_{i=1}^{n} w_i = 1$.
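
A small sketch of the weighted average, assuming a helper named `weighted_average` that enforces the two conditions on the weights. The equal weights reproduce the ordinary 3-point average; the second set of weights is hypothetical and simply puts more weight on the most recent value.

```python
import math


def weighted_average(values, weights):
    """Weighted average with non-negative weights that sum to 1."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * v for w, v in zip(weights, values))


# Crude oil prices for 1987-1989 from Table 13.2.1.
window = [16.74, 17.12, 21.84]

equal = weighted_average(window, [1 / 3, 1 / 3, 1 / 3])
recent_heavy = weighted_average(window, [0.2, 0.3, 0.5])  # hypothetical weights
print(round(equal, 4), round(recent_heavy, 4))
```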
Various weighted averages with different weights can be used depending
on the purpose. Among them, a smoothing method that gives more weight to
data closer to the present and smaller weights to data farther from the
present is called **exponential smoothing**. The exponential smoothing
method is determined by an **exponential smoothing constant** $\alpha$
that has a value between 0 and 1. The exponentially smoothed data $E_t$
is calculated as follows:
$\qquad E_{1} \,=\, \alpha \,Y_{1} \,+\, (1- \alpha)\, E_{0}$\
$\qquad E_{2} \,=\, \alpha \,Y_{2} \,+\, (1- \alpha)\, E_{1}$\
$\qquad E_{3} \,=\, \alpha \,Y_{3} \,+\, (1- \alpha)\, E_{2}$\
$\qquad \cdots$\
$\qquad E_{t} \,=\, \alpha \,Y_{t} \,+\, (1- \alpha)\, E_{t-1}$\
Here, an initial value $E_{0}$ is required; $Y_1$ is commonly used, and
the average value of the data can also be used. The exponentially
smoothed value $E_t$ at time $t$ gives weight $\alpha$ to the current
data and weight $1-\alpha$ to the previous smoothed value. With
$E_0 = Y_1$, the exponentially smoothed value $E_t$ can be expressed in
terms of the original data as follows: $$
E_t \,=\, \alpha Y_t + \alpha(1-\alpha) Y_{t-1} + \alpha(1-\alpha)^2 Y_{t-2} + \cdots + \alpha(1-\alpha)^{t-2} Y_2 + (1-\alpha)^{t-1} Y_1
$$ Therefore, the exponential smoothing method uses all data from
the present and the past, but gives the current data the weight
$\alpha$ and gives lower weights to older data as the distance from the
present time increases.
Exponential smoothing of the crude oil price in [Table 13.2.1]{.table-ref} with
the initial value $E _{1986} = Y _{1987}$ and exponential smoothing
constant $\alpha$ = 0.3 is as follows.
$\qquad E_{1986} \,=\, Y_{1987} = 16.74$\
$\qquad E_{1987} \,=\, 0.3 \,Y_{1987} \,+\, (1- 0.3)\, E_{1986} \,=\, (0.3)(16.74)+(0.7)(16.74)=16.74$\
$\qquad E_{1988} \,=\, 0.3 \,Y_{1988} \,+\, (1- 0.3)\, E_{1987} \,=\, (0.3)(17.12)+(0.7)(16.74)=16.854$\
All data exponentially smoothed with $\alpha$ = 0.3 are given in
[Table 13.2.2]{.table-ref}. Unlike the moving average method, the exponential
smoothing method loses no data at either end. The crude oil price time
series and the exponentially smoothed data are shown in
[Figure 13.2.2]{.figure-ref}. The smoothed data are not greatly different from
the original data in overall shape. If the value of $\alpha$ is small,
more weight is given to the past data than to the present, making the
smoothed series less sensitive to sudden changes in the present data.
Conversely, the closer the value of $\alpha$ is to 1, that is, the more
weight is given to the current data, the more the smoothed data
resemble the original data, and the smoothing effect disappears.
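
The recursion can be sketched in Python as follows. The function name `exponential_smoothing` and its `initial` parameter are assumptions for this example; when no initial value is supplied, the first observation is used as $E_0$, as in the worked calculation. The data are the 1987 to 1990 prices from [Table 13.2.1]{.table-ref}.

```python
def exponential_smoothing(y, alpha, initial=None):
    """Return E_1..E_n with E_t = alpha*Y_t + (1 - alpha)*E_{t-1}."""
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be between 0 and 1")
    previous = y[0] if initial is None else initial  # E_0
    smoothed = []
    for value in y:
        previous = alpha * value + (1 - alpha) * previous
        smoothed.append(previous)
    return smoothed


# Crude oil prices for 1987-1990 (first rows of Table 13.2.1).
oil_price = [16.74, 17.12, 21.84, 28.48]
smoothed = exponential_smoothing(oil_price, alpha=0.3)
print([round(value, 3) for value in smoothed])
```

Rerunning the sketch with a larger `alpha` moves the smoothed values closer to the original prices, which is the effect described above.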
::: textLeft
[Table 13.2.2]{.table-ref} Price of Crude Oil and Exponential Smoothing with α=0.3
:::
-----------------------------------------------------------------------
Year                    Price of Oil            Exponential Smoothing\
                                                α=0.3
----------------------- ----------------------- -----------------------
1987\                   16.74\                  16.740\
1988\                   17.12\                  16.854\
1989\                   21.84\                  18.350\
1990\                   28.48\                  21.389\
1991\                   19.15\                  20.717\
1992\                   19.49\                  20.349\
1993\                   14.19\                  18.501\
1994\                   17.77\                  18.282\
1995\                   19.54\                  18.659\
1996\                   25.90\                  20.832\
1997\                   17.65\                  19.877\
1998\                   12.14\                  17.556\
1999\                   25.76\                  20.017\
2000\                   26.72\                  22.028\
2001\                   19.96\                  21.408\
2002\                   31.21\                  24.348\
2003\                   32.51\                  26.797\
2004\                   43.36\                  31.766\
2005\                   61.06\                  40.554\
2006\                   60.85\                  46.643\
2007\                   95.95\                  61.435\
2008\                   44.60\                  56.385\
2009\                   79.39\                  63.286\
2010\                   91.38\                  71.714\
2011\                   98.83\                  79.849\
2012\                   91.83\                  83.443\
2013\                   98.17\                  87.861\
2014\                   53.45\                  77.538\
2015\                   37.13\                  65.416\
2016\                   53.75\                  61.916\
2017\                   60.46\                  61.479\
2018\                   45.15\                  56.580\
2019\                   61.14\                  57.948\
2020\                   48.52\                  55.120\
2021\                   75.21\                  61.146\
2022\                   106.95\                 74.888\
-----------------------------------------------------------------------
<div>
<div>
<input class="qrBtn" onclick="window.open(addrStr[134])" src="QR/eStatU330_TimeseriesSmoothing.svg" type="image"/>
</div>
<div>
![](Figure/Fig130202.png){.imgFig600400}
::: figText
[Figure 13.2.2]{.figure-ref} Price of Crude Oil and Exponential Smoothing with
α=0.3
:::
</div>
</div>
:::
### Filtering by Moving Median
::: mainTable
The N-point **centered moving median** of a time series is the median of
the N observations centered on a given point in time $t$. For example,
in the crude oil price data, the five-point moving median for a specific
year is the median of the data for the two years before that year, that
year itself, and the two years after it. If the data are denoted by
$Y_{t-2} ,Y_{t-1} , Y_{t} , Y_{t+1} , Y_{t+2}$ and, sorted from smallest
to largest, are written as
$Y_{(t-2)} ,Y_{(t-1)} , Y_{(t)} , Y_{(t+1)} , Y_{(t+2)}$, the moving
median is $MovingMedian_t \,=\, Y_{(t)}$.

For example, the 5-point centered moving median for 1989 of the crude
oil prices in [Table 13.2.3]{.table-ref} is as follows:
$\qquad MovingMedian_{1989} \,=\, median \{ Y_{1987} , Y_{1988} ,Y_{1989} , Y_{1990} , Y_{1991} \}$
$\qquad \qquad \qquad \qquad \qquad \;\;=\, median \{16.74, 17.12 , 21.84 , 28.48,19.15 \} \,=\, 19.15$
[Table 13.2.3]{.table-ref} and [Figure 13.2.3]{.figure-ref} show all the five-point
moving medians obtained in this way and their graph. Note that the
moving medians for the first two years and the last two years are not
available. Because the centered moving median removes extreme values, it
is called filtering, and the filtered time series is much smoother than
the original data.
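
A minimal Python sketch of the moving median, using `statistics.median` from the standard library. The function name `centered_moving_median` is an assumption for this example; the data are the 1987 to 1993 prices from [Table 13.2.3]{.table-ref}.

```python
from statistics import median


def centered_moving_median(y, n_points):
    """N-point centered moving median for odd N; None where undefined."""
    if n_points < 1 or n_points % 2 == 0:
        raise ValueError("n_points must be a positive odd number")
    half = n_points // 2
    result = [None] * len(y)
    for t in range(half, len(y) - half):
        result[t] = median(y[t - half : t + half + 1])
    return result


# Crude oil prices for 1987-1993 (first rows of Table 13.2.3).
years = [1987, 1988, 1989, 1990, 1991, 1992, 1993]
oil_price = [16.74, 17.12, 21.84, 28.48, 19.15, 19.49, 14.19]

med5 = centered_moving_median(oil_price, 5)
for year, value in zip(years, med5):
    print(year, "n/a" if value is None else value)
```

Note how the 1990 peak of 28.48 never appears in the filtered values, since a single extreme observation cannot be the median of a five-point window.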
::: textLeft
[Table 13.2.3]{.table-ref} Price of Crude Oil and 5-point Centered Moving Median
:::
-----------------------------------------------------------------------
Year                    Price of Oil            5-point Centered Moving
                                                Median
----------------------- ----------------------- -----------------------
1987\                   16.74\                  \
1988\                   17.12\                  \
1989\                   21.84\                  19.15\
1990\                   28.48\                  19.49\
1991\                   19.15\                  19.49\
1992\                   19.49\                  19.15\
1993\                   14.19\                  19.15\
1994\                   17.77\                  19.49\
1995\                   19.54\                  17.77\
1996\                   25.90\                  17.77\
1997\                   17.65\                  19.54\
1998\                   12.14\                  25.76\
1999\                   25.76\                  19.96\
2000\                   26.72\                  25.76\
2001\                   19.96\                  26.72\
2002\                   31.21\                  31.21\
2003\                   32.51\                  32.51\
2004\                   43.36\                  43.36\
2005\                   61.06\                  60.85\
2006\                   60.85\                  60.85\
2007\                   95.95\                  61.06\
2008\                   44.60\                  79.39\
2009\                   79.39\                  91.38\
2010\                   91.38\                  91.38\
2011\                   98.83\                  91.83\
2012\                   91.83\                  91.83\
2013\                   98.17\                  91.83\
2014\                   53.45\                  53.75\
2015\                   37.13\                  53.75\
2016\                   53.75\                  53.45\
2017\                   60.46\                  53.75\
2018\                   45.15\                  53.75\
2019\                   61.14\                  60.46\
2020\                   48.52\                  61.14\
2021\                   75.21\                  \
2022\                   106.95\                 \
-----------------------------------------------------------------------
<div>
<div>
<input class="qrBtn" onclick="window.open(addrStr[134])" src="QR/eStatU330_TimeseriesSmoothing.svg" type="image"/>
</div>
<div>
![](Figure/Fig130203.png){.imgFig600400}
::: figText
[Figure 13.2.3]{.figure-ref} Price of Crude Oil and 5-point Centered Moving Median
:::
</div>
</div>
If N is an even number, it is difficult to obtain a centered moving
median with the same number of data on both sides of the base year. For
example, the center of the four-point moving median from 1987 to 1990
lies between 1988 and 1989. If you denote this as
$MovingMedian_{1988.5}$, it can be calculated as follows:
$\qquad MovingMedian_{1988.5} \,=\, median \{Y_{1987} , Y_{1988} , Y_{1989} , Y_{1990} \}$
$\qquad \qquad \qquad \qquad \qquad \;=\, median \{16.74 , 17.12 , 21.84 , 28.48 \} \,=\, \frac {17.12 +21.84} {2} = 19.48$
The 4-point moving median obtained in this way is called the
non-centered 4-point moving median. As with the moving average, when N
is even the non-centered moving median does not coincide with an
observation year of the original data, which is inconvenient. The
centered moving median is therefore calculated as the average of the two
adjacent non-centered moving medians. In other words, the centered
four-point moving median for 1989 is the mean of
$MovingMedian_{1988.5}$ and $MovingMedian_{1989.5}$.
:::
:::
:::
## Transformation of Time Series
::: mainTable
A time series can be examined by plotting the raw data directly, but to
study its various characteristics we often look at the percentage
increase or decrease, or at an index expressed as a percentage of the
value at a base time. In addition, to examine the relation with previous
data, the series is compared with a time-lagged version of itself or
converted into horizontal data by differencing. When the variance of the
time series increases with time, it is sometimes converted into a form
suitable for a time series model by using a logarithmic, square root,
or Box-Cox transformation.
:::
### Percentage Change
::: mainTable
##### A. Percent Change
In a time series, you can examine the raw increase or decrease of a
value, but the change is easier to observe when it is expressed as a
percentage. When the time series is expressed as
$Y_1 , Y_2 , ... , Y_n$, the percentage increase or decrease $P_t$
compared to the previous data is as follows. $$
P_{t} \,=\, \frac {Y _{t} - Y_{t-1}} {Y_{t-1}} \times 100 , \quad t=2,3, ... , n
$$ [Table 13.3.1]{.table-ref} shows the number of houses in Korea from 2010
to 2020, and [Figure 13.3.1]{.figure-ref} shows the percentage increase or decrease
compared to the previous data. Looking at this rate of change, it can be
easily observed that the original time series has an overall increasing
trend, but the rate of change of the previous year has many changes. In
other words, it can be observed that there was a 2.23% increase in the
number of houses in 2014 compared to the previous year, and a 2.48%
increase in the number of houses in 2018 as well.
$\qquad P_{2014} \,=\, \frac{19161.2 - 18742.1} {18742.1} \times 100 \,=\, 2.23$
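
The formula for $P_t$ can be applied to the whole series with a few lines of Python. This is only a sketch: `percent_change` is a hypothetical helper name, and the data are the house counts listed in Table 13.3.1.

```python
# Number of houses in Korea (unit: 1000), copied from Table 13.3.1.
houses = {
    2010: 17738.8, 2011: 18082.1, 2012: 18414.4, 2013: 18742.1,
    2014: 19161.2, 2015: 19559.1, 2016: 19877.1, 2017: 20313.4,
    2018: 20818.0, 2019: 21310.1, 2020: 21673.5,
}

def percent_change(series):
    """Return {t: P_t} with P_t = (Y_t - Y_{t-1}) / Y_{t-1} * 100.

    The first period has no predecessor, so it is omitted.
    """
    years = sorted(series)
    return {
        cur: (series[cur] - series[prev]) / series[prev] * 100
        for prev, cur in zip(years, years[1:])
    }

changes = percent_change(houses)
for year, p in changes.items():
    print(year, f"{p:.2f}")
```

Because the values are recomputed from the rounded counts in the table, the second decimal place may differ slightly from the printed percent changes.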
::: textLeft
[Table 13.3.1]{.table-ref} Number of Houses in Korea and Percent Change\
(Korea National Statistical Office, unit 1000)
:::
-----------------------------------------------------------------------
Year Number of Houses \% change
----------------------- ----------------------- -----------------------
2010\ 17738.8\ \
2011\ 18082.1\ 1.93\
2012\ 18414.4\ 1.83\
2013\ 18742.1\ 1.77\
2014\ 19161.2\ 2.23\
2015\ 19559.1\ 2.07\
2016\ 19877.1\ 1.62\
2017\ 20313.4\ 2.19\
2018\ 20818.0\ 2.48\
2019\ 21310.1\ 2.36\
2020\ 21673.5\ 1.70\
-----------------------------------------------------------------------
<div>
<div>
<input class="qrBtn" onclick="window.open(addrStr[135])" src="QR/eStatU340_TimeseriesTransformation.svg" type="image"/>
</div>
<div>
![](Figure/Fig130301.png){.imgFig600400}
::: figText
[Figure 13.3.1]{.figure-ref} Number of Houses in Korea and Percent Change
:::
</div>
</div>
:::
### B. Simple Index
::: mainTable
Another way to use percentages to characterize changes over time is to
calculate an index number. An **index** $Index_t$ is a number that
indicates the change of a time series over time. The index number
$Index_t$ at time $t$ is the value of the series at that time expressed
as a percentage of its value at a predetermined time point $t_0$, called
the base period.
$$
Index_{t} \,=\, \frac {Y _{t}} {Y_{t_0}} \times 100 , \quad t=1,2,..., n
$$
The most commonly used indices in economics are the price index and the
quantity index. For example, the consumer price index is a price index
that tracks the price change of a set of goods chosen to reflect overall
consumer prices, and an index of the yearly change in total electricity
consumption is a quantity index. Methods of calculating an index are
broadly divided into the simple index number, used when the index
represents a single item, and the composite index number, used when it
represents several items, as in the consumer price index.
[Table 13.3.2]{.table-ref} is a simple index for the number of houses in Korea
from 2010 to 2020, with 2010 as the base time.
[Figure 13.3.2]{.figure-ref} shows that, in this case, the index follows
essentially the same trend as the original time series. The index for
2020 is 122.18, which means that the number of houses in 2020 was 22.18%
higher than in 2010.
$\qquad Index_{2020} \,=\, \frac{Y _{2020}} {Y_{2010}} \times 100 \,=\, \frac{21673.5} {17738.8} \times 100 \,=\, 122.18$
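
The simple index can be reproduced for every year with a short Python sketch. The helper name `simple_index` is hypothetical, and the data are the house counts from Table 13.3.2 with 2010 as the base period.

```python
# Number of houses in Korea (unit: 1000), copied from Table 13.3.2.
houses = {
    2010: 17738.8, 2011: 18082.1, 2012: 18414.4, 2013: 18742.1,
    2014: 19161.2, 2015: 19559.1, 2016: 19877.1, 2017: 20313.4,
    2018: 20818.0, 2019: 21310.1, 2020: 21673.5,
}

def simple_index(series, base):
    """Return {t: Index_t} with Index_t = Y_t / Y_base * 100."""
    base_value = series[base]
    return {t: y / base_value * 100 for t, y in series.items()}

index_2010 = simple_index(houses, base=2010)
for year, value in index_2010.items():
    print(year, f"{value:.2f}")
```

Changing the `base` argument to another year rescales the whole index so that the chosen year equals 100, while the shape of the series is unchanged.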
::: textLeft
[Table 13.3.2]{.table-ref} Simple Index of Number of Houses in Korea\
(Korea National Statistical Office, unit 1000)
:::
-----------------------------------------------------------------------
Year Number of Houses Simple Index\
Base: 2010
----------------------- ----------------------- -----------------------
2010\ 17738.8\ 100.00\
2011\ 18082.1\ 101.94\
2012\ 18414.4\ 103.81\
2013\ 18742.1\ 105.66\
2014\ 19161.2\ 108.02\
2015\ 19559.1\ 110.26\
2016\ 19877.1\ 112.05\
2017\ 20313.4\ 114.51\
2018\ 20818.0\ 117.36\
2019\ 21310.1\ 120.13\
2020\ 21673.5\ 122.18
-----------------------------------------------------------------------
<div>
<div>
<input class="qrBtn" onclick="window.open(addrStr[135])" src="QR/eStatU340_TimeseriesTransformation.svg" type="image"/>
</div>
<div>
![](Figure/Fig130302.png){.imgFig600400}
::: figText
[Figure 13.3.2]{.figure-ref} Simple Index of Number of Houses in Korea
:::
</div>
</div>
:::
### C. Composite Index
::: mainTable
A **composite index** summarizes the change in price or quantity of
several goods: a specific time point is set as the base period, and the
combined value at each time point is expressed as a percentage of the
combined value at the base period. The most widely used composite index
is the consumer price index, which reflects price fluctuations of about
500 products in Korea that affect consumer prices. Another commonly used
composite index is the composite stock price index, which tracks the
price fluctuations of all listed stocks traded in the stock market.
A weighted composite index, in which the price of each product is
weighted by the quantity consumed, is often used. When the quantity
consumed at the base time is used as the weight, the index is called the
**Laspeyres method**; when the quantity consumed at the current time is
used as the weight, it is called the **Paasche method**. In general, the
Laspeyres method is more widely used, and the consumer price index is a
representative example. The Paasche method is used when the consumption
of the goods used as weights varies greatly over time, but it can be
applied only when the consumption at each time point is known, and
surveying the quantity consumed at every time point is expensive.
Let $P_{1t} , \cdots , P_{kt}$ be the prices of $k$ products at time
point $t$ and $Q_{1t} , \cdots , Q_{kt}$ the quantities consumed at time
$t$, so that $P_{1t_0} , \cdots , P_{kt_0}$ and
$Q_{1t_0} , \cdots , Q_{kt_0}$ are the prices and quantities at the base
time $t_0$. The formula for each composite index is as follows:
$\qquad \text{Laspeyres Index:} \quad Index_t \,=\, \frac { Q_{1t_0} P_{1t} + \cdots + Q_{kt_0} P_{kt} } {Q_{1t_0} P_{1t_0}+ \cdots + Q_{kt_0} P_{kt_0} } \times 100$\
$\qquad \text{Paasche Index:} \qquad Index_t \,=\, \frac { Q_{1t} P_{1t} + \cdots + Q_{kt} P_{kt} } {Q_{1t} P_{1t_0} + \cdots + Q_{kt} P_{kt_0} } \times 100$
The data in [Table 13.3.3]{.table-ref} shows the price and quantity of three
metals by month in 2020.
::: textLeft
[Table 13.3.3]{.table-ref} Composite Index of Three Metal Prices (\$/ton) and
Production Quantities (ton)
:::
Month  Copper Price  Copper Quantity  Metal Price  Metal Quantity  Lead Price  Lead Quantity  Laspeyres  Paasche
-----  ------------  ---------------  -----------  --------------  ----------  -------------  ---------  -------
    1        1361.6            100.7          213            4311       530.0           46.1     100.00   100.00
    2        1399.0             95.1          213            4497       520.0           47.0     100.31   100.28
    3        1483.6            104.0          213            5083       529.0           51.0     101.13   101.01
    4        1531.6             95.6          213            5077       540.0           23.0     101.63   101.35
    5        1431.2            103.3          213            5166       531.0           26.5     100.65   100.57
    6        1383.8            106.9          213            4565       580.0           13.5     100.42   100.27
    7        1326.8             95.9          213            4329       642.8           27.4     100.16    99.98
    8        1328.8             96.7          213            4057       602.6           25.8     100.00    99.87
    9        1307.8             95.7          213            3473       513.6           20.5      99.43    99.38
   10        1278.4             89.1          213            3739       480.8           24.6      99.01    99.07
   11        1354.2            100.5          213            3817       528.4           21.5      99.92    99.92
   12        1305.2             96.9          213            3694       462.2           27.9      99.18    99.21
----------------------------------------------------------------------------------------------------------------

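
The two weighted index formulas can be checked against the table with a short Python sketch. The function names are hypothetical, the prices and quantities are the monthly values read from Table 13.3.3 (ordered copper, metal, lead), and January is taken as the base period.

```python
# Monthly prices ($/ton) as read from Table 13.3.3: [copper, metal, lead].
prices = [
    [1361.6, 213, 530.0], [1399.0, 213, 520.0], [1483.6, 213, 529.0],
    [1531.6, 213, 540.0], [1431.2, 213, 531.0], [1383.8, 213, 580.0],
    [1326.8, 213, 642.8], [1328.8, 213, 602.6], [1307.8, 213, 513.6],
    [1278.4, 213, 480.8], [1354.2, 213, 528.4], [1305.2, 213, 462.2],
]
# Monthly quantities (ton) as read from Table 13.3.3: [copper, metal, lead].
quantities = [
    [100.7, 4311, 46.1], [95.1, 4497, 47.0], [104.0, 5083, 51.0],
    [95.6, 5077, 23.0], [103.3, 5166, 26.5], [106.9, 4565, 13.5],
    [95.9, 4329, 27.4], [96.7, 4057, 25.8], [95.7, 3473, 20.5],
    [89.1, 3739, 24.6], [100.5, 3817, 21.5], [96.9, 3694, 27.9],
]

def weighted_total(weights, item_prices):
    """Sum of quantity * price over all items."""
    return sum(q * p for q, p in zip(weights, item_prices))

def laspeyres(prices, quantities, t, base=0):
    """Weights are the quantities at the base period."""
    q0 = quantities[base]
    return weighted_total(q0, prices[t]) / weighted_total(q0, prices[base]) * 100

def paasche(prices, quantities, t, base=0):
    """Weights are the quantities at the current period t."""
    qt = quantities[t]
    return weighted_total(qt, prices[t]) / weighted_total(qt, prices[base]) * 100

months = range(len(prices))
laspeyres_series = [laspeyres(prices, quantities, t) for t in months]
paasche_series = [paasche(prices, quantities, t) for t in months]

for month, (l_idx, p_idx) in enumerate(zip(laspeyres_series, paasche_series), start=1):
    print(month, f"{l_idx:.2f}", f"{p_idx:.2f}")
```

The only difference between the two functions is which month supplies the weights, which makes the data requirement of the Paasche method visible: it needs the quantities for every month, whereas the Laspeyres method needs only the base-period quantities.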
In [Table 13.3.3]{.table-ref}, the Laspeyres index for February, with
January as the base time, is calculated as follows.