---
title: "README"
author: "Nathan Potgieter"
output:
github_document:
pandoc_args: --webtex
---
```{r setup, include=FALSE}
knitr::opts_knit$set(root.dir = getwd())
knitr::opts_chunk$set(echo = TRUE, fig.width=10, fig.height=8, out.height='80%', out.width='80%')
rm(list = ls()) # Clean your environment:
gc()
```
# MCmarket an R Package Designed for Simulating Asset Market Returns
This is the README for Nathan Potgieter's financial econometrics project. This work sets out a framework for the Monte Carlo simulation of asset markets. The framework is built into the MCmarket package, which will be available on GitHub under the "Nathan-Potgieter/MCmarket" repository.
## Aim
The aim of this project is to develop a general and easy-to-use Monte Carlo simulation package designed to generate asset return data with a pre-specified correlation structure, various asset distribution options, and mean and variance time-series properties. Elliptical copulas are used to specify the correlation structure within the simulated data. The option to use the Archimedean Clayton copula is also made available, as this allows for the creation of markets with high left-tail dependence. However, when using this option the correlation structure cannot be specified.
## Monte Carlo Framework
The Monte Carlo simulation routine involves the following steps:
The framework generates k periods of returns for D assets across N markets.
1. Draw a series of k random, uniformly distributed numbers (corresponding to k trading periods) across a set of D variables (or D assets), from a multivariate distribution with a given correlation structure.
    + This is accomplished using elliptical (Gaussian or t) copulas or the Archimedean Clayton copula. In R this can easily be done using the _ellipCopula_, _archmCopula_ and _rCopula_ functions from the _copula_ package.
2. Convert the uniformly distributed data into something that more closely resembles the distribution of asset returns. MCmarket provides functionality to convert them into normal, student-t or skewed generalized t distributions.
    + This is done the same way one would convert p-values into test statistics, using the qnorm(), qt() and sgt::qsgt() functions.
    + Technically this is accomplished via the inversion of the cumulative distribution function (CDF).
3. Induce mean and variance persistence in the series by plugging the random numbers resulting from step 2 into an ARMA(1,1) + APARCH(1,1) equation as the innovations.
    + If the parameters are set accordingly, the resulting series should possess the volatility clustering observed in empirical asset returns.
4. The final step is to repeat the first three steps N times to generate an ensemble of asset markets, each with the same risk characteristics. A minimal sketch of steps 1 to 3 is given below.
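The following chunk is a minimal, illustrative sketch of steps 1 to 3 for a single market. The 10-asset equicorrelation matrix and all parameter values are assumptions made purely for demonstration; only the copula and sgt packages (and MCmarket's sim_garch, shown later in this README) are used.
```{r framework sketch, eval=FALSE}
# Toy 10-asset equicorrelation matrix (illustrative only).
library(copula)
library(sgt)
corr <- matrix(0.5, 10, 10)
diag(corr) <- 1
# Step 1: 252 uniform draws with the given dependence structure (t-copula).
tcop <- ellipCopula(family = "t", dispstr = "un", param = P2p(corr), dim = 10)
u <- rCopula(252, tcop)
# Step 2: invert the CDF to obtain skewed generalized t innovations.
z <- apply(u, 2, qsgt, mu = 0, sigma = 1, lambda = -0.2, p = 2, q = 10)
# Step 3: induce mean and variance persistence with MCmarket::sim_garch
# (defined later in this README); parameter values are illustrative.
returns <- apply(z, 2, sim_garch, omega = 5e-04, alpha = 0.1, beta = 0.85)
```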
#### Loading Packages
```{r loading packages}
library(pacman)
# Loading MCmarket
p_load(MCmarket)
#These packages are used in the Monte Carlo (MC) functions.
p_load(tidyverse, copula, lubridate, sgt, glue)
#These packages are only used in research and demonstration tasks.
p_load(tbl2xts, fGarch, forecast)
```
# The Set up
Since a correlation matrix is the key input in the Monte Carlo simulation framework, this section covers the creation of some _ad hoc_ correlation matrices using the MCmarket::gen_corr function, as well as the estimation of some empirical correlation matrices using S&P500 data.
## Generating Ad Hoc Correlation Matrices
In this section I developed a simple function that allows the user to easily generate a correlation matrix with a desired correlation structure. Note that the majority of the code was provided by Nico Katzke. The function is located in the gen_corr.R code file.
### gen_corr
```{r gen_corr, eval=FALSE, include=TRUE}
#' @title gen_corr
#' @description This function allows users to easily generate a user-defined ad hoc correlation
#' matrix with up to four layers of clusters.
#' @param D the number of variables. The output is a D by D correlation matrix.
#' @param clusters a character string specifying the type of cluster structure.
#' Available options are:
#'
#' "none" for a correlation matrix with no clusters, but significant correlation
#' between constituents.
#'
#' "non-overlapping" for a correlation matrix with one layer of clusters.
#'
#' "overlapping" for a correlation matrix with up to four layers and a set number
#' of clusters per layer.
#' @param num_clusters If clusters = "none", then num_clusters is not used.
#'
#' If clusters = "non-overlapping", then num_clusters is an integer indicating the number of clusters.
#'
#' If clusters = "overlapping", then num_clusters is a vector of length less than or equal to four. The length of
#' num_clusters specifies the number of cluster layers and the integers within the vector specify the number of clusters
#' per layer. It is preferable to arrange the vector in descending order, since failing to do so can result in a
#' unique output, which may not contain the intended number of layers. Additionally, using combinations with repeating
#' numbers of clusters, for example num_clusters = c(10, 10, 5, 5), will produce fewer layers, but unique intra-cluster
#' correlations (See examples).
#' @return Returns a D by D correlation matrix.
#'
#' @import propagate
#'
#' @examples
#' \dontrun{
#' library(ggcorrplot)
#'
#' ### This generates a 50 by 50 correlation matrix with no clusters.
#' gen_corr(D = 50, clusters = "none") %>%
#' ggcorrplot(title = "no clusters")
#'
#' ### This generates a 50 by 50 correlation matrix with 5 non-overlapping clusters.
#' gen_corr(D = 50, clusters = "non-overlapping", num_clusters = 5) %>%
#' ggcorrplot(title = "non-overlapping clusters")
#'
#' ### This generates a 60 by 60 correlation matrix consisting
#' ### of 4 layers with 10, 5, 3 and 2 clusters respectively.
#' gen_corr(D = 60,
#' clusters = "overlapping",
#' num_clusters = c(10,5,3,2)) %>%
#' ggcorrplot(title = "overlapping clusters")
#'
#' ### Try tinkering with the num_clusters argument in unique ways to effect the
#' ### within cluster correlation coefficients.
#' ### Two cluster layers with low overall correlation.
#' gen_corr(D = 50,
#' clusters = "overlapping",
#' num_clusters = c(10,10,10,2)) %>%
#' ggcorrplot(title = "overlapping clusters")
#'
#' }
#' @export
gen_corr <- function (D = 50,
clusters = c("none", "non-overlapping", "overlapping"),
num_clusters = NULL) {
Grps <- num_clusters
num_layers <- length(num_clusters)
#set.seed(123)
if (!(clusters %in% c("none", "non-overlapping", "overlapping"))) stop("Please provide a valid clusters argument")
if(clusters == "none"){
# Unclustered covariance matrix
Sigma <- diag(D)
for (i in 1:D) for (j in 1:D) Sigma[i,j] <- 0.9^abs(i-j)
} else
if(clusters == "non-overlapping"){
#----------------------
# distinct non-overlapping clusters:
#----------------------
if(is.null(num_clusters)) stop("Please provide a valid num_clusters argument when using non-overlapping clusters")
if (num_layers>1) stop("Please provide a valid num_clusters argument of length one when using non-overlapping clusters")
Sigma <- matrix(0.6, D, D)
diag(Sigma) <- 1
for (i in 1:Grps) {
ix <- seq((i-1) * D / Grps + 1, i * D / Grps)
Sigma[ix, -ix] <- 0.0001 #think about
}
} else
if(clusters == "overlapping"){
#----------------------
# distinct overlapping clusters:
#----------------------
if(is.null(num_clusters)) stop("Please provide a valid num_clusters argument when using overlapping clusters")
if (num_layers>4) stop("Please provide a valid num_clusters argument of length less than or equal to 4 when using overlapping clusters")
Sigma <- matrix(0.7, D, D)
diag(Sigma) <- 1
for (i in 1:Grps[1]) {
ix <- seq((i-1) * D / Grps[1] + 1, i * D / Grps[1])
Sigma[ix, -ix] <- 0.5
}
if(num_layers>=2){
for (i in 1:Grps[2]) {
ix <- seq((i-1) * D / Grps[2] + 1, i * D / Grps[2])
Sigma[ix, -ix] <- 0.3
} }
if(num_layers>=3){
for (i in 1:Grps[3]) {
ix <- seq((i-1) * D / Grps[3] + 1, i * D / Grps[3])
Sigma[ix, -ix] <- 0.15
} }
if(num_layers>=4){
for (i in 1:Grps[4]) {
ix <- seq((i-1) * D / Grps[4] + 1, i * D / Grps[4])
Sigma[ix, -ix] <- 0.05
} }
}
return(Sigma)
}
```
Demonstrating the use of gen_corr
```{r using gen_corr}
gen_corr(D = 60, clusters = "none") %>%
ggcorrplot::ggcorrplot(title = "No Clusters", hc.order = FALSE)
gen_corr(D = 60, clusters = "non-overlapping", num_clusters = c(3)) %>%
ggcorrplot::ggcorrplot(title = "Non-Overlapping Clusters", hc.order = FALSE)
gen_corr(D = 60, clusters = "overlapping", num_clusters = c(10,5,4,2)) %>% ggcorrplot::ggcorrplot(title = "Overlapping Clusters", hc.order = FALSE)
```
## Creating a Dataset of Empirical Correlation Matrices
Since it would also be useful for MCmarket's users to have easy access to some empirical correlation matrices, a data set containing correlation matrices labeled "stressed", "rally" and "normal" is now constructed.
S&P500 data since 2000-01-01 is used to sample covariance matrices that can be used as inputs in the Monte Carlo procedure. These matrices will be made available as part of the MCmarket package. They can also be used in correlation matrix generative models, like CorrGAN (https://marti.ai). At a later stage MCmarket may incorporate a CorrGAN model, preferably one like the conditional CorrGAN or conditional CorrVAE. This would allow users to randomly simulate different types of realistic correlation matrices.
Note that the code below is not run when building this README, as it would simply take too long to knit this document.
#### Getting SNP 500 data since 2000
```{r SNP_data, eval=FALSE, echo=TRUE}
pacman::p_load(tidyverse, tidyquant, lubridate)
# The impute_missing_returns function (from the Financial Econometrics tutorial) is used to fill in missing returns
source("code/impute_missing_returns.R")
# save current system date to a variable
today <- Sys.Date()
# subtract 90 months from the current date; this is used at a later stage.
#date <- today %m+% months(-90)
# saving from date
date <- as.Date("2000-01-01")
# Getting all tickers in SP500
stock_list <- tq_index("SP500") %>%
arrange(symbol) %>%
mutate(symbol = case_when(symbol == "BRK.B" ~ "BRK-B", #repairing names that cause errors
symbol == "BF.B" ~ "BF-B",
TRUE ~ as.character(symbol))) %>%
pull(symbol)
# This function gets the data for a ticker and date
get_data <- function (ticker, from) {
df <- tq_get(ticker, from = from) %>% mutate(symbol = rep(ticker, length(date)))
return(df)
}
#Getting S&P500 data set
SNP_data <-
1:length(stock_list) %>% map(~get_data(ticker = stock_list[[.]], from = date)) %>% bind_rows()
save(SNP_data, file = "data/SNP_data.Rda")
#Calculating returns and sd
#and imputing missing returns using Nico's impute_missing_returns function
SNP_returns <- left_join(
SNP_data %>%
group_by(symbol) %>%
arrange(date) %>%
mutate(return = log(adjusted/dplyr::lag(adjusted))) %>%
dplyr::filter(date>first(date)) %>%
select(date, symbol, return) %>%
spread(symbol, return) %>%
impute_missing_returns(impute_returns_method = "Drawn_Distribution_Own") %>% #Imputing missing returns
gather(symbol, return, -date) %>%
group_by(symbol) %>%
mutate(sd = sd(return, na.rm = T)*sqrt(252)) %>% # calculate annualized SD
ungroup(),
SNP_data %>%
select(date, symbol, volume, adjusted),
by = c("date", "symbol")
)
save(SNP_returns, file = "data/SNP_returns.Rda")
```
#### Labeling SNP Data into Rally, Normal and Stressed Markets
This chunk calculates the Sharpe ratios of random, equally weighted portfolios of 50 assets over random 252-day time slices since 2000-01-01 of the SNP_returns data set. The portfolios are then labeled as follows:
- ‘stressed market’: A market is ‘stressed’ whenever the equi-weighted basket of stocks has a Sharpe ratio below -0.5 over the year of study (252 trading days).
- ‘rally market’: A market is ‘rallying’ whenever the equi-weighted basket of stocks has a Sharpe ratio above 2 over the year of study (252 trading days).
- ‘normal market’: A market is ‘normal’ whenever the equi-weighted basket of stocks has a Sharpe ratio in-between -0.5 and 2 over the year of study (252 trading days).
Note that this methodology is consistent with that used in https://marti.ai/qfin/2020/02/03/sp500-sharpe-vs-corrmats.html.
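As a quick illustration of the rule itself, the thresholds can be expressed with a small helper function (label_market is hypothetical and used only for illustration; it is not part of the package):
```{r sharpe labels}
# Hypothetical helper: maps an annualised Sharpe ratio to the market label used below.
label_market <- function(sharpe) {
  dplyr::case_when(
    sharpe < -0.5 ~ "stressed",
    sharpe > 2 ~ "rally",
    TRUE ~ "normal"
  )
}
label_market(c(-1.2, 0.7, 2.4))
```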
```{r sharp, eval=FALSE, echo=TRUE}
load("data/SNP_returns.Rda")
#Packages used in chunk
library(pacman)
p_load(tidyverse, furrr, PerformanceAnalytics, tbl2xts, rmsfuns, lubridate, fitHeavyTail)
#This function generates random portfolios, with the option to also compute the Sharpe ratio;
#sharp = TRUE takes substantially longer to calculate.
gen_random_port <- function(dim = 100, sharp = TRUE) {
#list of assets from which to sample
symbols <- SNP_returns %>%
dplyr::filter(date==first(date)) %>%
pull(symbol)
dim <- dim
sample_symbols <- sample(symbols, dim)
#Dates from which to sample
dates <- SNP_returns %>%
dplyr::filter(symbol == "A") %>%
select(date)
indx <- sample.int(nrow(dates) - 252, 1)
start_date <- dates[indx,]
end_date <- dates[indx + 252,]
sample_dates <- dates %>% dplyr::filter(date >= start_date[[1]] &
date <= end_date[[1]]) %>% pull()
if(sharp == FALSE){
training_data <- list(dates = list(sample_dates),
symbols = list(sample_symbols)) %>% as_tibble()
}else
if(sharp == TRUE){
#setting rebalance months
RebMonths <- c(1,4,7,10)
EQweights <-
SNP_returns %>%
dplyr::filter(symbol %in% sample_symbols &
date %in% sample_dates) %>%
select(date, symbol, return) %>%
mutate(Months = as.numeric(format(date, format = "%m")),
YearMonths = as.numeric(format(date, format = "%Y%m"))) %>%
dplyr::filter(Months %in% RebMonths) %>%
group_by(YearMonths, Months, symbol) %>%
dplyr::filter(date == last(date)) %>%
ungroup() %>%
group_by(date) %>%
mutate(weight = 1/n()) %>%
select(date, symbol, weight) %>%
spread(symbol, weight) %>% ungroup() %>%
tbl_xts()
# Return wide data
Returns <-
SNP_returns %>%
dplyr::filter(symbol %in% sample_symbols &
date %in% sample_dates) %>%
select(date, symbol, return) %>%
spread(symbol, return) %>% tbl_xts()
#calculating portfolio returns
EW_RetPort <- rmsfuns::Safe_Return.portfolio(Returns,
weights = EQweights, lag_weights = TRUE,
verbose = TRUE, contribution = TRUE,
value = 100, geometric = TRUE)
Sharp <- SharpeRatio(EW_RetPort$returns, FUN="StdDev", annualize = TRUE)
training_data <- list(dates = list(sample_dates),
symbols = list(sample_symbols),
sharp = Sharp[1]) %>% as_tibble()
}
return(training_data)
}
#------------------------------------------
#First I generate the labeled training data
#------------------------------------------
#Generating N random portfolios, with Sharpe ratios.
# The furrr package can be used to speed up computation
set.seed(5245214)
training_data_sharp <-
1:500 %>% map_dfr(~gen_random_port(dim = 50))
save(training_data_sharp, file = "data/training_data_sharp.Rda")
load("data/training_data_sharp.Rda")
# Separating by Sharpe ratio into market types.
stressed_market <-
training_data_sharp %>%
dplyr::filter(sharp < -0.5)
rally_market <-
training_data_sharp %>%
dplyr::filter(sharp > 2)
normal_market <-
training_data_sharp %>%
dplyr::filter(sharp >= -0.5 & sharp <= 2)
#This function gathers the SNP data corresponding to the portfolios described above
get_market_data <- function(index_df, i){
SNP_returns %>%
select(date, symbol, return) %>%
dplyr::filter(symbol %in% index_df$symbols[[i]] &
date %in% index_df$dates[[i]])
}
#Gathering the market data for each portfolio type
stressed_market_data <-
1:nrow(stressed_market) %>%
map(~get_market_data(stressed_market, .x))
rally_market_data <-
1:nrow(rally_market) %>%
map(~get_market_data(rally_market, .x))
normal_market_data <-
1:nrow(normal_market) %>%
map(~get_market_data(normal_market, .x))
#Calculating Covariance matrix using the fitHeavyTail method
calc_cov <- function(df, i){
df[[i]] %>% select(date, symbol, return) %>%
spread(symbol, return) %>% select(-date) %>%
fitHeavyTail::fit_mvt() %>% .$cov
}
stressed_market_cov <-
1:length(stressed_market_data) %>%
map(~calc_cov(stressed_market_data, .x))
rally_market_cov <-
1:length(rally_market_data) %>%
map(~calc_cov(rally_market_data, .x))
normal_market_cov <-
1:length(normal_market_data) %>%
map(~calc_cov(normal_market_data, .x))
# Creating and saving a data frame of covariance matrices.
cov_mats <- list(cov_normal = normal_market_cov,
cov_stressed = stressed_market_cov,
cov_rally = rally_market_cov)
save(cov_mats, file = "data/cov_mats.Rda")
# Creating and saving a data frame of correlation matrices.
cor_mats <-
list(
cor_normal = 1:length(normal_market_cov) %>%
map(~cov2cor(normal_market_cov[[.x]])),
cor_stressed = 1:length(stressed_market_cov) %>%
map(~cov2cor(stressed_market_cov[[.x]])),
cor_rally = 1:length(rally_market_cov) %>%
map(~cov2cor(rally_market_cov[[.x]]))
)
save(cor_mats, file = "data/cor_mats.Rda")
#------------------------------------------
#Now to generate unlabelled training data
#Hence why sharp = FALSE in gen_random_port.
# This section has been scrapped for now;
# corrgan may only be implemented at a later stage
#------------------------------------------
training_data_indx <-
1:500 %>% map_dfr(~gen_random_port(dim = 50, sharp = FALSE))
market_data <- 1:nrow(training_data_indx) %>% map(~get_market_data(training_data_indx, .x))
training_data <- 1:length(market_data) %>% map(~calc_cov(market_data, .x) %>% cov2cor())
save(training_data, file = "data/training_data.Rda")
```
#### An Example of Each Type of Empirical Correlation Matrix
Note that they are converted from covariance matrices to correlation matrices before being plotted.
```{r showing some training data, echo=FALSE}
load("data/cov_mats.Rda")
cov_mats$cov_rally[[1]] %>% cov2cor() %>%
ggcorrplot::ggcorrplot(title = "Rally Market", hc.order = TRUE)
cov_mats$cov_normal[[1]] %>% cov2cor() %>%
ggcorrplot::ggcorrplot(title = "Normal Market", hc.order = TRUE)
cov_mats$cov_stressed[[1]] %>% cov2cor() %>%
ggcorrplot::ggcorrplot(title = "Stressed Market", hc.order = TRUE)
```
# Building the Monte Carlo
## Step 1: Draw a series of random uniformly distributed numbers across a set of variables with a specified dependence structure.
### Generating Random Draws Using Various Copulas
This section explores the feasibility of using various types of copulas as part of the Monte Carlo framework.
#### Elliptical Copulas
The Gaussian and t copulas from the elliptical family allow us to specify their correlation matrix before randomly selecting observations from their multivariate distribution. Doing so allows one to produce random draws of uniformly distributed variables that adhere to the user-specified correlation structure. The chunk of code below demonstrates this functionality.
```{r Elliptal Copulas, warning=FALSE}
#loading copula package; already loaded
# pacman::p_load(copula)
#generating toy corr matrix
corr <- gen_corr(D = 50, clusters = "overlapping", num_clusters = c(10, 5, 2))
#generating normal and student t copula objects
Ncop <- ellipCopula(family = "normal", dispstr = "un", param = P2p(corr), dim = 50)
Tcop <- ellipCopula(family = "t", dispstr = "un", param = P2p(corr), dim = 50)
#generating 252 random draws for each of the dim = 50 variables.
set.seed(123)
rn <- rCopula(copula = Ncop, n = 252)
rt <- rCopula(copula = Tcop, n = 252)
#Checking if the correlation structure was maintained
p_load(patchwork)
# Original corr
p1 <- ggcorrplot::ggcorrplot(corr, hc.order = TRUE) +
labs(title = "Input Correlation Matrix") +
scale_x_discrete(labels = NULL) + scale_y_discrete(labels = NULL) +
theme(legend.position = "bottom")
# corr from random draws from the normal and t copulas
p2 <- fitHeavyTail::fit_mvt(rn) %>% .$cov %>% cov2cor() %>%
ggcorrplot::ggcorrplot(hc.order = TRUE) +
labs(subtitle = "Normal Copula") +
scale_x_discrete(labels = NULL) + scale_y_discrete(labels = NULL) +
theme(legend.position = "none")
p3 <- fitHeavyTail::fit_mvt(rt) %>% .$cov %>% cov2cor() %>%
ggcorrplot::ggcorrplot(hc.order = TRUE) +
labs(subtitle = "T-Copula") +
scale_x_discrete(labels = NULL) + scale_y_discrete(labels = NULL) +
theme(legend.position = "none")
#Printing summary output
#Note that all the variables appear uniformly distributed
p_load(printr)
summary(rn)
p1
#Notice that the underlying correlation structure has, for the most part, been maintained.
# Though, some noise has been introduced
(p2+p3) + plot_annotation(title = "Output Correlation Matrices") +
plot_layout(guides='collect') &
theme(legend.position='bottom')
```
### Archimedean Copulas
This work searched for a method to produce simulated markets with high left-tail dependence. This would, in theory, enable the simulation of markets in which the correlation between assets increases when overall asset returns are low (i.e. during crisis periods).
Unfortunately, elliptical copulas cannot be calibrated to exhibit dynamic correlations (i.e. left-tail dependence). Therefore, in this section we examine some properties of Archimedean copulas, which do exhibit tail dependence. Unfortunately, Archimedean copulas don't allow us to stipulate their correlation matrix.
Archimedean copulas such as the Clayton, Frank, Gumbel and Joe exhibit higher levels of dependence at the tails of the multivariate distribution. In this section we examine the Clayton copula due to its enhanced left-tail dependence. Other copulas were examined, but the Clayton copula is currently the only Archimedean copula in the copula package that allows random sampling from multivariate distributions with large dimensions. Additionally, the other copulas have to be rotated 180 degrees to exhibit left-tail dependence (since they naturally possess right-tail dependence). This rotation step can be quite time-consuming and often fails in higher dimensions. The choice to focus on the Clayton copula was therefore largely due to practical concerns raised when tinkering with alternative copulas. In future work it would be beneficial to explore alternative high-dimensional left-tail copulas.
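For reference, the Clayton copula's lower tail-dependence coefficient is a simple increasing function of its parameter $\theta$, which is why larger parameter values strengthen the joint behaviour in the left tail:
<center> $$\lambda_L = \lim_{q \to 0^+} P\left(U_2 \le q \mid U_1 \le q\right) = 2^{-1/\theta}$$ </center>
so the parameter values 2, 4 and 6 used below correspond to $\lambda_L \approx 0.71$, $0.84$ and $0.89$ respectively, while the Gaussian copula has $\lambda_L = 0$.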
```{r Archimedean copula, warning=TRUE}
# First look at dim = 2 to get an understanding of what parameter tuning does
#Clayton Copula
claycop <- archmCopula(family = "clayton", param = 2, dim = 2)
claycop2 <- archmCopula(family = "clayton", param = 4, dim = 2)
claycop3 <- archmCopula(family = "clayton", param = 6, dim = 2)
#Normal Copula
Ncop <- ellipCopula(family = "normal", dispstr = "un", param = 0.5, dim = 2)
# rCopula(251, claycop)
#note how left tail dependence increases with the parameter along with the overall correlation
#param = 2
persp(claycop, main = "Clayton Copula - param = 2", dCopula, zlim = c(0, 15), theta = 18)
#param = 4
persp(claycop2, main = "Clayton Copula - param = 4", dCopula, zlim = c(0, 15), theta = 18)
#param = 6
persp(claycop3, main = "Clayton Copula - param = 6", dCopula, zlim = c(0, 15), theta = 18)
#Normal Copula
persp(Ncop, main = "Normal Copula - cor = 0.5", dCopula, zlim = c(0, 15), theta = 18)
# Comparing Clayton with Normal copula
bind_rows(rCopula(5000, copula = claycop) %>% as_tibble() %>% mutate(copula = "claycop"),
rCopula(5000, copula = Ncop) %>% as_tibble() %>% mutate(copula = "normal")) %>%
ggplot(aes(x=V1,y=V2)) +
geom_point(alpha=0.5) +
geom_density_2d_filled(alpha=0.7) +
facet_wrap(~copula, nrow = 1) +
labs(title = "2D kernal Density - Clayton vs Normal Copula",
caption = "Clayton parameter = 2, Normal correlation = 0.5") +
theme_bw() +
theme(legend.position = "bottom")
```
Out of interest, some of the code written while experimenting with alternative Archimedean copulas, which were deemed unusable, is supplied below:
```{r other arch cops, eval=FALSE, include=TRUE}
#Note that the Gumbel and Joe copulas must be rotated 180 degrees to exhibit greater left tail dependence
#-----------------------------------------------
#Gumbel Copula
#This copula function breaks when using dim >= 7
#-----------------------------------------------
gumcop <- archmCopula(family = "gumbel", param = 2, dim = 2) %>% rotCopula()
persp(gumcop, dCopula, zlim = c(0, 10))
rCopula(1000, copula = gumcop) %>% plot(main = "Gumbel Copula")
#-----------------------------------------------------------------
# Joe copula
#This copula's computation time is unacceptably long with dim = 10
#and it doesn't work when dim = 50
#----------------------------------------------------------------
joecop <- archmCopula(family = "joe", param = 3, dim = 50) %>% rotCopula()
persp(joecop, dCopula, zlim = c(0, 10))
rCopula(1000, copula = joecop) %>% plot()
#--------------------------------
#Galambos
#This function breaks when dim>2
#--------------------------------
galcop <- evCopula(family = "galambos", param = 2, dim = 2) %>% rotCopula()
persp(galcop, dCopula, zlim = c(0, 10))
rCopula(1000, galcop) %>% plot()
```
## Hybrid Copulas
Since this project wanted to combine the left-tail dependence property of the Clayton copula with the ability to specify the correlation matrix offered by the elliptical copulas, there was an investigation into the creation of hybrid copulas.
@ruenzi states that flexible copula structures can be created through convex combinations of basic copulas. The authors use the 2D case and Tawn's theorem (below) to construct hybrid bivariate copulas.
Tawn (1988) "shows that a copula is a convex set and every convex combination of existing copula functions is again a copula" [@ruenzi2011, p.8].
Thus, if $C_1(U_N)$ and $C_2(U_N)$ are 2D copulas and $w$ is a weighting variable between 0 and 1, then
<center> $$C(U_N) = wC_1(U_N) + (1-w)C_2(U_N)$$ </center>
is again a copula. Therefore, a hybrid copula $C(U_N)$ can be created by linearly weighting an elliptical and an Archimedean copula of dimension 2.
This work attempted to generalize this finding to higher-dimensional copulas. First let's look at some 2D examples; whether this works in higher dimensions is investigated in Section \ref{test_sim_inno}.
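Before turning to the weighted-draw approximation used in the next chunk, note that a true convex combination of copulas can also be sampled directly. Below is a minimal sketch, assuming a recent version of the copula package that provides the mixCopula() constructor; the weight and parameter values are illustrative.
```{r mixCopula sketch, eval=FALSE}
# Convex combination (mixture) of a t copula and a Clayton copula.
w <- 0.4
hybrid_cop <- mixCopula(list(ellipCopula(family = "t", param = 0.5, dim = 2, df = 5),
                             archmCopula(family = "clayton", param = 4, dim = 2)),
                        w = c(1 - w, w))
# Draws come from the mixture copula itself, rather than from a
# weighted sum of draws taken from each copula separately.
u <- rCopula(5000, hybrid_cop)
plot(u, main = "t + Clayton mixture copula")
```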
### Generating Some 2D Hybrid Copulas
This chunk of code creates and plots some pseudo hybrid copulas by linearly weighting the random numbers generated from a student-t and Clayton copula. Note how when increasing the Clayton weight, observations concentrate at the bottom left of the density plots.
```{r hybrid, eval=FALSE, include=FALSE}
#This chunk takes very long to knit due to large graph sizes; therefore the eval=FALSE, include=FALSE options are used
#Plotting Function
plot_hy_cop <- function (corr, df, clayton_param, w1, subtitle) {
#Elliptical Copulas
Tcop <- ellipCopula(family = "t", dispstr = "un", param = corr, dim = 2, df = df)
#left tail dependence copulas
claycop <- archmCopula(family = "clayton", param = clayton_param, dim = 2)
#Generating Random Draws
data <- (1-w1)*rCopula(10000, Tcop) + w1*rCopula(10000, claycop)
colnames(data) <- glue::glue("V{1:ncol(data)}")
data <- as_tibble(data)
#Plotting 2D Density
data %>% ggplot(aes(x=V1, y=V2)) +
geom_point(alpha = 0.3) +
geom_density_2d_filled(alpha = 0.6) +
coord_cartesian(xlim = c(0,1), ylim = c(0,1)) +
theme_bw() +
labs(subtitle = subtitle) +
theme(legend.position = "none")
}
#looking at different weights
library(patchwork)
set.seed(897454)
#setting default parameters, much easier than changing it every time
params <- list(cor = 0.1, df = 5, clayton_param = 4, w1 = 0.25)
attach(params)
p1 <- seq(0,1, length.out = 6) %>% map(~plot_hy_cop(corr = cor, df = df,
clayton_param = clayton_param, w1 = .x,
subtitle = glue("Clayton Weight = {.x}")))
plot_1 <- (p1[[1]]|p1[[2]])/(p1[[3]]|p1[[4]])/(p1[[5]]|p1[[6]]) +
plot_annotation(title = "2d Density Plot - Normal + Clayton Copula",
caption = glue("Clayton Parameter = {clayton_param}, Correlaltion = {cor}, t-copula df = {df}"))
p2 <- seq(0,1, length.out = 6) %>% map(~plot_hy_cop(corr = .x, df = df,
clayton_param = clayton_param, w1 = w1,
subtitle = glue("Correlation = {.x}")))
plot_2 <- (p2[[1]]|p2[[2]])/(p2[[3]]|p2[[4]])/(p2[[5]]|p2[[6]]) +
plot_annotation(title = "2d Density Plot - Normal + Clayton Copula",
caption = glue("Clayton Weight = {w1}, Clayton Parameter = {clayton_param}, t-copula df = {df}"))
p3 <- seq(1,1000, length.out = 6) %>% map(~plot_hy_cop(corr = cor, df = df,
clayton_param = .x, w1 = w1,
subtitle = glue("Clayton Parameter = {.x}")))
plot_3 <- (p3[[1]]|p3[[2]])/(p3[[3]]|p3[[4]])/(p3[[5]]|p3[[6]]) +
plot_annotation(title = "2d Density Plot - Normal + Clayton Copula",
caption = glue("Clayton Weight = {w1}, Correlaltion = {cor}, t-copula df = {df}"))
p4 <- seq(1,30, length.out = 6) %>% map(~plot_hy_cop(corr = cor, df = .x,
clayton_param = clayton_param, w1 = w1,
subtitle = glue("T Copula df = {.x}")))
plot_4 <- (p4[[1]]|p4[[2]])/(p4[[3]]|p4[[4]])/(p4[[5]]|p4[[6]]) +
plot_annotation(title = "2d Density Plot - Normal + Clayton Copula",
caption = glue("Clayton Weight = {w1}, Clayton Parameter = {clayton_param}, Correlaltion = {cor}"))
#Saving Graphical objects
ggsave(plot_1, filename = "plots/plot_1.png", width = 8, height = 8)
ggsave(plot_2, filename = "plots/plot_2.png", width = 8, height = 8)
ggsave(plot_3, filename = "plots/plot_3.png", width = 8, height = 8)
ggsave(plot_4, filename = "plots/plot_4.png", width = 8, height = 8)
```
![Plot 1.](plots/plot_1.png)
![Plot 2.](plots/plot_2.png)
![Plot 3.](plots/plot_3.png)
![Plot 4.](plots/plot_4.png)
Remember that each variable is still currently uniformly distributed.
```{r echo=FALSE, warning=FALSE}
rCopula(10000, copula = claycop) %>% as_tibble() %>% gather(Key, Value) %>%
ggplot(aes(x = Value, fill = Key)) +
geom_histogram(bins = 30) +
xlim(c(0,1)) +
facet_wrap(~Key, nrow = 2) +
labs(title = "Hiostogram", subtitle = "Variables are Uniformly Distrinuted") +
theme_bw() +
theme(legend.position = "none")
```
This section has successfully demonstrated how to use copulas to simulate random, uniformly distributed data, some of which adheres to a given correlation matrix. It also identified a potential way to simulate data with hybrid copulas.
## Step 2: Converting the uniformly distributed variables to something that better resembles the distribution of asset returns.
## Looking at options for marginal distributions
This section looks at some candidate PDFs for the univariate asset returns. For convenience, it has become standard to use the normal or student-t distribution when simulating asset returns. However, after reading up on numerous possible marginal distributions, I decided that the skewed generalized t (SGT) distribution is a valuable alternative, as it allows for the most flexibility. In fact, the SGT distribution nests 12 common probability distribution functions (PDFs). The tree diagram below indicates how one can set the SGT parameters to achieve the desired PDF. MCmarket's functions will include arguments to induce the marginals to be uniform, normal, student-t and SGT distributed.
![Plot 5.](plots/SGTtree.png)
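For instance, setting $\lambda = 0$ and $p = 2$ and letting $q \rightarrow \infty$ recovers the normal distribution. The quick sanity check below illustrates this; the reported maximum difference should be essentially zero.
```{r sgt normal check}
# With lambda = 0, p = 2 and q = Inf, the SGT density reduces to the
# standard normal density.
x <- seq(-3, 3, by = 0.5)
max(abs(sgt::dsgt(x, mu = 0, sigma = 1, lambda = 0, p = 2, q = Inf) - dnorm(x)))
```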
### The Skewed Generalized t Distribution with Different Parameters
The code below demonstrates how the p, q and $\lambda$ parameters influence the SGT distribution. Note how the SGT's parameters allow one to affect the skewness and kurtosis of the distribution.
```{r, echo=FALSE}
library(pacman)
p_load(sgt, tidyverse)
#-----------------------------------------
#Looking at the skewness parameter (lambda) in the SGT distribution
#-----------------------------------------
x <- seq(-4, 4, length=100)
hx <- dnorm(x)
degf <- -c(seq(0,0.45,length = 4))
colors <- c("red", "blue", "darkgreen", "gold", "black")
labels <- c(glue::glue("Param = {degf}"), "normal")
plot(x, hx, type="l", lty=2, xlab="x value",
ylab="Density", main="Comparison of SGD Distributions with different lambda's", ylim = c(0,0.5))
for (i in 1:4){
lines(x, dsgt(x,lambda = degf[i]), lwd=2, col=colors[i])
}
legend("topright", inset=.05, title="Distributions",
labels, lwd=2, lty=c(1, 1, 1, 1, 2), col=colors)
#-----------------------------------------
#now looking at p and q parameters (kurtosis) with lambda = 0.2
#-----------------------------------------
#altering p
x <- seq(-4, 4, length=100)
hx <- dnorm(x)
degf <- c(seq(1.01,2,length = 4))
colors <- c("red", "blue", "darkgreen", "gold", "black")
labels <- c(glue::glue("Param = {degf}"), "normal")
plot(x, hx, type="l", lty=2, xlab="x value",
ylab="Density", main="Comparison of SGD Distributions with different p parameters", ylim = c(0,0.8))
for (i in 1:4){
lines(x, dsgt(x,p = degf[i]), lwd=2, col=colors[i])
}
legend("topright", inset=.05, title="Distributions",
labels, lwd=2, lty=c(1, 1, 1, 1, 2), col=colors)
#altering q
x <- seq(-4, 4, length=100)
hx <- dnorm(x)
degf <- round(c(seq(1.5,4,length = 4)), digits = 2)
colors <- c("red", "blue", "darkgreen", "gold", "black")
labels <- c(glue::glue("Param = {degf}"), "normal")
plot(x, hx, type="l", lty=2, xlab="x value",
ylab="Density", main="Comparison of SGD Distributions with different q parameters", ylim = c(0,0.8))
for (i in 1:4){
lines(x, dsgt(x,q = degf[i]), lwd=2, col=colors[i])
}
legend("topleft", inset=.05, title="Distributions",
labels, lwd=2, lty=c(1, 1, 1, 1, 2), col=colors)
```
### Here I demonstrate how uniformly distributed observations can be transformed into other distributions.
```{r}
# Generating a uniform (0,1) series
set.seed(1234)
unif <- as_tibble(runif(1000))
unif %>% ggplot(aes(x = value)) +
geom_histogram() + theme_bw()
# transforming into normal(0,1) distribution
norm <- qnorm(unif$value) %>% as_tibble()
norm %>% ggplot(aes(x = value)) +
geom_histogram() + theme_bw()
# transforming into t distribution with df=3
t <- qt(unif$value, df = 3) %>% as_tibble()
t %>% ggplot(aes(x = value)) +
geom_histogram() + theme_bw()
# transforming into SGT: left skewed with excess kurtosis
sgt <-
qsgt(
unif$value,
mu = 1,
sigma = 1,
lambda = -0.4,
p = 1,
q = 10000
) %>% as_tibble()
sgt %>% ggplot(aes(x = value)) +
geom_histogram() + theme_bw()
```
```{r cleaning workspace, include=FALSE}
rm(list =ls())
gc()
```
## Step 3: Introducing Volatility Persistence
The simulated innovations do not yet demonstrate the mean and/or volatility persistence observed in real asset return series; therefore, they can be thought of as the innovations of some time-series process. In this step I introduce mean and variance persistence using an ARMA(1,1) + APARCH(1,1) model.
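In terms of the sim_garch parameters described below, the (1,1) model used here is:
<center> $$y_t = \mu + \mathrm{ar} \cdot y_{t-1} + \mathrm{ma} \cdot \epsilon_{t-1} + \epsilon_t$$ </center>
<center> $$\epsilon_t = \sigma_t z_t$$ </center>
<center> $$\sigma_t^\delta = \omega + \alpha \left(|\epsilon_{t-1}| - \gamma \epsilon_{t-1}\right)^\delta + \beta \sigma_{t-1}^\delta$$ </center>
where $z_t$ are the innovations from step 2 and $\sigma_t$ is the conditional volatility. Setting $\delta = 2$ and $\gamma = 0$ recovers the standard GARCH(1,1) variance equation.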
#### Interesting article on GARCH modelling and forecasting
- "The leptokurtosis, clustering volatility and leverage effects characteristics of financial time series justifies the GARCH modelling approach. The non-linear characteristic of the time series is used to check the Brownian motion and investigate into the temporal evolutionary patterns. The non-linear methods of forecasting and signal analysis are gaining popularity in stock market because of their robustness in feature extraction and classification." source: https://towardsdatascience.com/garch-processes-monte-carlo-simulations-for-analytical-forecast-27edf77b2787
## sim_garch
This function introduces mean and variance persistence to a series of i.i.d random numbers. Note that most of the code was re-purposed from fGarch's garchspec and garchsim functions.
```{r sim_garch, eval=FALSE, include=TRUE}
#' @title sim_garch
#' @description This function takes a vector of random numbers, referred to as innovations, and
#' induces mean and variance persistence by inserting them into an ARMA(1,1) + APARCH(1,1) model.
#' @param innovations a vector containing the random numbers or the innovations of the
#' ARIMA + GARCH process.
#' @param omega a positive value defining the coefficient of the variance equation. The default is 5e-04.
#' @param gamma a value defining the APARCH leverage parameter in the variance equation. The default
#' of 0 implies no leverage effect and therefore, corresponds to the standard GARCH model.
#' @param alpha a value defining the autoregressive variance coefficient. The default is 0.
#' @param beta a value defining the variance coefficient. The default is 0.
#' @param mu a value defining the mean. The default is 0.
#' @param ar a value defining the autoregressive ARMA coefficient. The default is 0.
#' @param ma a value defining the moving average ARMA coefficient. The default is 0.
#' @param delta a strictly positive value defining the delta parameter of the APARCH model. The default is 2,
#' which corresponds to the standard GARCH model.
#' @param simple a logical parameter indicating if the output should be a simple vector containing just the
#' resulting ARIMA + GARCH series, or if FALSE a three column dataframe containing z - the innovations, h - the
#' conditional variance and y - ARMA + APARCH series.
#'
#' @return if simple = TRUE a vector of the resulting ARMA + APARCH series.
#'
#' If simple = FALSE a three column dataframe containing z - the innovations, h - the conditional variance and y - ARMA +
#' APARCH series. Note the length of the resulting series will be one observation less than that of the innovations.
#' The ARMA(1,1) + APARCH(1,1) model effectively consumes this lag when producing its first value.
#' @note (1) It is suggested that the randomly distributed numbers have a mean of zero and standard
#' deviation of one, as these attributes can be set within sim_garch.
#'
#' (2) For more information on the ARMA + APARCH parameters see:
#'
#' Ruppert, D. and Matteson, D.S., 2011. Statistics and data analysis for financial engineering (Vol. 13).
#' New York: Springer.
#'
#' @importFrom dplyr tibble
#'
#' @examples
#'
#' \dontrun{
#'
#' library(tidyverse)
#'
#' ### creating series of 501 innovations
#' inno <- rnorm(501)
#'
#' ### This produces a ARMA + APARCH series of length 500.
#' GARCH <- sim_garch(inno,
#' mu = 0.000002,
#' omega = 0.00005,
#' alpha = 0.098839,
#' beta = 0.899506,
#' ar = 0.063666,
#' ma = NULL,
#' gamma = 0.12194,
#' delta = 1.85,
#' simple = FALSE)
#'
#' ### Visualising the resulting series
#' GARCH %>% mutate(period = 1:n()) %>%
#' gather(key, value, -period) %>%
#' ggplot() +
#' geom_line(aes(x = period, y = value, color = key)) +
#' facet_wrap(~key, scales = "free_y", ncol = 1)
#'
#' }
#' @export
sim_garch <- function(innovations,
omega = NULL, alpha = NULL, gamma = NULL, beta = NULL,
mu = NULL, ar = NULL, ma = NULL, delta = NULL,
simple = TRUE) {
# Creating model list with defaults
model <- list(omega = ifelse(is.null(omega), 5e-04, omega),
alpha = ifelse(is.null(alpha), 0, alpha),
gamma = ifelse(is.null(gamma), 0, gamma),
beta = ifelse(is.null(beta), 0, beta),
mu = ifelse(is.null(mu), 0, mu), #changed from NULL to 0
ar = ifelse(is.null(ar), 0, ar),
ma = ifelse(is.null(ma), 0, ma),
delta = ifelse(is.null(delta), 2, delta))
#obtaining parameters and lag orders from model object
mu <- model$mu
ar <- model$ar
ma <- model$ma
omega <- model$omega
alpha <- model$alpha
gamma <- model$gamma
beta <- model$beta
delta <- model$delta
deltainv <- 1/delta
order.ar <- length(ar)
order.ma <- length(ma)
order.alpha <- length(alpha)
order.beta <- length(beta)
max.order <- max(order.ar, order.ma, order.alpha, order.beta)
n <- length(innovations) - 1
#Generating innovations
z_length <- n + max.order
z <- c(innovations)[(2-max.order):length(innovations)] #z is the vector of random innovation
h <- c(rep(model$omega/(1 - sum(model$alpha) - sum(model$beta)),
times = max.order), rep(NA, n)) #h is sigma^delta, the power-transformed conditional volatility
y <- c(rep(model$mu/(1 - sum(model$ar)), times = max.order), rep(NA, n)) #Observations from the resulting garch process
m <- max.order
#Inducing ARIMA + GARCH
eps <- h^deltainv * z #this part often breaks depending on GARCH parameters chosen (must be stationary)
for (i in (m + 1):(n + m)) {
h[i] = omega + sum(alpha * (abs(eps[i - (1:order.alpha)]) -
gamma * (eps[i - (1:order.alpha)]))^delta) +
sum(beta * h[i - (1:order.beta)])
eps[i] = h[i]^deltainv * z[i]
y[i] = mu + sum(ar * y[i - (1:order.ar)]) + sum(ma * eps[i - (1:order.ma)]) + eps[i]
}
if(simple == TRUE) {
data <- c(NA, y[(m + 1):(n + m)]) #removes burn in data and only provides ARIMA + GARCH series.
} else {
data <- tibble(z = c(NA, z[(m + 1):(n + m)]), # Innovations
h = c(NA, h[(m + 1):(n + m)]^deltainv), #Conditional Standard deviation
y = c(NA, y[(m + 1):(n + m)])) # ARIMA + GARCH series
}
return(data)
}
```
## Demonstrating sim_garch
Note that:
1) volatility clusters and significant autocorrelation appear in the series after it is processed through the sim_garch function.
2) sim_garch is a deterministic function.
```{r}
set.seed(32156454)
# Simulating a series of random SGT numbers
inno <- sgt::rsgt(n = 10001, lambda = -0.0143, p = 1.6650, q = 1.9095)
# Introducing Mean and Var Persistence
#Parameters from Statistics and Data Analysis for Financial Engineering pg.421-423
return <- sim_garch(inno,
omega = 0.000005, #key unconditional volatility parameter
alpha = 0.098839,
gamma = 0,
beta = 0.899506,