---
title: "README"
author: "Nathan Potgieter"
output:
github_document:
pandoc_args: --webtex
---
```{r setup, include=FALSE}
knitr::opts_knit$set(root.dir = getwd())
knitr::opts_chunk$set(echo = TRUE, fig.width=10, fig.height=8, out.height='80%', out.width='80%')
rm(list = ls()) # Clean your environment:
gc()
```
# MCmarket an R Package Designed for Simulating Asset Market Returns
This is the README for Nathan Potgieter's financial econometrics project. This work sets out a framework for the Monte Carlo simulation of asset markets. The framework is built into the MCmarket package, which will be available on GitHub under the "Nathan-Potgieter/MCmarket" repository.
## Aim
The aim of this project is to develop a general and easy-to-use Monte Carlo simulation package designed to generate asset return data with a pre-specified correlation structure, various asset distribution options, and mean and variance time-series properties. Elliptical copulas are used to specify the correlation structure within the simulated data. The option to use the Archimedean Clayton copula is also made available, as this allows for the creation of markets with high left-tail dependence. However, when using this option the correlation structure cannot be specified.
## Monte Carlo Framework
The Monte Carlo simulation routine involves the following steps:
The framework generates k periods of returns for D assets across N markets.
1. Draw a series of k random, uniformly distributed numbers (corresponding to k trading periods) across a set of D variables (or D assets), from a multivariate distribution with a given correlation structure.
    + This is accomplished using elliptical (Gaussian or t) copulas or the Archimedean Clayton copula. In R this can easily be done using the _ellipCopula_, _archmCopula_ and _rCopula_ functions from the _copula_ package.
2. Convert the uniformly distributed data into something that more closely resembles the distribution of asset returns. MCmarket provides functionality to convert them into normal, student-t or skewed generalized t distributions.
    + This is done the same way one would convert p-values into test statistics, using the qnorm(), qt() and sgt::qsgt() functions.
    + Technically this is accomplished via the inversion of the cumulative distribution function (CDF).
3. Induce mean and variance persistence in the series by plugging the random numbers resulting from step 2 into an ARMA(1,1) + APARCH(1,1) equation as the innovations.
    + If the parameters are set accordingly, the resulting series should possess the volatility clustering observed in empirical asset returns.
4. The final step is to repeat the first three steps N times to generate an ensemble of asset markets, each with the same risk characteristics. A minimal sketch of steps 1 to 3 is given below.
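The following chunk is a minimal, illustrative sketch of steps 1 to 3 for a single market. The 10-asset equicorrelation matrix and all parameter values are assumptions made purely for demonstration; only the copula and sgt packages (and MCmarket's sim_garch, shown later in this README) are used.
```{r framework sketch, eval=FALSE}
# Toy 10-asset equicorrelation matrix (illustrative only).
library(copula)
library(sgt)
corr <- matrix(0.5, 10, 10)
diag(corr) <- 1
# Step 1: 252 uniform draws with the given dependence structure (t-copula).
tcop <- ellipCopula(family = "t", dispstr = "un", param = P2p(corr), dim = 10)
u <- rCopula(252, tcop)
# Step 2: invert the CDF to obtain skewed generalized t innovations.
z <- apply(u, 2, qsgt, mu = 0, sigma = 1, lambda = -0.2, p = 2, q = 10)
# Step 3: induce mean and variance persistence with MCmarket::sim_garch
# (defined later in this README); parameter values are illustrative.
returns <- apply(z, 2, sim_garch, omega = 5e-04, alpha = 0.1, beta = 0.85)
```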
#### Loading Packages
```{r loading packages}
library(pacman)
# Loading MCmarket
p_load(MCmarket)
#These packages are used in the Monte Carlo (MC) functions.
p_load(tidyverse, copula, lubridate, sgt, glue)
#These packages are only used in research and demonstration tasks.
p_load(tbl2xts, fGarch, forecast)
```
# The Set up
Since a correlation matrix is the key input in the Monte Carlo simulation framework, this section covers the creation of some _ad hoc_ correlation matrices using the MCmarket::gen_corr function, as well as the estimation of some empirical correlation matrices using S&P500 data.
## Generating Ad Hoc Correlation Matrices
In this section I developed a simple function that allows the user to easily generate a correlation matrix with a desired correlation structure. Note that the majority of the code was provided by Nico Katzke. The function is located in the gen_corr.R code file.
### gen_corr
```{r gen_corr, eval=FALSE, include=TRUE}
#' @title gen_corr
#' @description This function allows users to easily generate a user-defined ad hoc correlation
#' matrix with up to four layers of clusters.
#' @param D the number of variables. The output is a D by D correlation matrix.
#' @param clusters a character string specifying the type of cluster structure.
#' Available options are:
#'
#' "none" for a correlation matrix with no clusters, but significant correlation
#' between constituents.
#'
#' "non-overlapping" for a correlation matrix with one layer of clusters.
#'
#' "overlapping" for a correlation matrix with up to four layers and a set number
#' of clusters per layer.
#' @param num_clusters If clusters = "none", then num_clusters is not used.
#'
#' If clusters = "non-overlapping", then num_clusters is an integer indicating the number of clusters.
#'
#' If clusters = "overlapping", then num_clusters is a vector of length less than or equal to four. The length of
#' num_clusters specifies the number of cluster layers and the integers within the vector specify the number of clusters
#' per layer. It is preferable to arrange the vector in descending order, since failing to do so can result in a
#' unique output, which may not contain the intended number of layers. Additionally, using combinations with repeating
#' numbers of clusters, for example num_clusters = c(10, 10, 5, 5), will produce fewer layers, but unique intra-cluster
#' correlations (See examples).
#' @return Returns a D by D correlation matrix.
#'
#' @import propagate
#'
#' @examples
#' \dontrun{
#' library(ggcorrplot)
#'
#' ### This generates a 50 by 50 correlation matrix with no clusters.
#' gen_corr(D = 50, clusters = "none") %>%
#' ggcorrplot(title = "no clusters")
#'
#' ### This generates a 50 by 50 correlation matrix with 5 non-overlapping clusters.
#' gen_corr(D = 50, clusters = "non-overlapping", num_clusters = 5) %>%
#' ggcorrplot(title = "non-overlapping clusters")
#'
#' ### This generates a 60 by 60 correlation matrix consisting
#' ### of 4 layers with 10, 5, 3 and 2 clusters respectively.
#' gen_corr(D = 60,
#' clusters = "overlapping",
#' num_clusters = c(10,5,3,2)) %>%
#' ggcorrplot(title = "overlapping clusters")
#'
#' ### Try tinkering with the num_clusters argument in unique ways to effect the
#' ### within cluster correlation coefficients.
#' ### Two cluster layers with low overall correlation.
#' gen_corr(D = 50,
#' clusters = "overlapping",
#' num_clusters = c(10,10,10,2)) %>%
#' ggcorrplot(title = "overlapping clusters")
#'
#' }
#' @export
gen_corr <- function (D = 50,
clusters = c("none", "non-overlapping", "overlapping"),
num_clusters = NULL) {
Grps <- num_clusters
num_layers <- length(num_clusters)
#set.seed(123)
if (!(clusters %in% c("none", "non-overlapping", "overlapping"))) stop("Please provide a valid clusters argument")
if(clusters == "none"){
# Unclustered covariance matrix
Sigma <- diag(D)
for (i in 1:D) for (j in 1:D) Sigma[i,j] <- 0.9^abs(i-j)
} else
if(clusters == "non-overlapping"){
#----------------------
# distinct non-overlapping clusters:
#----------------------
if(is.null(num_clusters)) stop("Please provide a valid num_clusters argument when using non-overlapping clusters")
if (num_layers>1) stop("Please provide a valid num_clusters argument of length one when using non-overlapping clusters")
Sigma <- matrix(0.6, D, D)
diag(Sigma) <- 1
for (i in 1:Grps) {
ix <- seq((i-1) * D / Grps + 1, i * D / Grps)
Sigma[ix, -ix] <- 0.0001 #think about
}
} else
if(clusters == "overlapping"){
#----------------------
# distinct overlapping clusters:
#----------------------
if(is.null(num_clusters)) stop("Please provide a valid num_clusters argument when using overlapping clusters")
if (num_layers>4) stop("Please provide a valid num_clusters argument of length less than or equal to 4 when using overlapping clusters")
Sigma <- matrix(0.7, D, D)
diag(Sigma) <- 1
for (i in 1:Grps[1]) {
ix <- seq((i-1) * D / Grps[1] + 1, i * D / Grps[1])
Sigma[ix, -ix] <- 0.5
}
if(num_layers>=2){
for (i in 1:Grps[2]) {
ix <- seq((i-1) * D / Grps[2] + 1, i * D / Grps[2])
Sigma[ix, -ix] <- 0.3
} }
if(num_layers>=3){
for (i in 1:Grps[3]) {
ix <- seq((i-1) * D / Grps[3] + 1, i * D / Grps[3])
Sigma[ix, -ix] <- 0.15
} }
if(num_layers>=4){
for (i in 1:Grps[4]) {
ix <- seq((i-1) * D / Grps[4] + 1, i * D / Grps[4])
Sigma[ix, -ix] <- 0.05
} }
}
return(Sigma)
}
```
Demonstrating the use of gen_corr
```{r using gen_corr}
gen_corr(D = 60, clusters = "none") %>%
ggcorrplot::ggcorrplot(title = "No Clusters", hc.order = FALSE)
gen_corr(D = 60, clusters = "non-overlapping", num_clusters = c(3)) %>%
ggcorrplot::ggcorrplot(title = "Non-Overlapping Clusters", hc.order = FALSE)
gen_corr(D = 60, clusters = "overlapping", num_clusters = c(10,5,4,2)) %>% ggcorrplot::ggcorrplot(title = "Overlapping Clusters", hc.order = FALSE)
```
## Creating a Dataset of Empirical Correlation Matrices
Since it would also be useful for MCmarket's users to have easy access to some empirical correlation matrices, a data set containing correlation matrices labeled "stressed", "rally" and "normal" is now constructed.
S&P500 data since 2000-01-01 is used to sample covariance matrices that can be used as inputs in the Monte Carlo procedure. These matrices will be made available as part of the MCmarket package. They can also be used in correlation matrix generative models, like CorrGAN (https://marti.ai). At a later stage MCmarket may incorporate a CorrGAN model, preferably one like the conditional CorrGAN or conditional CorrVAE. This would allow users to randomly simulate different types of realistic correlation matrices.
Note that the code below is not run when building this README, as it would simply take too long to knit this document.
#### Getting SNP 500 data since 2000
```{r SNP_data, eval=FALSE, echo=TRUE}
pacman::p_load(tidyverse, tidyquant, lubridate)
# The impute_missing_returns function (from the Financial Econometrics tutorial) is used to fill in missing returns
source("code/impute_missing_returns.R")
# save current system date to a variable
today <- Sys.Date()
# subtract 90 months from the current date; this is used at a later stage.
#date <- today %m+% months(-90)
# saving from date
date <- as.Date("2000-01-01")
# Getting all tickers in SP500
stock_list <- tq_index("SP500") %>%
arrange(symbol) %>%
mutate(symbol = case_when(symbol == "BRK.B" ~ "BRK-B", #repairing names that cause errors
symbol == "BF.B" ~ "BF-B",
TRUE ~ as.character(symbol))) %>%
pull(symbol)
# This function gets the data for a ticker and date
get_data <- function (ticker, from) {
df <- tq_get(ticker, from = from) %>% mutate(symbol = rep(ticker, length(date)))
return(df)
}
#Getting S&P500 data set
SNP_data <-
1:length(stock_list) %>% map(~get_data(ticker = stock_list[[.]], from = date)) %>% bind_rows()
save(SNP_data, file = "data/SNP_data.Rda")
#Calculating returns and sd
#and imputing missing returns using Nico's impute_missing_returns function
SNP_returns <- left_join(
SNP_data %>%
group_by(symbol) %>%
arrange(date) %>%
mutate(return = log(adjusted/dplyr::lag(adjusted))) %>%
dplyr::filter(date>first(date)) %>%
select(date, symbol, return) %>%
spread(symbol, return) %>%
impute_missing_returns(impute_returns_method = "Drawn_Distribution_Own") %>% #Imputing missing returns
gather(symbol, return, -date) %>%
group_by(symbol) %>%
mutate(sd = sd(return, na.rm = T)*sqrt(252)) %>% # calculate annualized SD
ungroup(),
SNP_data %>%
select(date, symbol, volume, adjusted),
by = c("date", "symbol")
)
save(SNP_returns, file = "data/SNP_returns.Rda")
```
#### Labeling SNP Data into Rally, Normal and Stressed Markets
This chunk calculates the Sharpe ratios of random, equally weighted portfolios of 50 assets over random 252-day time slices since 2000-01-01 of the SNP_returns data set. The portfolios are then labeled as follows:
- ‘stressed market’: A market is ‘stressed’ whenever the equi-weighted basket of stocks has a Sharpe ratio below -0.5 over the year of study (252 trading days).
- ‘rally market’: A market is ‘rallying’ whenever the equi-weighted basket of stocks has a Sharpe ratio above 2 over the year of study (252 trading days).
- ‘normal market’: A market is ‘normal’ whenever the equi-weighted basket of stocks has a Sharpe ratio in-between -0.5 and 2 over the year of study (252 trading days).
Note that this methodology is consistent with that used in https://marti.ai/qfin/2020/02/03/sp500-sharpe-vs-corrmats.html.
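As a quick illustration of the rule itself, the thresholds can be expressed with a small helper function (label_market is hypothetical and used only for illustration; it is not part of the package):
```{r sharpe labels}
# Hypothetical helper: maps an annualised Sharpe ratio to the market label used below.
label_market <- function(sharpe) {
  dplyr::case_when(
    sharpe < -0.5 ~ "stressed",
    sharpe > 2 ~ "rally",
    TRUE ~ "normal"
  )
}
label_market(c(-1.2, 0.7, 2.4))
```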
```{r sharp, eval=FALSE, echo=TRUE}
load("data/SNP_returns.Rda")
#Packages used in chunk
library(pacman)
p_load(tidyverse, furrr, PerformanceAnalytics, tbl2xts, rmsfuns, lubridate, fitHeavyTail)
#This function generates random portfolios, with the option to also compute the Sharpe ratio;
#sharp = TRUE takes substantially longer to calculate.
gen_random_port <- function(dim = 100, sharp = TRUE) {
#list of assets from which to sample
symbols <- SNP_returns %>%
dplyr::filter(date==first(date)) %>%
pull(symbol)
dim <- dim
sample_symbols <- sample(symbols, dim)
#Dates from which to sample
dates <- SNP_returns %>%
dplyr::filter(symbol == "A") %>%
select(date)
indx <- sample.int(nrow(dates) - 252, 1)
start_date <- dates[indx,]
end_date <- dates[indx + 252,]
sample_dates <- dates %>% dplyr::filter(date >= start_date[[1]] &
date <= end_date[[1]]) %>% pull()
if(sharp == FALSE){
training_data <- list(dates = list(sample_dates),
symbols = list(sample_symbols)) %>% as_tibble()
}else
if(sharp == TRUE){
#setting rebalance months
RebMonths <- c(1,4,7,10)
EQweights <-
SNP_returns %>%
dplyr::filter(symbol %in% sample_symbols &
date %in% sample_dates) %>%
select(date, symbol, return) %>%
mutate(Months = as.numeric(format(date, format = "%m")),
YearMonths = as.numeric(format(date, format = "%Y%m"))) %>%
dplyr::filter(Months %in% RebMonths) %>%
group_by(YearMonths, Months, symbol) %>%
dplyr::filter(date == last(date)) %>%
ungroup() %>%
group_by(date) %>%
mutate(weight = 1/n()) %>%
select(date, symbol, weight) %>%
spread(symbol, weight) %>% ungroup() %>%
tbl_xts()
# Return wide data
Returns <-
SNP_returns %>%
dplyr::filter(symbol %in% sample_symbols &
date %in% sample_dates) %>%
select(date, symbol, return) %>%
spread(symbol, return) %>% tbl_xts()
#calculating portfolio returns
EW_RetPort <- rmsfuns::Safe_Return.portfolio(Returns,
weights = EQweights, lag_weights = TRUE,
verbose = TRUE, contribution = TRUE,
value = 100, geometric = TRUE)
Sharp <- SharpeRatio(EW_RetPort$returns, FUN="StdDev", annualize = TRUE)
training_data <- list(dates = list(sample_dates),
symbols = list(sample_symbols),
sharp = Sharp[1]) %>% as_tibble()
}
return(training_data)
}
#------------------------------------------
#First I generate the labeled training data
#------------------------------------------
#Generating N random portfolios, with Sharpe ratios.
# The furrr package can be used to speed up computation
set.seed(5245214)
training_data_sharp <-
1:500 %>% map_dfr(~gen_random_port(dim = 50))
save(training_data_sharp, file = "data/training_data_sharp.Rda")
load("data/training_data_sharp.Rda")
# Separating by Sharpe ratio into market types.
stressed_market <-
training_data_sharp %>%
dplyr::filter(sharp < -0.5)
rally_market <-
training_data_sharp %>%
dplyr::filter(sharp > 2)
normal_market <-
training_data_sharp %>%
dplyr::filter(sharp >= -0.5 & sharp <= 2)
#This function gathers the SNP data corresponding to the portfolios described above
get_market_data <- function(index_df, i){
SNP_returns %>%
select(date, symbol, return) %>%
dplyr::filter(symbol %in% index_df$symbols[[i]] &
date %in% index_df$dates[[i]])
}
#Gathering the market data for each portfolio type
stressed_market_data <-
1:nrow(stressed_market) %>%
map(~get_market_data(stressed_market, .x))
rally_market_data <-
1:nrow(rally_market) %>%
map(~get_market_data(rally_market, .x))
normal_market_data <-
1:nrow(normal_market) %>%
map(~get_market_data(normal_market, .x))
#Calculating Covariance matrix using the fitHeavyTail method
calc_cov <- function(df, i){
df[[i]] %>% select(date, symbol, return) %>%
spread(symbol, return) %>% select(-date) %>%
fitHeavyTail::fit_mvt() %>% .$cov
}
stressed_market_cov <-
1:length(stressed_market_data) %>%
map(~calc_cov(stressed_market_data, .x))
rally_market_cov <-
1:length(rally_market_data) %>%
map(~calc_cov(rally_market_data, .x))
normal_market_cov <-
1:length(normal_market_data) %>%
map(~calc_cov(normal_market_data, .x))
# Creating and saving a data frame of covariance matrices.
cov_mats <- list(cov_normal = normal_market_cov,
cov_stressed = stressed_market_cov,
cov_rally = rally_market_cov)
save(cov_mats, file = "data/cov_mats.Rda")
# Creating and saving a data frame of correlation matrices.
cor_mats <-
list(
cor_normal = 1:length(normal_market_cov) %>%
map(~cov2cor(normal_market_cov[[.x]])),
cor_stressed = 1:length(stressed_market_cov) %>%
map(~cov2cor(stressed_market_cov[[.x]])),
cor_rally = 1:length(rally_market_cov) %>%
map(~cov2cor(rally_market_cov[[.x]]))
)
save(cor_mats, file = "data/cor_mats.Rda")
#------------------------------------------
#Now to generate unlabelled training data
#Hence why sharp = FALSE in gen_random_port.
# This section has been scrapped for now;
# corrgan may only be implemented at a later stage
#------------------------------------------
training_data_indx <-
1:500 %>% map_dfr(~gen_random_port(dim = 50, sharp = FALSE))
market_data <- 1:nrow(training_data_indx) %>% map(~get_market_data(training_data_indx, .x))
training_data <- 1:length(market_data) %>% map(~calc_cov(market_data, .x) %>% cov2cor())
save(training_data, file = "data/training_data.Rda")
```
#### An Example of Each Type of Empirical Correlation Matrix
Note that they are converted from covariance matrices to correlation matrices before being plotted.
```{r showing some training data, echo=FALSE}
load("data/cov_mats.Rda")
cov_mats$cov_rally[[1]] %>% cov2cor() %>%
ggcorrplot::ggcorrplot(title = "Rally Market", hc.order = TRUE)
cov_mats$cov_normal[[1]] %>% cov2cor() %>%
ggcorrplot::ggcorrplot(title = "Normal Market", hc.order = TRUE)
cov_mats$cov_stressed[[1]] %>% cov2cor() %>%
ggcorrplot::ggcorrplot(title = "Stressed Market", hc.order = TRUE)
```
# Building the Monte Carlo
## Step 1: Draw a series of random uniformly distributed numbers across a set of variables with a specified dependence structure.
### Generating Random Draws Using Various Copulas
This section explores the feasibility of using various types of copulas as part of the Monte Carlo framework.
#### Elliptical Copulas
The Gaussian and t copulas from the elliptical family allow us to specify their correlation matrix before randomly selecting observations from their multivariate distribution. Doing so allows one to produce random draws of uniformly distributed variables that adhere to the user-specified correlation structure. The chunk of code below demonstrates this functionality.
```{r Elliptal Copulas, warning=FALSE}
#loading copula package; already loaded
# pacman::p_load(copula)
#generating toy corr matrix
corr <- gen_corr(D = 50, clusters = "overlapping", num_clusters = c(10, 5, 2))
#generating normal and student t copula objects
Ncop <- ellipCopula(family = "normal", dispstr = "un", param = P2p(corr), dim = 50)
Tcop <- ellipCopula(family = "t", dispstr = "un", param = P2p(corr), dim = 50)
#generating 252 random draws for each of the dim = 50 variables.
set.seed(123)
rn <- rCopula(copula = Ncop, n = 252)
rt <- rCopula(copula = Tcop, n = 252)
#Checking if the correlation structure was maintained
p_load(patchwork)
# Original corr
p1 <- ggcorrplot::ggcorrplot(corr, hc.order = TRUE) +
labs(title = "Input Correlation Matrix") +
scale_x_discrete(labels = NULL) + scale_y_discrete(labels = NULL) +
theme(legend.position = "bottom")
# corr from random draws from the normal and t copulas
p2 <- fitHeavyTail::fit_mvt(rn) %>% .$cov %>% cov2cor() %>%
ggcorrplot::ggcorrplot(hc.order = TRUE) +
labs(subtitle = "Normal Copula") +
scale_x_discrete(labels = NULL) + scale_y_discrete(labels = NULL) +
theme(legend.position = "none")
p3 <- fitHeavyTail::fit_mvt(rt) %>% .$cov %>% cov2cor() %>%
ggcorrplot::ggcorrplot(hc.order = TRUE) +
labs(subtitle = "T-Copula") +
scale_x_discrete(labels = NULL) + scale_y_discrete(labels = NULL) +
theme(legend.position = "none")
#Printing summary output
#Note that all the variables appear uniformly distributed
p_load(printr)
summary(rn)
p1
#Notice that the underlying correlation structure has, for the most part, been maintained.
# Though, some noise has been introduced
(p2+p3) + plot_annotation(title = "Output Correlation Matrices") +
plot_layout(guides='collect') &
theme(legend.position='bottom')
```
### Archimedean Copulas
This work searched for a method to produce simulated markets with high left-tail dependence. This would, in theory, enable the simulation of markets in which the correlation between assets increases when overall asset returns are low (i.e. during crisis periods).
Unfortunately, elliptical copulas cannot be calibrated to exhibit dynamic correlations (i.e. left-tail dependence). Therefore, in this section we examine some properties of Archimedean copulas, which do exhibit tail dependence. Unfortunately, Archimedean copulas don't allow us to stipulate their correlation matrix.
Archimedean copulas such as the Clayton, Frank, Gumbel and Joe exhibit higher levels of dependence at the tails of the multivariate distribution. In this section we examine the Clayton copula due to its enhanced left-tail dependence. Other copulas were examined, but the Clayton copula is currently the only Archimedean copula in the copula package that allows random sampling from multivariate distributions with large dimensions. Additionally, the other copulas have to be rotated 180 degrees to exhibit left-tail dependence (since they naturally possess right-tail dependence). This rotation step can be quite time-consuming and often fails in higher dimensions. The choice to focus on the Clayton copula was therefore largely due to practical concerns raised when tinkering with alternative copulas. In future work it would be beneficial to explore alternative high-dimensional left-tail copulas.
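For reference, the Clayton copula's lower tail-dependence coefficient is a simple increasing function of its parameter $\theta$, which is why larger parameter values strengthen the joint behaviour in the left tail:
<center> $$\lambda_L = \lim_{q \to 0^+} P\left(U_2 \le q \mid U_1 \le q\right) = 2^{-1/\theta}$$ </center>
so the parameter values 2, 4 and 6 used below correspond to $\lambda_L \approx 0.71$, $0.84$ and $0.89$ respectively, while the Gaussian copula has $\lambda_L = 0$.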
```{r Archimedean copula, warning=TRUE}
# First look at dim = 2 to get an understanding of what parameter tuning does
#Clayton Copula
claycop <- archmCopula(family = "clayton", param = 2, dim = 2)
claycop2 <- archmCopula(family = "clayton", param = 4, dim = 2)
claycop3 <- archmCopula(family = "clayton", param = 6, dim = 2)
#Normal Copula
Ncop <- ellipCopula(family = "normal", dispstr = "un", param = 0.5, dim = 2)
# rCopula(251, claycop)
#note how left tail dependence increases with the parameter along with the overall correlation
#param = 2
persp(claycop, main = "Clayton Copula - param = 2", dCopula, zlim = c(0, 15), theta = 18)
#param = 4
persp(claycop2, main = "Clayton Copula - param = 4", dCopula, zlim = c(0, 15), theta = 18)
#param = 6
persp(claycop3, main = "Clayton Copula - param = 6", dCopula, zlim = c(0, 15), theta = 18)
#Normal Copula
persp(Ncop, main = "Normal Copula - cor = 0.5", dCopula, zlim = c(0, 15), theta = 18)
# Comparing Clayton with Normal copula
bind_rows(rCopula(5000, copula = claycop) %>% as_tibble() %>% mutate(copula = "claycop"),
rCopula(5000, copula = Ncop) %>% as_tibble() %>% mutate(copula = "normal")) %>%
ggplot(aes(x=V1,y=V2)) +
geom_point(alpha=0.5) +
geom_density_2d_filled(alpha=0.7) +
facet_wrap(~copula, nrow = 1) +
labs(title = "2D kernal Density - Clayton vs Normal Copula",
caption = "Clayton parameter = 2, Normal correlation = 0.5") +
theme_bw() +
theme(legend.position = "bottom")
```
Out of interest, some of the code written while experimenting with alternative Archimedean copulas, which were deemed unusable, is supplied below:
```{r other arch cops, eval=FALSE, include=TRUE}
#Note that the Gumbel and Joe copulas must be rotated 180 degrees to exhibit greater left tail dependence
#-----------------------------------------------
#Gumbel Copula
#This copula function breaks when using dim >= 7
#-----------------------------------------------
gumcop <- archmCopula(family = "gumbel", param = 2, dim = 2) %>% rotCopula()
persp(gumcop, dCopula, zlim = c(0, 10))
rCopula(1000, copula = gumcop) %>% plot(main = "Gumbel Copula")
#-----------------------------------------------------------------
# Joe copula
#This copula's computation time is unacceptably long with dim = 10
#and it doesn't work when dim = 50
#----------------------------------------------------------------
joecop <- archmCopula(family = "joe", param = 3, dim = 50) %>% rotCopula()
persp(joecop, dCopula, zlim = c(0, 10))
rCopula(1000, copula = joecop) %>% plot()
#--------------------------------
#Galambos
#This function breaks when dim>2
#--------------------------------
galcop <- evCopula(family = "galambos", param = 2, dim = 2) %>% rotCopula()
persp(galcop, dCopula, zlim = c(0, 10))
rCopula(1000, galcop) %>% plot()
```
## Hybrid Copulas
Since this project wanted to combine the left-tail dependence property of the Clayton copula with the ability to specify the correlation matrix offered by the elliptical copulas, there was an investigation into the creation of hybrid copulas.
@ruenzi states that flexible copula structures can be created through convex combinations of basic copulas. The authors use the 2D case and Tawn's theorem (below) to construct hybrid bivariate copulas.
Tawn (1988) "shows that a copula is a convex set and every convex combination of existing copula functions is again a copula" [@ruenzi2011, p.8].
Thus, if $C_1(U_N)$ and $C_2(U_N)$ are 2D copulas and $w$ is a weighting variable between 0 and 1, then
<center> $$C(U_N) = wC_1(U_N) + (1-w)C_2(U_N)$$ </center>
is again a copula. Therefore, a hybrid copula $C(U_N)$ can be created by linearly weighting an elliptical and an Archimedean copula of dimension 2.
This work attempted to generalize this finding to higher-dimensional copulas. First let's look at some 2D examples; whether this works in higher dimensions is investigated in Section \ref{test_sim_inno}.
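Before turning to the weighted-draw approximation used in the next chunk, note that a true convex combination of copulas can also be sampled directly. Below is a minimal sketch, assuming a recent version of the copula package that provides the mixCopula() constructor; the weight and parameter values are illustrative.
```{r mixCopula sketch, eval=FALSE}
# Convex combination (mixture) of a t copula and a Clayton copula.
w <- 0.4
hybrid_cop <- mixCopula(list(ellipCopula(family = "t", param = 0.5, dim = 2, df = 5),
                             archmCopula(family = "clayton", param = 4, dim = 2)),
                        w = c(1 - w, w))
# Draws come from the mixture copula itself, rather than from a
# weighted sum of draws taken from each copula separately.
u <- rCopula(5000, hybrid_cop)
plot(u, main = "t + Clayton mixture copula")
```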
### Generating Some 2D Hybrid Copulas
This chunk of code creates and plots some pseudo hybrid copulas by linearly weighting the random numbers generated from a student-t and Clayton copula. Note how when increasing the Clayton weight, observations concentrate at the bottom left of the density plots.
```{r hybrid, eval=FALSE, include=FALSE}
#This chunk takes very long to knit due to large graph sizes; therefore the eval=FALSE, include=FALSE options are used
#Plotting Function
plot_hy_cop <- function (corr, df, clayton_param, w1, subtitle) {
#Elliptical Copulas
Tcop <- ellipCopula(family = "t", dispstr = "un", param = corr, dim = 2, df = df)
#left tail dependence copulas
claycop <- archmCopula(family = "clayton", param = clayton_param, dim = 2)
#Generating Random Draws
data <- (1-w1)*rCopula(10000, Tcop) + w1*rCopula(10000, claycop)
colnames(data) <- glue::glue("V{1:ncol(data)}")
data <- as_tibble(data)
#Plotting 2D Density
data %>% ggplot(aes(x=V1, y=V2)) +
geom_point(alpha = 0.3) +
geom_density_2d_filled(alpha = 0.6) +
coord_cartesian(xlim = c(0,1), ylim = c(0,1)) +
theme_bw() +
labs(subtitle = subtitle) +
theme(legend.position = "none")
}
#looking at different weights
library(patchwork)
set.seed(897454)
#setting default parameters, much easier than changing it every time
params <- list(cor = 0.1, df = 5, clayton_param = 4, w1 = 0.25)
attach(params)
p1 <- seq(0,1, length.out = 6) %>% map(~plot_hy_cop(corr = cor, df = df,
clayton_param = clayton_param, w1 = .x,
subtitle = glue("Clayton Weight = {.x}")))
plot_1 <- (p1[[1]]|p1[[2]])/(p1[[3]]|p1[[4]])/(p1[[5]]|p1[[6]]) +
plot_annotation(title = "2d Density Plot - Normal + Clayton Copula",
caption = glue("Clayton Parameter = {clayton_param}, Correlaltion = {cor}, t-copula df = {df}"))
p2 <- seq(0,1, length.out = 6) %>% map(~plot_hy_cop(corr = .x, df = df,
clayton_param = clayton_param, w1 = w1,
subtitle = glue("Correlation = {.x}")))
plot_2 <- (p2[[1]]|p2[[2]])/(p2[[3]]|p2[[4]])/(p2[[5]]|p2[[6]]) +
plot_annotation(title = "2d Density Plot - Normal + Clayton Copula",
caption = glue("Clayton Weight = {w1}, Clayton Parameter = {clayton_param}, t-copula df = {df}"))
p3 <- seq(1,1000, length.out = 6) %>% map(~plot_hy_cop(corr = cor, df = df,
clayton_param = .x, w1 = w1,
subtitle = glue("Clayton Parameter = {.x}")))
plot_3 <- (p3[[1]]|p3[[2]])/(p3[[3]]|p3[[4]])/(p3[[5]]|p3[[6]]) +
plot_annotation(title = "2d Density Plot - Normal + Clayton Copula",
caption = glue("Clayton Weight = {w1}, Correlaltion = {cor}, t-copula df = {df}"))
p4 <- seq(1,30, length.out = 6) %>% map(~plot_hy_cop(corr = cor, df = .x,
clayton_param = clayton_param, w1 = w1,
subtitle = glue("T Copula df = {.x}")))
plot_4 <- (p4[[1]]|p4[[2]])/(p4[[3]]|p4[[4]])/(p4[[5]]|p4[[6]]) +
plot_annotation(title = "2d Density Plot - Normal + Clayton Copula",
caption = glue("Clayton Weight = {w1}, Clayton Parameter = {clayton_param}, Correlaltion = {cor}"))
#Saving Graphical objects
ggsave(plot_1, filename = "plots/plot_1.png", width = 8, height = 8)
ggsave(plot_2, filename = "plots/plot_2.png", width = 8, height = 8)
ggsave(plot_3, filename = "plots/plot_3.png", width = 8, height = 8)
ggsave(plot_4, filename = "plots/plot_4.png", width = 8, height = 8)
```
![Plot 1.](plots/plot_1.png)
![Plot 2.](plots/plot_2.png)
![Plot 3.](plots/plot_3.png)
![Plot 4.](plots/plot_4.png)
Remember that each variable is still currently uniformly distributed.
```{r echo=FALSE, warning=FALSE}
rCopula(10000, copula = claycop) %>% as_tibble() %>% gather(Key, Value) %>%
ggplot(aes(x = Value, fill = Key)) +
geom_histogram(bins = 30) +
xlim(c(0,1)) +
facet_wrap(~Key, nrow = 2) +
labs(title = "Hiostogram", subtitle = "Variables are Uniformly Distrinuted") +
theme_bw() +
theme(legend.position = "none")
```
This section has successfully demonstrated how to use copulas to simulate random, uniformly distributed data, some of which adheres to a given correlation matrix. It also identified a potential way to simulate data with hybrid copulas.
## Step 2: Converting the uniformly distributed variables to something that better resembles the distribution of asset returns.
## Looking at options for marginal distributions
This section looks at some candidate PDFs for the univariate asset returns. For convenience, it has become standard to use the normal or student-t distribution when simulating asset returns. However, after reading up on numerous possible marginal distributions, I decided that the skewed generalized t (SGT) distribution is a valuable alternative, as it allows for the most flexibility. In fact, the SGT distribution nests 12 common probability distribution functions (PDFs). The tree diagram below indicates how one can set the SGT parameters to achieve the desired PDF. MCmarket's functions will include arguments to induce the marginals to be uniform, normal, student-t and SGT distributed.
![Plot 5.](plots/SGTtree.png)
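For instance, setting $\lambda = 0$ and $p = 2$ and letting $q \rightarrow \infty$ recovers the normal distribution. The quick sanity check below illustrates this; the reported maximum difference should be essentially zero.
```{r sgt normal check}
# With lambda = 0, p = 2 and q = Inf, the SGT density reduces to the
# standard normal density.
x <- seq(-3, 3, by = 0.5)
max(abs(sgt::dsgt(x, mu = 0, sigma = 1, lambda = 0, p = 2, q = Inf) - dnorm(x)))
```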
### The Skewed Generalized t Distribution with Different Parameters
The code below demonstrates how the p, q and $\lambda$ parameters influence the SGT distribution. Note how the SGT's parameters allow one to affect the skewness and kurtosis of the distribution.
```{r, echo=FALSE}
library(pacman)
p_load(sgt, tidyverse)
#-----------------------------------------
#Looking at the skewness parameter (lambda) in the SGT distribution
#-----------------------------------------
x <- seq(-4, 4, length=100)
hx <- dnorm(x)
degf <- -c(seq(0,0.45,length = 4))
colors <- c("red", "blue", "darkgreen", "gold", "black")
labels <- c(glue::glue("Param = {degf}"), "normal")
plot(x, hx, type="l", lty=2, xlab="x value",
ylab="Density", main="Comparison of SGD Distributions with different lambda's", ylim = c(0,0.5))
for (i in 1:4){
lines(x, dsgt(x,lambda = degf[i]), lwd=2, col=colors[i])
}
legend("topright", inset=.05, title="Distributions",
labels, lwd=2, lty=c(1, 1, 1, 1, 2), col=colors)
#-----------------------------------------
#now looking at p and q parameters (kurtosis) with lambda = 0.2
#-----------------------------------------
#altering p
x <- seq(-4, 4, length=100)
hx <- dnorm(x)
degf <- c(seq(1.01,2,length = 4))
colors <- c("red", "blue", "darkgreen", "gold", "black")
labels <- c(glue::glue("Param = {degf}"), "normal")
plot(x, hx, type="l", lty=2, xlab="x value",
ylab="Density", main="Comparison of SGD Distributions with different p parameters", ylim = c(0,0.8))
for (i in 1:4){
lines(x, dsgt(x,p = degf[i]), lwd=2, col=colors[i])
}
legend("topright", inset=.05, title="Distributions",
labels, lwd=2, lty=c(1, 1, 1, 1, 2), col=colors)
#altering q
x <- seq(-4, 4, length=100)
hx <- dnorm(x)
degf <- round(c(seq(1.5,4,length = 4)), digits = 2)
colors <- c("red", "blue", "darkgreen", "gold", "black")
labels <- c(glue::glue("Param = {degf}"), "normal")
plot(x, hx, type="l", lty=2, xlab="x value",
ylab="Density", main="Comparison of SGD Distributions with different q parameters", ylim = c(0,0.8))
for (i in 1:4){
lines(x, dsgt(x,q = degf[i]), lwd=2, col=colors[i])
}
legend("topleft", inset=.05, title="Distributions",
labels, lwd=2, lty=c(1, 1, 1, 1, 2), col=colors)
```
### Here I demonstrate how uniformly distributed observations can be transformed into other distributions.
```{r}
# Generating a uniform (0,1) series
set.seed(1234)
unif <- as_tibble(runif(1000))
unif %>% ggplot(aes(x = value)) +
geom_histogram() + theme_bw()
# transforming into normal(0,1) distribution
norm <- qnorm(unif$value) %>% as_tibble()
norm %>% ggplot(aes(x = value)) +
geom_histogram() + theme_bw()
# transforming into t distribution with df=3
t <- qt(unif$value, df = 3) %>% as_tibble()
t %>% ggplot(aes(x = value)) +
geom_histogram() + theme_bw()
# transforming into SGT: left skewed with excess kurtosis
sgt <-
qsgt(
unif$value,
mu = 1,
sigma = 1,
lambda = -0.4,
p = 1,
q = 10000
) %>% as_tibble()
sgt %>% ggplot(aes(x = value)) +
geom_histogram() + theme_bw()
```
```{r cleaning workspace, include=FALSE}
rm(list =ls())
gc()
```
## Step 3: Introducing Volatility Persistence
The simulated innovations do not yet demonstrate the mean and/or volatility persistence observed in real asset return series; therefore, they can be thought of as the innovations of some time-series process. In this step I introduce mean and variance persistence using an ARMA(1,1) + APARCH(1,1) model.
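In terms of the sim_garch parameters described below, the (1,1) model used here is:
<center> $$y_t = \mu + \mathrm{ar} \cdot y_{t-1} + \mathrm{ma} \cdot \epsilon_{t-1} + \epsilon_t$$ </center>
<center> $$\epsilon_t = \sigma_t z_t$$ </center>
<center> $$\sigma_t^\delta = \omega + \alpha \left(|\epsilon_{t-1}| - \gamma \epsilon_{t-1}\right)^\delta + \beta \sigma_{t-1}^\delta$$ </center>
where $z_t$ are the innovations from step 2 and $\sigma_t$ is the conditional volatility. Setting $\delta = 2$ and $\gamma = 0$ recovers the standard GARCH(1,1) variance equation.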
#### Interesting article on GARCH modelling and forecasting
- "The leptokurtosis, clustering volatility and leverage effects characteristics of financial time series justifies the GARCH modelling approach. The non-linear characteristic of the time series is used to check the Brownian motion and investigate into the temporal evolutionary patterns. The non-linear methods of forecasting and signal analysis are gaining popularity in stock market because of their robustness in feature extraction and classification." source: https://towardsdatascience.com/garch-processes-monte-carlo-simulations-for-analytical-forecast-27edf77b2787
## sim_garch
This function introduces mean and variance persistence to a series of i.i.d random numbers. Note that most of the code was re-purposed from fGarch's garchspec and garchsim functions.
```{r sim_garch, eval=FALSE, include=TRUE}
#' @title sim_garch
#' @description This function takes a vector of random numbers, referred to as innovations, and
#' induces mean and variance persistence by inserting them into an ARMA(1,1) + APARCH(1,1) model.
#' @param innovations a vector containing the random numbers or the innovations of the
#' ARIMA + GARCH process.
#' @param omega a positive value defining the coefficient of the variance equation. The default is 5e-04.
#' @param gamma a value defining the APARCH leverage parameter in the variance equation. The default
#' of 0 implies no leverage effect and therefore, corresponds to the standard GARCH model.
#' @param alpha a value defining the autoregressive variance coefficient. The default is 0.
#' @param beta a value defining the variance coefficient. The default is 0.
#' @param mu a value defining the mean. The default is 0.
#' @param ar a value defining the autoregressive ARMA coefficient. The default is 0.
#' @param ma a value defining the moving average ARMA coefficient. The default is 0.
#' @param delta a strictly positive value defining the delta parameter of the APARCH model. The default is 2,
#' which corresponds to the standard GARCH model.
#' @param simple a logical parameter indicating if the output should be a simple vector containing just the
#' resulting ARIMA + GARCH series, or if FALSE a three column dataframe containing z - the innovations, h - the
#' conditional variance and y - ARMA + APARCH series.
#'
#' @return if simple = TRUE a vector of the resulting ARMA + APARCH series.
#'
#' If simple = FALSE a three column dataframe containing z - the innovations, h - the conditional variance and y - ARMA +
#' APARCH series. Note the length of the resulting series will be one observation less than that of the innovations.
#' The ARMA(1,1) + APARCH(1,1) model effectively consumes this lag when producing its first value.
#' @note (1) It is suggested that the randomly distributed numbers have a mean of zero and standard
#' deviation of one, as these attributes can be set within sim_garch.
#'
#' (2) For more information on the ARMA + APARCH parameters see:
#'
#' Ruppert, D. and Matteson, D.S., 2011. Statistics and data analysis for financial engineering (Vol. 13).
#' New York: Springer.
#'
#' @importFrom dplyr tibble
#'
#' @examples
#'
#' \dontrun{
#'
#' library(tidyverse)
#'
#' ### creating series of 501 innovations
#' inno <- rnorm(501)
#'
#' ### This produces a ARMA + APARCH series of length 500.
#' GARCH <- sim_garch(inno,
#' mu = 0.000002,
#' omega = 0.00005,
#' alpha = 0.098839,
#' beta = 0.899506,
#' ar = 0.063666,
#' ma = NULL,
#' gamma = 0.12194,
#' delta = 1.85,
#' simple = FALSE)
#'
#' ### Visualising the resulting series
#' GARCH %>% mutate(period = 1:n()) %>%
#' gather(key, value, -period) %>%
#' ggplot() +
#' geom_line(aes(x = period, y = value, color = key)) +
#' facet_wrap(~key, scales = "free_y", ncol = 1)
#'
#' }
#' @export
sim_garch <- function(innovations,
omega = NULL, alpha = NULL, gamma = NULL, beta = NULL,
mu = NULL, ar = NULL, ma = NULL, delta = NULL,
simple = TRUE) {
# Creating model list with defaults
model <- list(omega = ifelse(is.null(omega), 5e-04, omega),
alpha = ifelse(is.null(alpha), 0, alpha),
gamma = ifelse(is.null(gamma), 0, gamma),
beta = ifelse(is.null(beta), 0, beta),
mu = ifelse(is.null(mu), 0, mu), #changed from NULL to 0
ar = ifelse(is.null(ar), 0, ar),
ma = ifelse(is.null(ma), 0, ma),
delta = ifelse(is.null(delta), 2, delta))
#obtaining parameters and lag orders from model object
mu <- model$mu
ar <- model$ar
ma <- model$ma
omega <- model$omega
alpha <- model$alpha
gamma <- model$gamma
beta <- model$beta
delta <- model$delta
deltainv <- 1/delta
order.ar <- length(ar)
order.ma <- length(ma)
order.alpha <- length(alpha)
order.beta <- length(beta)
max.order <- max(order.ar, order.ma, order.alpha, order.beta)
n <- length(innovations) - 1
#Generating innovations
z_length <- n + max.order
z <- c(innovations)[(2-max.order):length(innovations)] #z is the vector of random innovation
h <- c(rep(model$omega/(1 - sum(model$alpha) - sum(model$beta)),
times = max.order), rep(NA, n)) #h is sigma^delta, the power-transformed conditional volatility
y <- c(rep(model$mu/(1 - sum(model$ar)), times = max.order), rep(NA, n)) #Observations from the resulting garch process
m <- max.order
#Inducing ARIMA + GARCH
eps <- h^deltainv * z #this part often breaks depending on GARCH parameters chosen (must be stationary)
for (i in (m + 1):(n + m)) {
h[i] = omega + sum(alpha * (abs(eps[i - (1:order.alpha)]) -
gamma * (eps[i - (1:order.alpha)]))^delta) +
sum(beta * h[i - (1:order.beta)])
eps[i] = h[i]^deltainv * z[i]
y[i] = mu + sum(ar * y[i - (1:order.ar)]) + sum(ma * eps[i - (1:order.ma)]) + eps[i]
}
if(simple == TRUE) {
data <- c(NA, y[(m + 1):(n + m)]) #removes burn in data and only provides ARIMA + GARCH series.
} else {
data <- tibble(z = c(NA, z[(m + 1):(n + m)]), # Innovations
h = c(NA, h[(m + 1):(n + m)]^deltainv), #Conditional Standard deviation
y = c(NA, y[(m + 1):(n + m)])) # ARIMA + GARCH series
}
return(data)
}
```
## Demonstrating sim_garch
Note that:
1) volatility clusters and significant autocorrelation appear in the series after it is processed through the sim_garch function.
2) sim_garch is a deterministic function.
```{r}
set.seed(32156454)
# Simulating a series of random SGT numbers
inno <- sgt::rsgt(n = 10001, lambda = -0.0143, p = 1.6650, q = 1.9095)
# Introducing Mean and Var Persistence
#Parameters from Statistics and Data Analysis for Financial Engineering pg.421-423
return <- sim_garch(inno,
omega = 0.000005, #key unconditional volatility parameter
alpha = 0.098839,
gamma = 0,
beta = 0.899506,