---
title: "Summation and averaging"
description: |
  Chapter 5.4 Combining domain-specific dissimilarities
output: distill::distill_article
---

```{r setup, include=FALSE}

# Load required packages
library(here)
source(here("source", "load_libraries.R"))

# Output options
knitr::opts_chunk$set(eval=TRUE, echo=TRUE)
options("kableExtra.html.bsTable" = TRUE)

# load data for Chapter 5
load(here("data", "5-0_ChapterSetup.RData"))

```


```{r, xaringanExtra-clipboard, echo=FALSE}
htmltools::tagList(
  xaringanExtra::use_clipboard(
    button_text = "<i class=\"fa fa-clone fa-2x\" style=\"color: #301e64\"></i>",
    success_text = "<i class=\"fa fa-check fa-2x\" style=\"color: #90BE6D\"></i>",
    error_text = "<i class=\"fa fa-times fa-2x\" style=\"color: #F94144\"></i>"
  ),
  rmarkdown::html_dependency_font_awesome()
)
```

<details><summary>**Click here to get instructions...**</summary>

- Please download and unzip the replication files for Chapter 5
([`r fontawesome::fa("far fa-file-zipper")` Chapter05.zip](source/Chapter05.zip)). 
- Read `readme.html` and run `5-0_ChapterSetup.R`. This will create `5-0_ChapterSetup.RData` in the subfolder `data/R`. This file contains the data required to produce the plots shown below. 
- You also have to add the function `legend_large_box` to your environment in order to render the tweaked version of the legend described below. You can find this file in the `source` folder of the unzipped Chapter 5 archive.
- We also recommend loading the libraries listed in Chapter 5's `LoadInstallPackages.R`.

```{r, eval=FALSE}
# assuming you are working within .Rproj environment
library(here)

# install (if necessary) and load other required packages
source(here("source", "load_libraries.R"))

# load environment generated in "5-0_ChapterSetup.R"
load(here("data", "R", "5-0_ChapterSetup.RData"))

```
</details>

\

In Chapter 5.4, we introduce another option to account for the parallel unfolding of temporal processes: clustering on a dissimilarity matrix obtained by summing or averaging the pairwise dissimilarity matrices computed separately on two (or more) pools of sequences that represent trajectories in different domains. We now use the `data.frame` `multidim`, which contains both family formation and labour market sequences. The data come from a sub-sample of the German Family Panel (pairfam). For further information on the study and on how to access the full scientific use file, see [here](https://www.pairfam.de/en/){target="_blank"}.

## Summation 

We first construct a list `X` that contains the two domain-specific dissimilarity matrices:

```{r, eval=TRUE, echo=TRUE}
X <- list(mc.fam.year.om, mc.act.year.om)
```

We then use the `?cbind` command (called through `do.call`) to bind the matrices stored in `X` side by side, column by column, and store the result in the object `Y`:

```{r, eval=TRUE, echo=TRUE}
Y <- do.call(cbind, X)
```
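
To see what this step produces, here is a minimal sketch on two hypothetical 3 x 3 matrices. The objects `toy.fam`, `toy.act`, `toy.X` and `toy.Y` are made-up illustrations, not pairfam data. The sketch suggests that `do.call(cbind, ...)` simply places the matrices next to each other, so the result has as many rows as one matrix and as many columns as all matrices together.

```{r, eval=TRUE, echo=TRUE}
# Two hypothetical 3 x 3 dissimilarity matrices (toy values, not pairfam data)
toy.fam <- matrix(c(0, 2, 4,
                    2, 0, 6,
                    4, 6, 0), nrow = 3, byrow = TRUE)
toy.act <- matrix(c(0, 1, 3,
                    1, 0, 5,
                    3, 5, 0), nrow = 3, byrow = TRUE)

toy.X <- list(toy.fam, toy.act)

# do.call(cbind, toy.X) is equivalent to cbind(toy.fam, toy.act)
toy.Y <- do.call(cbind, toy.X)

dim(toy.Y)                        # 3 rows and 6 columns
identical(toy.Y[, 1:3], toy.fam)  # columns 1 to 3 hold the first matrix
identical(toy.Y[, 4:6], toy.act)  # columns 4 to 6 hold the second matrix
```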

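We overwrite `Y` by using the `?array` command to reshape it into a three-dimensional array: the first two dimensions are the rows and columns of a single dissimilarity matrix, and the third dimension indexes the matrices stored in `X`. This gives the object the right dimensions for the next step: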

```{r, eval=TRUE, echo=TRUE}
Y <- array(Y, dim=c(dim(X[[1]]), length(X)))
```
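
The reshaping can be checked on a small example. The sketch below uses hypothetical toy matrices (`toy.fam`, `toy.act` and `toy.A` are illustrative names and values, not objects from the chapter setup). Because `array` fills its values column by column, each slice along the third dimension should reproduce one of the original matrices.

```{r, eval=TRUE, echo=TRUE}
# Two hypothetical 3 x 3 dissimilarity matrices (toy values, not pairfam data)
toy.fam <- matrix(c(0, 2, 4,
                    2, 0, 6,
                    4, 6, 0), nrow = 3, byrow = TRUE)
toy.act <- matrix(c(0, 1, 3,
                    1, 0, 5,
                    3, 5, 0), nrow = 3, byrow = TRUE)
toy.X <- list(toy.fam, toy.act)

# Bind column-wise, then reshape into rows x columns x number of matrices
toy.A <- array(do.call(cbind, toy.X),
               dim = c(dim(toy.X[[1]]), length(toy.X)))

dim(toy.A)                         # 3 x 3 x 2
identical(toy.A[, , 1], toy.fam)   # first slice is the first matrix
identical(toy.A[, , 2], toy.act)   # second slice is the second matrix
```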

Finally, we use the `?apply` command to apply the function `sum` to every combination of the first two dimensions of `Y` (margins 1 and 2), that is, to add up the domain-specific dissimilarities for each pair of sequences. We store the resulting dissimilarity matrix in an object called `mc.summation`:

```{r, eval=TRUE, echo=TRUE}
mc.summation <- apply(Y, c(1, 2), sum, na.rm = TRUE)
```
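
Assuming the matrices contain no missing values, the result of this step should equal the plain element-wise sum of the domain-specific matrices. The following sketch illustrates this on hypothetical toy matrices (all `toy.*` objects are made up for illustration). It also shows the one case where the two approaches differ: with `na.rm = TRUE`, `sum` skips a missing cell, whereas `+` returns `NA`.

```{r, eval=TRUE, echo=TRUE}
# Two hypothetical 3 x 3 dissimilarity matrices (toy values, not pairfam data)
toy.fam <- matrix(c(0, 2, 4,
                    2, 0, 6,
                    4, 6, 0), nrow = 3, byrow = TRUE)
toy.act <- matrix(c(0, 1, 3,
                    1, 0, 5,
                    3, 5, 0), nrow = 3, byrow = TRUE)

# Summation via the three-dimensional array, as in the steps above
toy.A <- array(cbind(toy.fam, toy.act), dim = c(3, 3, 2))
toy.summation <- apply(toy.A, c(1, 2), sum, na.rm = TRUE)

# Without missing values, this matches the element-wise sum
isTRUE(all.equal(toy.summation, toy.fam + toy.act))
isTRUE(all.equal(toy.summation, Reduce(`+`, list(toy.fam, toy.act))))

# With a missing cell, the two approaches differ
toy.act.na <- toy.act
toy.act.na[1, 2] <- NA
toy.A.na <- array(cbind(toy.fam, toy.act.na), dim = c(3, 3, 2))
apply(toy.A.na, c(1, 2), sum, na.rm = TRUE)[1, 2]  # 2: the NA cell is skipped
(toy.fam + toy.act.na)[1, 2]                       # NA
```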

## Averaging 

For the sake of clarity, we construct another list `Z` that contains the dissimilarity matrices, as we did for the summation above:

```{r, eval=TRUE, echo=TRUE}
Z <- list(mc.act.year.om, mc.fam.year.om)
```

We then use the `?cbind` command (called through `do.call`) to bind the matrices stored in `Z` side by side, column by column, and generate the object `W`:

```{r, eval=TRUE, echo=TRUE}
W <- do.call(cbind, Z)
```

We overwrite `W` by using the `?array` command to reshape it into a three-dimensional array with the right dimensions for the next step:

```{r, eval=TRUE, echo=TRUE}
W <- array(W, dim = c(dim(Z[[1]]), length(Z)))
```

Finally, we use the `?apply` command to apply the function `mean` to every combination of the first two dimensions of `W` (margins 1 and 2), that is, to average the domain-specific dissimilarities for each pair of sequences. We store the resulting dissimilarity matrix in an object called `mc.average`:

```{r, eval=TRUE, echo=TRUE}
mc.average <- apply(W, c(1, 2), mean, na.rm = TRUE)
```
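
As a quick plausibility check, the averaged matrix should equal the summed matrix divided by the number of domains, provided there are no missing values. The sketch below illustrates this on hypothetical toy matrices (all `toy.*` objects are made up for illustration) and also shows `rowMeans` with `dims = 2` as one possible shortcut for the `apply` call.

```{r, eval=TRUE, echo=TRUE}
# Two hypothetical 3 x 3 dissimilarity matrices (toy values, not pairfam data)
toy.fam <- matrix(c(0, 2, 4,
                    2, 0, 6,
                    4, 6, 0), nrow = 3, byrow = TRUE)
toy.act <- matrix(c(0, 1, 3,
                    1, 0, 5,
                    3, 5, 0), nrow = 3, byrow = TRUE)
toy.Z <- list(toy.act, toy.fam)

# Same steps as above: bind, reshape, then average over the third dimension
toy.W <- array(do.call(cbind, toy.Z),
               dim = c(dim(toy.Z[[1]]), length(toy.Z)))
toy.average   <- apply(toy.W, c(1, 2), mean, na.rm = TRUE)
toy.summation <- apply(toy.W, c(1, 2), sum, na.rm = TRUE)

# Average = summation / number of matrices (here 2)
isTRUE(all.equal(toy.average, toy.summation / length(toy.Z)))

# rowMeans() with dims = 2 averages over the third dimension as well
isTRUE(all.equal(toy.average, rowMeans(toy.W, dims = 2)))
```

In this complete-data case the two matrices differ only by a constant factor, so the ordering of the pairwise dissimilarities is the same in both.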

## Inspection of the matrices

Let's first display the dissimilarities between the first three sequences in the sample, separately for each of the two domain-specific matrices:

```{r, eval=TRUE, echo=TRUE}
mc.fam.year.om[1:3, 1:3]
```

```{r, eval=TRUE, echo=TRUE}
mc.act.year.om[1:3, 1:3]
```

We now inspect the dissimilarities between the first three sequences in the sample after summation...

```{r, eval=TRUE, echo=TRUE}
mc.summation[1:3, 1:3]
```

... and averaging 

```{r, eval=TRUE, echo=TRUE}
mc.average[1:3, 1:3]
```