---
title: "Cross-tabulation of clusters"
description: |
  Chapter 5.3 Cross-tabulation of groups from different dissimilarity matrices
output: distill::distill_article
---
```{r setup, include=FALSE}
# Load required packages
library(here)
source(here("source", "load_libraries.R"))
# Output options
knitr::opts_chunk$set(eval=TRUE, echo=TRUE)
options("kableExtra.html.bsTable" = TRUE)
# load data for Chapter 5
load(here("data", "5-0_ChapterSetup.RData"))
```
```{r, xaringanExtra-clipboard, echo=FALSE}
htmltools::tagList(
  xaringanExtra::use_clipboard(
    button_text = "<i class=\"fa fa-clone fa-2x\" style=\"color: #301e64\"></i>",
    success_text = "<i class=\"fa fa-check fa-2x\" style=\"color: #90BE6D\"></i>",
    error_text = "<i class=\"fa fa-times fa-2x\" style=\"color: #F94144\"></i>"
  ),
  rmarkdown::html_dependency_font_awesome()
)
```
<details><summary>**Click here to get instructions...**</summary>
- Please download and unzip the replication files for Chapter 5
([`r fontawesome::fa("far fa-file-zipper")` Chapter05.zip](source/Chapter05.zip)).
- Read `readme.html` and run `5-0_ChapterSetup.R`. This will create `5-0_ChapterSetup.RData` in the subfolder `data/R`. This file contains the data required to reproduce the results shown below.
- You also have to add the function `legend_large_box` to your environment in order to render the tweaked version of the legend described below. You will find this file in the `source` folder of the unzipped Chapter 5 archive.
- We also recommend loading the libraries listed in Chapter 5's `LoadInstallPackages.R`
```{r, eval=FALSE}
# assuming you are working within .Rproj environment
library(here)
# install (if necessary) and load other required packages
source(here("source", "load_libraries.R"))
# load environment generated in "5-0_ChapterSetup.R"
load(here("data", "R", "5-0_ChapterSetup.RData"))
```
</details>
\
In Chapter 5.3, we introduce one option for taking the parallel unfolding of temporal processes into account: the cross-tabulation of cluster solutions extracted separately from two (or more) pools of sequences that represent trajectories in different domains. We now use the `data.frame` `multidim`, which contains both family formation and labor market sequences. The data come from a sub-sample of the German Family Panel (pairfam). For further information on the study and on how to access the full scientific use file, see [here](https://www.pairfam.de/en/){target="_blank"}.
## Preparatory work for family formation trajectories
First, we run a Ward cluster analysis based on the dissimilarity matrix `mc.fam.year.om`:
```{r, eval=TRUE, echo=TRUE}
fam.ward <- hclust(as.dist(mc.fam.year.om),
                   method = "ward.D",
                   members = multidim$weight40)
```
...to be used as the initialization of the PAM clustering:
```{r, eval=TRUE, echo=TRUE}
fam.pam <- wcKMedRange(mc.fam.year.om,
                       weights = multidim$weight40,
                       kvals = 2:10,
                       initialclust = fam.ward)
```
We now extract 5 clusters...
```{r, eval=TRUE, echo=TRUE}
fam.pam.5cl <- fam.pam$clustering$cluster5
```
...attach the cluster info to the main `data.frame` `multidim`...
```{r, eval=TRUE, echo=TRUE}
multidim$fam.pam.5cl<-fam.pam.5cl
```
...and relabel the clusters from 1 to 5, replacing the medoid identifiers...
```{r, eval=TRUE, echo=TRUE}
fam.pam.5cl.factor <- factor(fam.pam.5cl,
                             levels = c(16, 460, 479, 892, 898),
                             labels = c("1", "2", "3", "4", "5"))
```
...to finally attach the factor info to the main `data.frame` `multidim`:
```{r, eval=TRUE, echo=TRUE}
multidim$fam.pam.5cl.factor<-fam.pam.5cl.factor
```
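The level values `16, 460, 479, 892, 898` used above are the medoid identifiers for this particular sample, so they would change with different data or a different number of clusters. As a minimal sketch, assuming a hypothetical toy vector `toy.cl` that stands in for `fam.pam.5cl`, the levels could instead be derived from the data:

```r
# Hypothetical toy vector of cluster memberships coded by medoid index
# (a stand-in for fam.pam.5cl; the values are made up for illustration)
toy.cl <- c(16, 460, 16, 898, 460, 16, 898)

# Derive the levels from the data instead of typing them by hand
toy.levels <- sort(unique(toy.cl))

# Relabel to consecutive numbers, ordered by medoid index
toy.cl.factor <- factor(toy.cl,
                        levels = toy.levels,
                        labels = seq_along(toy.levels))

# Cross-check the mapping: each medoid identifier should map to exactly
# one label, and no NA should appear (an NA would signal a mistyped level)
table(toy.cl, toy.cl.factor, useNA = "ifany")
```

Sorting by medoid index gives the same ordering as the hand-typed levels above, since those are listed in increasing order.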
## Preparatory work for labor market trajectories
First, we run a Ward cluster analysis based on the dissimilarity matrix `mc.act.year.om`:
```{r, eval=TRUE, echo=TRUE}
act.ward <- hclust(as.dist(mc.act.year.om),
                   method = "ward.D",
                   members = multidim$weight40)
```
...to be used as the initialization of the PAM clustering:
```{r, eval=TRUE, echo=TRUE}
act.pam <- wcKMedRange(mc.act.year.om,
                       weights = multidim$weight40,
                       kvals = 2:10,
                       initialclust = act.ward)
```
We now extract 5 clusters...
```{r, eval=TRUE, echo=TRUE}
act.pam.5cl <- act.pam$clustering$cluster5
```
...attach the cluster info to the main `data.frame` `multidim`...
```{r, eval=TRUE, echo=TRUE}
multidim$act.pam.5cl<-act.pam.5cl
```
...and relabel the clusters from 1 to 5, replacing the medoid identifiers...
```{r, eval=TRUE, echo=TRUE}
act.pam.5cl.factor <- factor(act.pam.5cl,
                             levels = c(6, 25, 78, 539, 709),
                             labels = c("1", "2", "3", "4", "5"))
```
...to finally attach the factor info to the main `data.frame` `multidim`:
```{r, eval=TRUE, echo=TRUE}
multidim$act.pam.5cl.factor<-act.pam.5cl.factor
```
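The weighted Ward step that initializes PAM in both preparatory sections can be tried out on toy data. The following is a minimal sketch in base R; the positions `toy.x` and the weights `toy.w` are hypothetical and merely stand in for the sequence dissimilarities and `multidim$weight40`. Passing the weights through `members` makes `hclust()` treat each row as a group of that size.

```r
# Hypothetical toy data: six observations forming two obvious groups
toy.x <- c(0, 1, 2, 10, 11, 12)

# Hypothetical case weights (a stand-in for multidim$weight40)
toy.w <- c(1, 2, 1, 1, 3, 1)

# Pairwise dissimilarities (a stand-in for mc.act.year.om)
toy.diss <- dist(toy.x)

# Ward clustering; 'members' gives the number of observations
# each row represents, which is how the weights enter
toy.ward <- hclust(toy.diss, method = "ward.D", members = toy.w)

# Cut the tree into two groups and inspect the assignment
toy.groups <- cutree(toy.ward, k = 2)
toy.groups
```

In the chapter itself the tree is not cut directly; it only serves as the starting point that `wcKMedRange()` refines for each value in `kvals`.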
## Cross-tabulation for a 5-cluster solution on both channels
We tabulate the two factors (labor market clusters in rows, family formation clusters in columns) and store the result in an object that we name `crosstab`...
```{r, eval=TRUE, echo=TRUE}
crosstab<-table(multidim$act.pam.5cl.factor, multidim$fam.pam.5cl.factor)
```
...to print it at our convenience:
```{r, eval=TRUE, echo=TRUE}
crosstab
```
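Raw counts can be hard to compare when cluster sizes differ. As a minimal sketch on hypothetical toy data (the data frame `toy` and its columns are made up and only mimic the structure of `multidim`), row percentages and a weighted version of the table could be obtained with base R:

```r
# Hypothetical toy data mimicking the relevant columns of multidim
toy <- data.frame(
  act    = factor(c("1", "1", "2", "2", "2", "1")),
  fam    = factor(c("1", "2", "1", "2", "2", "1")),
  weight = c(1, 2, 1, 1, 3, 2)
)

# Unweighted cross-tabulation (rows: act, columns: fam)
toy.tab <- table(toy$act, toy$fam)

# Row percentages: distribution of fam clusters within each act cluster
toy.rowpct <- round(100 * prop.table(toy.tab, margin = 1), 1)
toy.rowpct

# Weighted cross-tabulation: cells sum the weights instead of counting cases
toy.wtab <- xtabs(weight ~ act + fam, data = toy)

# Add row and column totals for display
addmargins(toy.wtab)
```

Whether a weighted table is appropriate depends on the analysis; the `crosstab` object above counts cases without weights, even though `weight40` was used in the clustering.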