---
title: "Defining & printing sequence objects"
description: |
Chapter 2.1 Basic Concepts and Terminology
output: distill::distill_article
---
```{r setup, include=FALSE}
# Load required packages
library(here)
source(here("source", "load_libraries.R"))
# Output options
knitr::opts_chunk$set(eval=TRUE, echo=TRUE)
options("kableExtra.html.bsTable" = TRUE)
# load data for Chapter 2
load(here("data", "2-0_ChapterSetup.RData"))
```
```{r, xaringanExtra-clipboard, echo=FALSE}
htmltools::tagList(
xaringanExtra::use_clipboard(
button_text = "<i class=\"fa fa-clone fa-2x\" style=\"color: #301e64\"></i>",
success_text = "<i class=\"fa fa-check fa-2x\" style=\"color: #90BE6D\"></i>",
error_text = "<i class=\"fa fa-times fa-2x\" style=\"color: #F94144\"></i>"
),
rmarkdown::html_dependency_font_awesome()
)
```
<details><summary>**Click here to get instructions...**</summary>
- Please download and unzip the replication files for Chapter 2
([`r fontawesome::fa("far fa-file-zipper")` Chapter02.zip](source/Chapter02.zip)).
- Read `readme.html`
- You don't have to run `2-0_ChapterSetup.R` for this tutorial because it starts by importing the raw data stored in Stata's .dta format
- We also recommend loading the libraries listed in Chapter 2's `LoadInstallPackages.R`
```{r, eval=FALSE}
# assuming you are working within an RStudio project (.Rproj)
library(here)
# install (if necessary) and load other required packages
source(here("source", "LoadInstallPackages.R"))
```
</details>
In Chapter 2.1, we introduce different notations for sequence data using example data on family biographies from age 18 to 40. The data come from a subsample of the German Family Panel (pairfam). For further information on the study and on how to access the full scientific use file, see the [pairfam website](https://www.pairfam.de/en/){target="_blank"}.
## Defining a state sequence object
We generated the example dataset in Stata. In addition to the sequence variables, it contains a few variables that will be used to analyze the sequences in later chapters.
We import the data into R using the `read_dta` function from the [`{haven}`](https://haven.tidyverse.org/index.html){target="_blank"} package and inspect the names of the imported variables.
```{r, eval = FALSE}
# import data
family <- read_dta(here("data", "Stata", "PartnerBirthbio.dta"))
```
<div class='pre-scrolly'>
```{r, layout="l-body-outset"}
# view variable names
names(family)
```
</div>
\
The sequence variables begin with the prefix `state`. The data are in wide format: each person has 264 sequence variables, one for each month of a 22-year observation window (22 × 12 = 264), so each row holds a complete monthly family biography.
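To make the wide layout concrete, the sketch below maps each variable name to the age it refers to. It assumes, consistent with the age range given above but not stated explicitly in the data description, that `state1` records the first month at age 18 and that every following variable adds one month; `age_of()` is a hypothetical helper, not part of any package.

```r
# Assumption: state1 = first month at age 18; each further variable adds
# one month, so 22 years x 12 months = 264 sequence variables.
n_months <- 22 * 12
state_vars <- paste0("state", seq_len(n_months))

# Exact age (in years) at the start of each monthly variable
age_at_month <- 18 + (seq_len(n_months) - 1) / 12

# Hypothetical helper: completed age for the given variable names
age_of <- function(var) {
  idx <- match(var, state_vars)
  floor(age_at_month[idx])
}

n_months                                               # 264
age_of(c("state1", "state12", "state13", "state264"))  # 18 18 19 39
```

Under this assumption, the last variable, `state264`, covers the final month before the 40th birthday.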
`{haven}` imports the sequence variables as numeric variables with value labels attached (`class = "haven_labelled"`). The following commands show the structure of the first sequence variable and its labels.
```{r}
str(family$state1)
attributes(family$state1)$labels
```
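Labelled columns keep the numeric codes and store the value labels as a named numeric vector in the `labels` attribute, which is what `attributes(family$state1)$labels` returns. The base R sketch below mimics that structure with made-up codes and label texts (not the actual pairfam coding) to show how codes can be translated into readable categories. With `{haven}` loaded, `as_factor()` performs this conversion directly on a `haven_labelled` vector.

```r
# Toy stand-in for a haven_labelled column: numeric codes plus a named
# "labels" attribute. Codes and label texts are hypothetical.
x <- c(1, 3, 5, 7, 5)
attr(x, "labels") <- c(Single = 1, LAT = 3, Cohabiting = 5, Married = 7)

labs <- attr(x, "labels")

# Translate codes into a factor whose levels are the label texts;
# as.vector() drops the attributes so only the codes are matched
x_fac <- factor(as.vector(x), levels = unname(labs), labels = names(labs))

as.character(x_fac)  # "Single" "LAT" "Cohabiting" "Married" "Cohabiting"
table(x_fac)
```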
The first examples in the book are based on sequences with a reduced alphabet that only distinguishes partnership states. The following code generates a data set (`seqvars.partner`) containing the recoded sequence variables using [`{dplyr}`](https://dplyr.tidyverse.org/index.html){target="_blank"}.
```{r, eval=FALSE}
# extract the sequence variables (all start with "state") and recode them
# to a reduced state space capturing partnership status only
seqvars.partner <- family %>%
  select(starts_with("state")) %>%
  mutate(across(everything(), ~ case_when(
    .x < 3 ~ 1,             # Single
    .x %in% c(3, 4) ~ 2,    # LAT
    .x %in% c(5, 6) ~ 3,    # Cohabiting
    .x > 6 ~ 4              # Married
  )))
```
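To see what the recode does, it helps to apply the same rules to a tiny data set. The sketch below is a base R re-implementation of the `case_when()` rules; the toy data frame and the helper `recode_partner()` are hypothetical. It also shows that values matching none of the rules, including missing values, end up as `NA`, as they would with `case_when()`.

```r
# Hypothetical toy data: three persons, four monthly state variables
toy <- data.frame(state1 = c(1, 2, 1),
                  state2 = c(3, 5, 4),
                  state3 = c(5, 7, NA),
                  state4 = c(8, 6, 2))

# Same rules as the case_when() call: codes < 3 -> 1 (Single),
# 3/4 -> 2 (LAT), 5/6 -> 3 (Cohabiting), > 6 -> 4 (Married).
# which() skips NA comparisons, so missing codes stay NA.
recode_partner <- function(x) {
  out <- rep(NA_real_, length(x))
  out[which(x < 3)] <- 1
  out[which(x %in% c(3, 4))] <- 2
  out[which(x %in% c(5, 6))] <- 3
  out[which(x > 6)] <- 4
  out
}

# Apply the rules column by column, keeping the variable names
toy_partner <- as.data.frame(lapply(toy, recode_partner))
toy_partner
```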
Then we define two vectors storing the long and short labels for the states of the new alphabet. Once that is done, we can define the data as a state sequence object with `seqdef`. Most [`{TraMineR}`](http://traminer.unige.ch/){target="_blank"} functions for analyzing sequences require the data to be in this format.
```{r}
shortlab.partner <- c("S", "LAT", "COH", "MAR")
longlab.partner <- c("Single", "LAT", "Cohabiting", "Married")
# create state sequence object
partner.month.seq <- seqdef(seqvars.partner,
labels = longlab.partner,
states = shortlab.partner,
weights = family$weight40)
```
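The `weights` argument attaches a case weight to every sequence, and TraMineR's descriptive functions such as `seqstatd()` use these weights by default. As a rough illustration of what this changes, the base R sketch below computes unweighted and weighted state shares for a single month. The states and weights are made up, and the code does not call TraMineR.

```r
# Hypothetical states of five persons in one month, with made-up weights
shortlab <- c("S", "LAT", "COH", "MAR")
state <- factor(c("S", "LAT", "S", "MAR", "COH"), levels = shortlab)
w <- c(0.5, 1.5, 1.0, 1.0, 1.0)

# Unweighted shares: every person counts once
unweighted <- as.vector(table(state)) / length(state)

# Weighted shares: each person counts with their weight
weighted <- as.vector(xtabs(w ~ state)) / sum(w)

shares <- rbind(unweighted, weighted)
colnames(shares) <- shortlab
round(shares, 3)
```

In this toy example, the lower weight of one single person and the higher weight of the LAT person reduce the share of singles from 0.4 to 0.3.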
Note that `seqdef` accepts many more optional arguments. Some of them, most importantly `cpal`, affect the appearance of state sequence plots rendered with `seqplot` or `seqplot.rf`. We cover the definition of color palettes on two separate pages ([definition of color palettes](rChapter2-4_color.html); [definition of grayscale palettes](rChapter2-4_grayscale.html)).
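A palette for `cpal` needs one color per state, in the same order as the alphabet. The minimal sketch below builds such a vector with base R's `hcl.colors()`; the name `partner.pal` and the "Viridis" palette are arbitrary choices for illustration, not the palette used in the book. The resulting vector could then be passed to `seqdef()` as `cpal = partner.pal`.

```r
shortlab.partner <- c("S", "LAT", "COH", "MAR")

# One color per state, in alphabet order (palette choice is arbitrary)
partner.pal <- grDevices::hcl.colors(length(shortlab.partner),
                                     palette = "Viridis")

# Sanity check before passing the palette to seqdef(..., cpal = partner.pal)
stopifnot(length(partner.pal) == length(shortlab.partner))
partner.pal
```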
\
## Sequence data notation
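Chapter 2.1 also introduces different notations for printing sequences. The following commands print the eighth sequence of `partner.month.seq` in state-sequence (*STS*), distinct-successive-states (*DSS*), and state-permanence-sequence (*SPS*) format.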
```{r}
print(partner.month.seq[8, ], format = "STS")
seqdss(partner.month.seq[8, ])
print(partner.month.seq[8, ], format = "SPS")
```
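The three notations differ only in how spells are written. The base R sketch below reproduces the idea on a made-up nine-month sequence using run-length encoding (`rle()`). It mimics the layout of the printed output but is not how TraMineR implements `seqdss()` or the SPS print format.

```r
# Hypothetical nine-month sequence in STS format (one state per month)
sts <- c("S", "S", "S", "LAT", "LAT", "COH", "COH", "COH", "MAR")

# Run-length encoding: each spell becomes a (state, duration) pair
spells <- rle(sts)

sts_str <- paste(sts, collapse = "-")            # STS: one entry per month
dss_str <- paste(spells$values, collapse = "-")  # DSS: one entry per spell
sps_str <- paste0("(", spells$values, ",", spells$lengths, ")",
                  collapse = "-")                # SPS: state plus duration

sts_str  # "S-S-S-LAT-LAT-COH-COH-COH-MAR"
dss_str  # "S-LAT-COH-MAR"
sps_str  # "(S,3)-(LAT,2)-(COH,3)-(MAR,1)"
```

DSS drops the spell durations that SPS keeps, so two sequences with the same order of states but different spell lengths have the same DSS.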