# The assumptions of SEM are similar to those of OLS regression:
# 1. Your model is correct.
# For SEM this means:
# A) Linearity (the relationship between variables is linear).
# B) No specification error (all relationships are
# accounted for, and exogenous variables are actually exogenous).
# C) Sequence (the causal relationship modeled is true).
# 2. Independence of errors
# 3. Multivariate normal distribution (Normality of errors)
# 4. Homoscedasticity of errors
#
# I suggest reading Chapter 5 in Kaplan, D. (2008). Structural equation
# modeling: Foundations and extensions (2nd ed.)
# 1) Your model is correct ------------------------------------------------
# No simple way to verify this other than thinking, although plotting can be
# useful to examine linearity.
# TODO: maybe use ggeffects for this?
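# A minimal sketch for eyeballing linearity (an illustration, not part of the
# original checks): a scatterplot matrix with a smoother in each panel. The
# name `linearity_vars` is hypothetical, and picking the four 1960 indicators
# from lavaan's PoliticalDemocracy data is an assumption made for the example;
# use the variables from your own model.
data("PoliticalDemocracy", package = "lavaan")
linearity_vars <- c("y1", "y2", "y3", "y4")
pairs(PoliticalDemocracy[, linearity_vars],
      panel = panel.smooth) # roughly straight smoothers are consistent with linearity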
# 2) Independence of errors -----------------------------------------------
# We have two major sources of dependency of errors:
# 1. Structure of data:
# 1.1. Is the data multilevel? Have we accounted for that?
# Read more: http://lavaan.ugent.be/tutorial/multilevel.html
# 1.2. Is there a temporal nature to the data (repeated measures)?
# Have we accounted for that? (Using latent growth curve / cross-lagged
# panel modeling)
# 2. Errors are correlated (e.g. autocorrelations)
# While the 1st requires us to think, the second can be tested
# with `lavaan::modificationIndices()`.
library(lavaan)
model <- '
  # latent variable definitions
  dem60 =~ y1 + a*y2 + b*y3 + c*y4
  dem65 =~ y5 + a*y6 + b*y7 + c*y8
  # regressions
  dem65 ~ dem60
  # residual correlations
  y1 ~~ y5
  y2 ~~ y4
  y3 ~~ y7
  y4 ~~ y8
  y6 ~~ y8
'
fit <- sem(model, data = PoliticalDemocracy)
modificationIndices(fit, sort. = TRUE, minimum.value = 10)
#> lhs op rhs mi epc sepc.lv sepc.all sepc.nox
#> 94 y2 ~~ y6 11.094 2.189 2.189 0.364 0.364
# Looks like the errors of y2 and y6 are correlated,
# and we can consider adding this correlation to our model.
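# A minimal sketch of acting on this (the names `model_y2y6` and `fit_y2y6`
# are hypothetical): append the residual correlation to the model syntax,
# refit, and compare the two nested models. Freeing parameters based on
# modification indices is data-driven, so the added path should also make
# theoretical sense.
model_y2y6 <- paste(model, "y2 ~~ y6", sep = "\n")
fit_y2y6 <- sem(model_y2y6, data = PoliticalDemocracy)
lavTestLRT(fit, fit_y2y6) # likelihood-ratio (chi-squared difference) test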
# 3) Multivariate normal distribution -------------------------------------
# Like the assumption of normality in OLS regression, this assumption is
# concerned with the normality of the residuals, but here these residuals are
# multivariate.
# The checks below need the original data the model was fit with, so we
# first make a data.frame with ONLY the variables of interest:
model_data <- PoliticalDemocracy[, c("y1", "y2", "y3", "y4", "y5", "y6", "y7", "y8")]
# NOTE: only use the subset of the data that was used!
## Henze-Zirkler Test of Multivariate Normality --------
test_mvn <- MVN::mvn(model_data, mvnTest = "hz")
test_mvn$multivariateNormality
#> Test HZ p value MVN
#> 1 Henze-Zirkler 1.194575 1.110223e-16 NO
# Looks like a deviation from normality - however...
## Multivariate (Chi-Squared) QQ-Plot --------
# We can (and should) also look at a multivariate (chisq-)qqplot:
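# Under multivariate normality, the squared Mahalanobis distances approximately
# follow a chi-squared distribution with df = number of variables;
# mean(distances) is close to that value.
distances <- mahalanobis(model_data,
                         center = colMeans(model_data),
                         cov = cov(model_data))
car::qqPlot(distances, ylab = "Mahalanobis distances (Squared)",
            distribution = "chisq", df = mean(distances))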
# Looks good!
## Multivariate Skewness & Kurtosis --------
# We can also look at multivariate Kurtosis and Skewness if we want:
library(semTools)
mardiaKurtosis(model_data)
#> b2d z p
#> 76.0448884 -1.3539399 0.1757556
mardiaSkew(model_data)
#> b1d chi df p
#> 1.477098e+01 1.846372e+02 1.200000e+02 1.378703e-04
# The skewness test is significant, so the data appear to deviate from
# multivariate normality in skewness (the kurtosis test is not significant).
# What to do if we violate multivariate normality?
# Use a robust estimator!
# http://lavaan.ugent.be/tutorial/est.html
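# A minimal sketch (assuming the `model` defined above; the name `fit_robust`
# is hypothetical): refit with "MLR", one of lavaan's robust estimators, which
# reports robust standard errors and a scaled test statistic. Which robust
# estimator suits your data is a judgment call; see the link above.
fit_robust <- sem(model, data = PoliticalDemocracy, estimator = "MLR")
fitMeasures(fit_robust, c("chisq.scaled", "df.scaled", "pvalue.scaled",
                          "cfi.robust", "rmsea.robust"))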
# 4) Homoscedasticity of errors -------------------------------------------
# I could not find a way to test this.