---
title: "Multilevel Data Modelling"
---
*Recommended literature:*
Snijders, T. A., & Bosker, R. J. (2011). Multilevel analysis: An introduction to basic and advanced multilevel modeling. Sage.
## Load libraries {#sec-load.libraries}
```{r load packages, message=FALSE, results='hide'}
library(lme4)
library(lmerTest)
library(mitml)
library(sjlabelled)
library(tidyverse)
library(stargazer) ## comparing models
library(sjmisc)
library(broom.mixed)
library(sjPlot)
library(glmmTMB)
library(modelsummary)
```
## The multilevel issue
- Micro units are nested within macro units
- Micro units affect macro units and vice versa
- Micro units within a macro unit tend to be more similar to each other than to micro units in other macro units
- Observations are therefore no longer independent of each other
- In statistical analysis this affects their variances
- Neglecting this issue likely leads at least to wrong variance estimates, and in the worst case to biased estimates
→ Multilevel analysis is a suitable approach to take into account the social contexts as well as the individual respondents or subjects.
#### Sampling
- Cluster sampling and multi-stage sampling are standard
- Secondary units are not selected independently of each other
- Having selected the primary units increases the chance of sampling the secondary units → ignoring this may heavily affect the variance estimates and may thus produce type I errors (rejecting a true null hypothesis)
{width="200"}
Two arguments for multilevel instead of single-level regression of disaggregated data:
1. Dependence as a nuisance:
- Standard errors and tests based on single-level regression are suspect because the assumption of independent residuals is invalid.
2. Dependence as an interesting phenomenon:
- It is interesting in itself to disentangle variability at the various levels; moreover, this can give insight into the directions where further explanation may fruitfully be sought.
{width="200"}
#### Modelling
The **hierarchical linear model** is a type of regression analysis for multilevel data where the dependent variable is at the lowest level.
- Explanatory variables can be defined at any level (including aggregates of level-one variables).
- Longitudinal data can also be regarded as a nested structure; for such data the hierarchical linear model is convenient as well (with some particularities).
{width="200"}
### Exercise {#sec-ml.ex}
1. Load library `library(lme4)`
Already done, see @sec-load.libraries.
2. Load the data:
```{r}
DAT <- read.csv2("data/1.Example.txt")
```
3. Have a look at the data
How many observations and variables?
```{r}
str(DAT)
```
In the data set, the following is measured for each pupil:
- which school he/she belongs to (schoolnr)
- his/her unique pupil number (pupilNR_new)
- the score the pupil reached in the language test, our dependent variable (langPOST)
- his/her socio-economic background (ses)
- his/her sex (a binary variable)
- whether the pupil belongs to a minority (Minority)
- his/her verbal IQ, centered at the mean (IQ_verb)
And the data set contains measures for each school:
- the school's average socio-economic background (sch_ses)
- the average verbal IQ of all pupils in a school (sch_iqv)
- an index of how many pupils in a school belong to a minority (sch_min)
```{r}
length(unique(DAT$pupilNR_new))
```
3454 observations (pupils) and 11 variables overall. Each pupil has a unique pupil number and is assigned to one of 211 schools.
For an overview, let's get some insight into the descriptives of each variable:
```{r}
summary(DAT)
```
The schoolnr for the 211 schools ranges from 1 to 259.
```{r}
table(is.na.data.frame(DAT))
```
No missing values in the data frame.
Get insight in the distribution of each variable:
```{r}
hist(DAT$ses)
hist(DAT$sch_ses)
hist(DAT$IQ_verb)
hist(DAT$sch_iqv)
hist(DAT$Minority)
hist(DAT$sch_min)
```
- ses is left-skewed: more pupils have a lower ses than the mean; this is not true for schools
- IQ_verb is slightly right-skewed; the range is wider on the negative side than on the positive side, and this is also true for the distribution of school means
- the distribution of the mean IQ has more outliers on the negative side than on the positive side
- fewer than 10 percent of the pupils belong to a minority; there are many schools where none of the pupils belongs to a minority group, while there are some schools where all pupils belong to a minority.
## Random Intercept Model (RIM) or Random Effects Model
- $i$ indicates the individual, the level-one unit
- $j$ indicates the group, the level-two unit
- variables for individual $i$ in group $j$:
  - $Y_{ij}$ dependent variable
  - $x_{ij}$ explanatory variable at level one
- for group $j$:
  - $z_j$ explanatory variable at level two
  - $n_j$ group size\
**OLS regression model** of $Y$ on $x$ ignoring groups
$$
Y_{ij} = \beta_0 + \beta_1x_{ij}+ R_{ij}
$$
$R_{ij}$ is not the error term, it is the residual term: an error term is generally unobservable, while a residual is observable and calculable, making it much easier to quantify and visualize. In effect, an error term represents how the observed data differ from the true population relationship, while a residual represents how the observed data differ from the model fitted to the sample.
The **group-dependent regression model** depends on groups; the following notation means that each level-two unit (group) $j$ has its own intercept and slope:
$$
Y_{ij} = \beta_{0j} + \beta_{1j}x_{ij} + R_{ij}
$$
In the random intercept model, the intercepts $\beta_{0j}$ are random variables representing random differences between groups, where $\beta_{0j}$ equals the average intercept $\gamma_{00}$ (over all level-two units) plus a group-dependent deviation $U_{0j}$:
$$
\beta_{0j} = \gamma_{00} + U_{0j}
$$
In this model, the regression coefficient $\beta_1$ is common to all the groups (otherwise it would be a random slope model).
In a fixed effects model, each group has its own intercept $\beta_{0j}$ to be estimated, because $\beta_{0j}$ is not a random variable.
In the random intercept model, the constant regression coefficient $\beta_1$ is sometimes denoted $\gamma_{10}$ →
$$
Y_{ij} = \gamma_{00} + \gamma_{10}x_{ij} + U_{0j} + R_{ij}
$$\
In the hierarchical linear model, the $U_{0j}$ are random variables. The model assumption is that they are independent, normally distributed with expected value $0$ and variance
$$
\tau^2 = var(U_{0j})
$$
The statistical parameter in the model is not their individual values, but their variance $\tau^2$. The $U_{0j}$ represent the random intercept for each cluster. It is really a residual term that measures the distance of each group's intercept from the overall intercept $\gamma_{00}$. Rather than calculating an estimate for every one of those distances, the model estimates a single variance. This is the **between-group variance**.
The **within-group variance** is measured by the variance of residuals. Their sum is the total variance in $Y$ that is not explained by $X$.
{width="400"}
### Fixed or Random Effects for Groups?
Arguments for choosing between fixed (F) and random (R) coefficient models for the group dummies:
1. If groups are unique entities and inference should focus on these groups: F. This is often the case with a small number of groups. → well suited for countries
2. If groups are regarded as a sample from a (perhaps hypothetical) population and inference should focus on this population: R. This is often the case with a large number of groups.
3. If group sizes are small and there are many groups, and it is reasonable to assume exchangeability of group-level residuals, then R makes better use of the data.
4. If the researcher is interested only in within-group effects, and is suspicious about the model for between-group differences, then F is more robust.
- Within groups we need enough variance and a reasonable distribution
5. If group effects $U_{0j}$ (etc.) are not nearly normally distributed, R is risky (or use more complicated multilevel models).
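As a small side-by-side sketch of the two approaches (using the pupil data loaded above; the object names `fe` and `re` are illustrative): F estimates one intercept per school, while R replaces them with a single variance parameter.

```{r}
fe <- lm(langPOST ~ IQ_verb + factor(schoolnr), data = DAT) # fixed effects: one dummy intercept per school
re <- lmer(langPOST ~ IQ_verb + (1|schoolnr), data = DAT)   # random effects: a single variance parameter
length(coef(fe)) # many estimated coefficients
VarCorr(re)      # just the intercept variance and the residual variance
```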
A fixed effect and a random effect are two different statistical models used in data analysis, including in the context of meta-analysis and regression analysis. Here is a brief explanation of each:
#### Fixed Effect:
In a fixed effect model, the assumption is that the effects being studied are constant across all levels of the categorical variable(s) being considered. It assumes that the observed differences are due to the characteristics of the entities being studied, rather than being random or varying across different groups. In a fixed effect model, the parameters being estimated are specific to the entities or groups being analyzed. The fixed effect model is appropriate when the goal is to estimate the specific effects of each entity or group and to make inferences about those specific entities or groups.
#### Random Effect:
In a random effect model, the assumption is that the effects being studied are not constant across all levels of the categorical variable(s) being considered. The observed differences are due to a combination of the characteristics of the entities being studied and random variation. The parameters being estimated represent the average effects across all entities or groups being analyzed, as well as the variability between entities or groups. The random effect model is appropriate when the goal is to estimate the average effects across all entities or groups and to make inferences about the population of entities or groups. The choice between fixed effect and random effect models depends on the underlying assumptions, the research question, and the characteristics of the data.
Additional reading: "A basic introduction to fixed-effect and random-effects models for meta-analysis" (PubMed, PMID: 26061376, DOI: 10.1002/jrsm.12).
### Intraclass Correlation Coefficient
To decide whether a multilevel model is necessary and appropriate, use the intraclass correlation coefficient.
If there is no correlation among the observations within a cluster, the cluster means won't differ (no between-group variance).
Start with the empty model (random effects ANOVA) is a model without explanatory variables:
$$
Y_{ij} = \gamma_{00} + U_{0j} + R_{ij}
$$
Variance decomposition:
$$
var (Y_{ij}) = var (U_{0j}) + var (R_{ij}) =\tau^2 + \sigma^2
$$
Can the covariates on the individual level be explained by the group $j$? Check the **intraclass correlation coefficient (ICC)**: the ratio of the between-cluster variance to the total variance. If the ICC is zero, the grouping contributes nothing to the variance.
- Covariance between two individuals ($i \neq k$) in the same group $j$:
$$
cov(Y_{ij} ,Y_{kj}) = var(U_{0j}) = \tau^2
$$
And their correlation is the intraclass correlation coefficient (ICC):
$$
\rho(Y_{ij}, Y_{kj}) = \rho(Y) = \frac{\tau^2}{\tau^2 + \sigma^2}
$$
It is the proportion of variance that is left unexplained when looking only at the single level. The higher the covariance, the more appropriate a multilevel analysis is and the more explanation it can give. The larger the differences between the classes, the higher the intraclass correlation coefficient. → What are the threshold values indicating an RIM or random coefficient model?
- Often between 0.05 and 0.25 in social science research, where the groups represent some kind of social grouping.
### Exercise
- micro units: 3454 pupils
- macro units: 211 schools
- Y is a language test *langPost*
1. Estimate empty model:
```{r}
mod1 <- lmerTest::lmer(langPOST # dependent variable
                       ~ (1|schoolnr), # random intercept: the grouping factor is the school number
                       data = DAT, # data
                       REML = TRUE) # restricted maximum likelihood estimation
summary(mod1) # get summary
```
```{r}
fixef(mod1) # only fixed effects
confint(mod1) # confidence intervals
```
`.sig01` corresponds to $\tau$, the between-school standard deviation; as expected, its confidence interval does not include zero. `.sigma` corresponds to $\sigma$, the residual standard deviation.
REML stands for the [restricted maximum likelihood method](https://en.wikipedia.org/wiki/Restricted_maximum_likelihood). `lmer` uses it by default because numerically fitting a mixed-effects model can be difficult, and REML gives less biased variance estimates than plain maximum likelihood.
The empty model shows how the language test results differ between the schools. The fixed intercept is the estimated true overall mean: 40.9224. The residual variance on the school level is 18.78, and the one on the individual level is 63.20.
2. Derive the ICC from the output. Assess whether it is really necessary to use a multi-level model for analyzing this data set.
- $\tau^2$ is the variance of $U_{0j}$: 18.78
- $\sigma^2$ is the variance of the residuals $R_{ij}$: 63.2
```{r}
18.78 / (18.78 + 63.2)
```
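Instead of typing the numbers by hand, the ICC can also be computed directly from the fitted model object. A minimal sketch (assuming `mod1` from above; `VarCorr()` is the lme4 accessor for the variance components):

```{r}
vc <- as.data.frame(VarCorr(mod1))      # variance components of mod1
tau2 <- vc$vcov[vc$grp == "schoolnr"]   # between-school variance
sigma2 <- vc$vcov[vc$grp == "Residual"] # residual (within-school) variance
tau2 / (tau2 + sigma2)                  # intraclass correlation coefficient
```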
Because the ICC (about 0.23) lies between 0.05 and 0.25, an RIM is appropriate. It indicates that there is variance that can be explained by schools, i.e. a between-group variance.
3. What is the estimated mean of the total population of individual values $Y_{ij}$ and the corresponding standard deviation?
The estimated mean of the total population is 40.9224 (see the summary of mod1: fixed effects, the intercept) with a standard error of 0.3325.
The standard deviation for this total population is
```{r }
# Multiply the SE of the intercept (from the fixed effects) by the square root of the sample size.
0.3325 * (sqrt(3454))
```
4. What is the estimated mean of the population of class means $\beta_{0j}$ and the corresponding standard deviation?
The estimated mean of the class means is the same, $\gamma_{00} = 40.92$, because $E(U_{0j}) = 0$. The corresponding standard deviation is $\tau$, the between-school standard deviation.
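A quick computation of that standard deviation from the school-level variance reported in the summary of mod1:

```{r}
sqrt(18.78) # standard deviation of the population of class means
```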
### Adding Covariates
The model becomes more interesting, when also fixed effects of explanatory variables are included:
$$
Y_{ij} = \gamma_{00} + \gamma_{10}x_{ij} + U_{0j} + R_{ij}
$$
Note the difference between fixed effects of explanatory variables and fixed effects of group dummies.
There are two kinds of parameters:
1. fixed effects: regression coefficients $\gamma$ (just like in OLS regression)
2. random effects: $U_{0j}$, determined by the variance component $\tau_0^2$
### Exercise
1. Add the covariate for IQ into the model on micro-level, **linear model** without multi-level
```{r}
mod2 <- lm(langPOST ~ IQ_verb, data=DAT) #independent micro variable
summary(mod2)
```
2. Estimate also a **multi-level model (RIM)**, and compare both. Discuss!
```{r}
mod3 <- lmer(langPOST ~ IQ_verb + (1|schoolnr), data=DAT)
summary(mod3)
```
We can see that the model gives more explanation if we use the linear mixed-effects model (model 3). The between-group and within-group variances in model 3 are much smaller than in model 1. The scaled residuals in model 3 span a narrower range than the residuals in model 2, which indicates a better model fit.
```{r}
sigma(mod2) # residual standard deviation of the linear model
sigma(mod3) # residual standard deviation of the mixed model
```
In addition, the residual standard deviation decreases, too.
#### More Covariates
The model becomes more interesting, when also fixed effects of explanatory variables are included:
$$
Y_{ij} = \gamma_{00} + \gamma_{10}x_{1ij} + \ldots + \gamma_{p0}x_{pij} + \gamma_{01}z_{1j} + \ldots + \gamma_{0q}z_{qj} + U_{0j} + R_{ij}
$$
- $\gamma_{00}$ is the overall intercept
- $\gamma_{10}x_{1ij} + \ldots + \gamma_{p0}x_{pij}$ are the within-group (micro unit) effects: each individual-level variable $1, \ldots, p$ has its own coefficient
- $\gamma_{01}z_{1j} + \ldots + \gamma_{0q}z_{qj}$ are the between-group (macro unit) effects: each group-level variable $1, \ldots, q$ has its own coefficient
### Between- / Within-Groups
**Which difference are you interested in?**
Difference between within-group and between-group regressions:
- The within-group regression coefficients ($\gamma_{10}, \ldots, \gamma_{p0}$) are the regression coefficients within each group, assumed to be the same across the groups → related to $\sigma^2$
- The between-group regression coefficients ($\gamma_{01}, \ldots, \gamma_{0q}$) are defined as the regression coefficients for the regression of the group means of $Y$ on the group means of $X$ → related to $\tau^2$
- This distinction is essential to avoid ecological fallacies → if you ignore what happens on the lowest level, you will commit ecological fallacies
### Ecological fallacy
(also ecological inference fallacy or population fallacy) is a formal fallacy in the interpretation of statistical data that occurs when inferences about the nature of individuals are deduced from inferences about the group to which those individuals belong.
- Example of this inference and generalization mistake: "All the French eat snails", which is obviously not true.
4 types:
1. confusion between ecological (population) correlations and individual correlations
2. confusion between group average and total average
3. [Simpson's paradox](https://de.wikipedia.org/wiki/Simpson-Paradoxon): Simpson's paradox (named after Edward Hugh Simpson) is a paradox from statistics. The assessment of different groups appears to differ depending on whether the results of the groups are combined or not. This phenomenon occurs when confounding variables are not taken into account in the statistical analysis; ignoring the grouping produces a spurious correlation.
4. confusion between higher average and higher likelihood
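Type 1 (and Simpson's paradox) can be made concrete with a toy simulation, not from the course data: within each group the slope of y on x is positive, while the pooled (ecological) slope is negative. A minimal sketch:

```{r}
set.seed(1)
g <- rep(1:5, each = 50)                 # 5 groups
mu <- c(10, 8, 6, 4, 2)[g]               # group means of x decrease ...
x <- mu + rnorm(250)
y <- 20 - 2 * mu + (x - mu) + rnorm(250) # ... while the within-group slope is +1
coef(lm(y ~ x))                          # pooled slope: negative (ecological view)
coef(lm(y ~ x + factor(g)))              # within-group slope: positive (about +1)
```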
{width="300"}
{width="300"}
In the model with separate effects for the original variable $x_{ij}$ and the group mean $\bar{x}_{.j}$,
$$
Y_{ij} = \gamma_{00} + \gamma_{10}x_{ij} + \gamma_{01}\bar{x}_{.j} + U_{0j} + R_{ij}
$$
- the within-group regression coefficient is $\gamma_{10}$
- the between-group regression coefficient is $\gamma_{10} + \gamma_{01}$
To test the difference between within-group and between-group coefficients, consider the equivalent model with the within-group deviation $(x_{ij} - \bar{x}_{.j})$ instead of the raw variable. Both models are equivalent and have the same fit, with
$$
\tilde{\gamma}_{10} = \gamma_{10}, \qquad \tilde{\gamma}_{01} = \gamma_{10} + \gamma_{01}
$$
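As a sketch of this decomposition in R (the helper columns `IQ_sm` and `IQ_dev` are illustrative names; assumes `DAT` as loaded above):

```{r}
DAT$IQ_sm <- ave(DAT$IQ_verb, DAT$schoolnr) # school mean of IQ_verb
DAT$IQ_dev <- DAT$IQ_verb - DAT$IQ_sm       # within-school deviation
m_raw <- lmer(langPOST ~ IQ_verb + IQ_sm + (1|schoolnr), data = DAT)
m_dev <- lmer(langPOST ~ IQ_dev + IQ_sm + (1|schoolnr), data = DAT)
# Same fit, reparameterized: the IQ_sm coefficient in m_dev equals
# the sum of the IQ_verb and IQ_sm coefficients in m_raw
fixef(m_raw)
fixef(m_dev)
```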
### Model fit with R squares
$R^2$ = 1- (unexplained variance / total variance)
```{r}
multilevelR2(mod3, print = c("RB1", "RB2", "SB", "MVP"))
```
- RB1 and RB2 returns the explained variance at level 1 and level 2, respectively, according to Raudenbush and Bryk (2002, pp. 74 and 79)
- Specifying SB returns the total variance explained according to Snijders and Bosker (2012, p. 112).
- Specifying MVP returns the total variance explained based on "multilevel variance partitioning" as proposed by LaHuis, Hartman, Hakoyama, and Clark (2014).
But note: don't use $R^2$ to decide whether a multilevel model should be used; use the ICC for that.
### Exercise
1. Add a group level effect for IQ using the R commands
```{r create a group mean of IQ for each level}
groupM <- aggregate(DAT$IQ_verb, by=list(DAT$schoolnr), FUN=mean)
colnames(groupM) <- c("schoolnr", "IQ_mean")
DAT <- merge(DAT, groupM, by="schoolnr", all.x = TRUE)
```
With this command we add a group mean for the verbal IQ per school. With this group mean we separate the effects from each other: the intercept for the school and the school mean of verbal IQ explain the between-group variance, while the individual verbal IQ explains the within-group variance.
2. Compare the RIM model with and without this group level variable, with a special focus on the level-2 variance. Explain!
```{r model with group variable}
mod4 <- lmer(langPOST ~ IQ_verb + IQ_mean + (1|schoolnr), data=DAT)
summary(mod3)
summary(mod4)
```
Model 1 (empty model, mod1), model 2 (+ individual intelligence score, mod3), model 3 (+ group mean of the intelligence score, mod4) → more explanation, and the group mean is significant!
```{r}
sigma(mod3)
sigma(mod4)
```
The variance of the random effects decreases from model 3 to model 4, and more is explained. In mod4 the residual variance and the residual standard deviation decrease. The standard errors also decrease slightly.
## Prediction on Random Effects
*only possible after model computation, because only then we have the estimates, we can predict from!*
The random effects $U_{0j}$ are not statistical parameters and therefore they are not estimated as part of the estimation routine. However, sometimes it might be necessary to predict them.
Note: it is often not a good idea, because they are predictions, not estimated parameters!
There are two ways to get them:
1. The ML estimator of $U_{0j}$ is the group mean of the residuals $Y_{ij} - (\hat{\gamma}_{00} + x'_{ij}\hat{\gamma})$ → we have estimated the parameters from the available data. Taking group means of the residuals works because their expected value is zero. (See @sec-ml.ex.)
2. By the empirical Bayes method; the resulting predictions are the posterior means. The posterior mean for group $j$ is based on two kinds of information:
- sample information: the data in group $j$
- population information: the value $U_{0j}$ was drawn from a normal distribution with mean $0$ and variance $\tau^2$
Suppose we wish to predict the 'true group mean' $\gamma_{00} + U_{0j}$. For simplicity, consider the empty model (without coefficients): $Y_{ij} = \beta_{0j} + R_{ij} = \gamma_{00} + U_{0j} + R_{ij}$ → estimating $\beta_{0j}$ and $U_{0j}$ are equivalent problems, given that $\hat{\gamma}_{00}$ is available.
The empirical Bayes estimate (combined estimate) is a weighted average of the group mean and the overall mean:
$$
\hat{\beta}^{EB}_{0j} = \lambda_j \overline{Y}_{.j} + (1 - \lambda_j)\hat{\gamma}_{00}
$$
where the weight $\lambda_j$ is the 'reliability' of the mean of group $j$ (with $n_j$ the related group size):
$$
\lambda_j = \frac{\tau^2_0}{\tau^2_0 + \sigma^2/n_j} = \frac{n_j\rho_I}{1 + (n_j - 1)\rho_I}
$$
The reliability coefficient indicates how well the true mean $\gamma_{00} + U_{0j}$ is measured by the observed mean $\overline{Y}_{.j}$.
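A minimal sketch of these reliabilities in R, reusing the variance components of the empty model `mod1` (the object names `tau2`, `sigma2`, and `lambda` are illustrative):

```{r}
vc <- as.data.frame(VarCorr(mod1))
tau2 <- vc$vcov[vc$grp == "schoolnr"]   # tau_0^2
sigma2 <- vc$vcov[vc$grp == "Residual"] # sigma^2
n_j <- table(DAT$schoolnr)              # group sizes
lambda <- tau2 / (tau2 + sigma2 / n_j)  # reliability of each observed school mean
summary(as.numeric(lambda))
```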
{width="300"}
Empirical Bayes 'estimates' (EB) are biased, but the bias gets smaller with increasing group sizes.
We see that the EB are "shrunk towards the mean". (EB are also called shrinkage estimators.)
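The shrinkage can be visualized by plotting the raw group-mean deviations against the EB predictions; a sketch using the empty model `mod1`:

```{r}
raw <- tapply(DAT$langPOST, DAT$schoolnr, mean) - fixef(mod1)[1] # raw group-mean deviations
eb <- ranef(mod1)$schoolnr[[1]]                                  # EB predictions
plot(raw, eb, xlab = "raw group-mean deviation", ylab = "EB prediction")
abline(0, 1, lty = 2) # EB values lie inside the 45-degree line: shrunk towards 0
```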
{width="300"}
→ checking for standard errors is very important! Therefore, compute the comparative and diagnostic standard errors.
**Comparative** (use: comparison of random effects of different level-2 units):
- deviation from the unobserved random variable $U_{0j}$
$$
S.E_{comp}(\hat{U}^{EB}_{0j}) = S.E. (\hat{U}^{EB}_{0j} - U_{0j})
$$
how well the unobserved level-2 contribution $U_{0j}$ can be 'estimated' from the data
**Diagnostic** (use: model checking, i.e. checking normality of the level-two residuals)
- Expresses the deviation from $0$, which is the overall mean of the $U_{0j}$
$$
S.E._{diag}(\hat{U}^{EB}_{0j}) = S.E. (\hat{U}^{EB}_{0j})
$$
- can be seen as standardized residuals having a standard normal distribution if the model is correct
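In lme4 the conditional (posterior) variances behind such standard errors can be extracted with `condVar = TRUE`; a minimal sketch using `mod4` fitted above:

```{r}
re <- ranef(mod4, condVar = TRUE)
pv <- attr(re$schoolnr, "postVar") # conditional variances, one per school
se_comp <- sqrt(pv[1, 1, ])        # comparative standard errors of the intercepts
head(se_comp)
```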
### Exercise
Estimates of random effects are obtained with `ranef`. This function gives the conditional modes of the random effects of a fitted model object → it is the difference between the population-level average predicted response for a given set of fixed-effects values and the response predicted for a particular individual.
- You can think of these as individual-level effects, i.e. how much does any individual differ from the population?
- They are also, roughly, equivalent to the modes of the Bayesian posterior densities for the deviations of the individual group effects from the population means
→ important to keep in mind: not parameters, so no standard errors are defined.
1. Predict the school-level random effects (EB, for all N = 211 schools), using the R command `ranef(model)` ('model' is name of estimated model in R)
```{r}
be_u <- ranef(mod4)$schoolnr[[1]] # EB predictions of the random intercepts, one per school, from model 4
plot(1:length(be_u), sort(be_u), pch=20, xlab="schoolNr", ylab=expression('U'['0j'])) # plot the EB predictions for the schools, sorted from smallest to largest
mlm <- lm(sort(be_u) ~ c(1:length(be_u))) ## to add a line: estimate a linear model first
abline(a=coef(summary(mlm))[1,1], b=coef(summary(mlm))[2,1]) ## then use intercept and slope of that model to plot the line
```
On the vertical axis $U_{0j}$ is displayed, the group-dependent deviation from the average, for each school on the x-axis. If a linear regression fitted the estimates perfectly, we would have a perfect normal distribution, because the group-dependent variance is centered around zero. Looking at schools around the median (nearly 50% of the ranked schools), they are centered around $U_{0j} = 0$ → a good sign for a random effect.
2. Plot the confidence intervals to show the differences within the school:
```{r}
plot_model(mod4, type = "re", col="blue",
title="Random Effects on School Level")
```
This plot shows the EB estimates for each of the 211 schools in the data set and their confidence intervals. We can see that there are varying means for each school and varying confidence intervals. This indicates variance on the school level and supports a multilevel analysis.
Important: this is not a robustness check, but rather a diagnostic of whether the method is appropriate! It belongs in the appendix, not in the paper itself!
## Random Slope Model
It is possible that not only the group average of Y, but also the effect of X on Y is randomly dependent on the group. That is, in
$$
Y_{ij} = β_{0j} + β_{1j}x_{ij} + R_{ij}
$$
it is
$$
\beta_{0j} = \gamma_{00} + U_{0j}\\
\beta_{1j} = \gamma_{10} + U_{1j}
$$
Example: gender has a different effect on the grade in different classes.
This gives (X has a random slope):
$$
Y_{ij} = \gamma_{00} + \gamma_{10}x_{ij} + U_{0j} + U_{1j}x_{ij} + R_{ij}
$$
- $\gamma_{00}$: intercept for the overall population; $\gamma_{10}x_{ij}$: fixed-effects part (here, the fixed effect of gender)
- $U_{0j}$: random effect, the group-dependent deviation of the class from the intercept
- $U_{1j}x_{ij}$: random effect of the class on the fixed effect of gender
Because we would otherwise get too many parameters, we need **assumptions**:
- The group-dependent coefficients ($U_{0j}$, $U_{1j}$) are independent across $j$, with a bivariate normal distribution with expected values $(0, 0)$ and a covariance matrix defined by the following. Variance of the random intercepts of the level-2 units (schools):
$$
var(U_{0j}) = \tau_{00} = \tau^2_0
$$
Variance of the random slopes, i.e. of the group effect on the individual $x$ effect (IQ):
$$
var(U_{1j}) = \tau_{11} = \tau^2_1
$$
Covariance of the random intercepts and the random slopes:
$$
cov(U_{0j},U_{1j}) = \tau_{01}
$$
This yields: A linear model for the mean structure, and a parameterized covariance matrix within groups with independence between groups.
#### Exercise {#sec-slope.ex}
1. Add a random slope for IQ to your last model. For this, have a look at the help of the lme4 package to see how this works.
```{r}
mod5 <- lmer(langPOST ~ IQ_mean + IQ_verb + (IQ_verb|schoolnr), # (IQ_verb|schoolnr) adds a random slope for IQ_verb in addition to the random intercept
data=DAT)
summary(mod5)
```
Model 4 and model 5 have very similar estimates with very similar standard errors and p-values, BUT:
- The random intercept variance of the groups is slightly smaller in model 4 (9.119, Std. Dev. 3.02) than in model 5 (9.3268, Std. Dev. 3.054)
- The standard error for IQ_verb in model 4 (0.058) is smaller than in model 5 (0.06), and so is the standard error for the intercept
- Model 5 does not decrease the variance, so it is not an improvement in model fit, although it is more complex.
2. Write the regression equation for the estimated model.
$$
Y_{ij} = \gamma_{00} + \gamma_{10}x_{ij} + \gamma_{01}z_j + U_{0j} + U_{1j}x_{ij} + R_{ij}\\
langPOST_{ij} = \gamma_{00} + \gamma_{10}\,IQverb_{ij} + \gamma_{01}\,IQmean_{j} + U_{0j} + U_{1j}\,IQverb_{ij} + R_{ij}\\
\widehat{langPOST}_{ij} = 41.03 + 2.4864\,IQverb_{ij} + 1.030\,IQmean_{j} + U_{0j} + U_{1j}\,IQverb_{ij} + R_{ij}
$$
3. What is the average of the slope for IQ, and what is the related standard deviation?
In model 5 the average of the slope for IQ is 2.46; this is the estimate for IQ_verb. The standard deviation is:
```{r }
# Multiply the SE of the IQ_verb slope (from the fixed effects) by the square root of the sample size.
0.06687 * (sqrt(3454))
```
4. Plot the regression lines for all N=211 schools. What do you find concerning the variance of Y (language test score) related to X (IQ)? (One option for plotting: use the 'predict' function for each school separately and plot the predicted Y s along X.)
```{r}
schNr <- unique(DAT$schoolnr)
plot(0, xlim= range(DAT$IQ_verb), ylim=range(DAT$langPOST), col="white", xlab="IQ", ylab="langScore")
for(i in 1: length(schNr)) {
pr <- predict(mod5, newdata=DAT[DAT$schoolnr %in% schNr[i],], type="response")
iq <- DAT[DAT$schoolnr %in% schNr[i],"IQ_verb"]
res <- cbind(iq,pr)
lines(res[,1], res[,2])
}
abline(v=0)
```
In the plot you can see a regression line per school for the relation between verbal IQ and the score in the language test. While for lower verbal IQs there is more variance in the language test score, there is less variance for pupils with a higher IQ. This could be an indicator that pupils with a higher IQ have a higher probability of getting higher language test scores, while the scores for pupils with lower IQs depend more on which school they attend.
This is a sign of heteroscedasticity.
## Cross-Level Interaction
What can we find beyond the random slope?
We add level-2 covariates:
Intercept in the specific group: described by the grand mean $\gamma_{00}$, the random term $U_{0j}$, and a term from the other level, $\gamma_{01}z_{j}$:
$$
β_{0j} = γ_{00} + γ_{01}z_{j} + U_{0j}
$$ Coefficient of the slope: described by the grand mean $\gamma_{10}$, the random term $U_{1j}$, and a term from the other level, $\gamma_{11}z_{j}$:
$$
β_{1j} = γ_{10} + γ_{11}z_{j} + U_{1j}
$$
Substitution yields
$Y_{ij} = (\gamma_{00} + \gamma_{01}z_j + U_{0j})$ (intercept part) $+ (\gamma_{10} + \gamma_{11}z_j + U_{1j})x_{ij}$ (slope part) $+ R_{ij}$ (residual part)
$= \gamma_{00}$ (grand mean) $+ \gamma_{01}z_j$ (fixed effect of, e.g., the teacher's gender on the score) $+ \gamma_{10}x_{ij}$ (fixed effect of, e.g., the pupil's gender, the slope part) $+ \gamma_{11}z_{j}x_{ij}$ (cross-level interaction effect: fixed effect of the teacher's gender combined with the pupil's gender on the score) $+ U_{0j}$ (random intercept part) $+ U_{1j}x_{ij}$ (random slope part of gender) $+ R_{ij}$ (residual part)
Note:
- Random effects are still in the model; they don't have to be included, but they can be!
- The two-level equation is brought into one long equation.
### Exercise {#sec-cross.ex}
1. Add a cross-level interaction effect for $IQ$ and $\overline{IQ}$ to your last model. (Hint: mind that this is still an interaction effect, commonly quantified by the product of two variables.)
```{r}
mod6 <- lmer(langPOST ~ IQ_mean #group level effect
+ IQ_verb #individual effect and main predictor
+ IQ_mean*IQ_verb #interaction effect
+ (IQ_verb|schoolnr), #group effect on the individual effect
data=DAT)
summary(mod6)
```
2. What does this effect tell you in the considered case?
Including the cross-level interaction effect decreases the variance of the random effects in comparison to models 4 and 5. Especially the variance for IQ_verb is very small now. It's a slightly better model than model 4.
What does the model tell us? The higher the mean IQ in a school, the less the individual IQ counts for the score, and vice versa.
### Exercise: Model comparison
1. Global intercept model / empty model
```{r}
m1 <- lm(langPOST ~ 1, data=DAT)
summary(m1)
```
The intercept in the global intercept model is 41.32. This is the mean score for the sample.
See:
```{r}
DAT %>% summarize(avg=mean(langPOST), sd=sd(langPOST), se= sd/sqrt(3454))
```
2. Two Explanatory variables, IQ and ses on level 1
```{r}
m2 <- lm(langPOST ~ IQ_verb + ses, data =DAT)
summary(m2)
```
Both explanatory variables and the intercept are significant. The intercept is close to the overall intercept.
- indicates a correlation between verbal IQ and language test score: with each additional verbal IQ point, the score in the language test increases by 2.36.
- indicates a correlation between socio-economic background and language test score: with each additional point on the ses scale, the score increases by 0.15.
3. Two explanatory variables, IQ and SES, on level 2 (where $\overline{\cdot}$ denotes the group mean)
```{r}
#add group mean for ses to the data set
groupM <- aggregate(DAT$ses, by=list(DAT$schoolnr), FUN=mean)
colnames(groupM) <- c("schoolnr", "SES_mean")
DAT <- merge(DAT, groupM, by="schoolnr", all.x = TRUE)
m3 <- lm(langPOST~ IQ_mean + SES_mean, data=DAT)
summary(m3)
```
On level 2 we have significant effects, too. The mean verbal IQ of a school and the mean ses of a school also have significant effects on the language test score.
→ This indicates that the higher the mean verbal IQ or the mean ses of a school, the higher the language test scores, and vice versa.
4. An interaction effect for IQ and SES on level-1
```{r}
m4 <- lm(langPOST ~ IQ_verb + ses + IQ_verb*ses, data = DAT)
stargazer(m2,m4, type="text", out = "ml.html")
```
Overall, looking at $R^2$ and the adjusted $R^2$, model 4 has a slightly better fit for explaining the variance on level one. All explanatory variables are still significant.
It is very interesting: the positive effect of IQ_verb decreases slightly, while the positive effect of ses increases slightly when the interaction term is included. This means that an increase in ses decreases the significance and the effect of IQ, and vice versa.
→ It could be a hint that with a high-level socio-economic background a low IQ has less effect, or that with a low-level ses a high verbal IQ has less effect on the test score.
5. An interaction effect for IQ and SES on level-2
```{r}
m5 <- lm(langPOST~ IQ_mean + SES_mean + IQ_mean*SES_mean, data=DAT)
stargazer(m3,m5, type="text", out = "ml.html")
```
Both models have a small $R^2$ and consequently little explanatory power on their own. However, the model with the interaction term has a higher $R^2$, which is an indicator of better model fit.
Adding the interaction term decreases the effects of the mean IQ and the mean SES of a school. As on level 1, there is a negative and significant effect of the interaction term.
→ This one is a bit tough to interpret: if the mean IQ in one school is higher than in another, a higher mean ses has a lower effect on the language test score than in the other school. If the mean ses in a school is very high, the mean IQ of this school will have a lower effect than in a school with a lower mean ses.
6. a cross-level interaction effect for $IQ$ and $\overline{IQ}$
See @sec-cross.ex Cross-level interaction:
```{r}
m6 <- lmer(langPOST ~ IQ_mean #group level effect
+ IQ_verb # individual effect of IQ
+ IQ_mean*IQ_verb #interaction effect
+ (IQ_verb|schoolnr), #group effect on the individual effect
data=DAT)
summary(m6)
```
We can see that the mean IQ of a school and the IQ of an individual both have positive effects on the language test score.
The following hypotheses can be deduced from the results:
- The higher the verbal IQ, the higher the probability of better language score results
- The higher the average IQ in a school, the higher the probability for pupils to get better language score results. This could be explained by the mechanism that in schools where the average IQ is high, the individual IQ of a majority of pupils is also very high.
- The cross-level interaction effect is negative: if the mean IQ is higher in one school, there is a lower effect of the individual IQ, and vice versa.
- That makes sense, because grades and test results are often relational: if there are pupils with a very high IQ in one class, a high IQ is not that outstanding and getting good scores is more difficult.
7. A cross-level interaction effect for $ses$ and $\overline{SES}$
```{r}
m7 <- lmer(langPOST ~ SES_mean #group level effect
+ ses #individual effect of ses
+ SES_mean*ses #interaction effect
+ (ses|schoolnr), #group effect on the individual effect
data=DAT)
summary(m7)
```
We can see that the mean SES of a school and the ses of an individual both have positive effects on the language test score, but only the individual ses is significant and moderately high. The cross-level interaction effect is negative and very small.
- The higher the ses of a pupil, the higher the probability of good test scores.
- If the school has a high mean ses, the effect of the individual ses on test scores decreases. This makes sense because, like intelligence, the socio-economic background is perceived relationally within a school.
8. a cross-level interaction effect for $IQ$ and $\overline{SES}$
```{r}
m8 <- lmer(langPOST ~ SES_mean #group level effect
+ IQ_verb #individual effect of IQ
+ SES_mean*IQ_verb #interaction effect of the SES in a school on the individual IQ
+ (IQ_verb|schoolnr), #group effect on the individual effect
data=DAT)
summary(m8)
```
Without controlling for individual ses, we have a positive effect of ses on the school level and a positive effect of the individual IQ, without controlling for school-level IQ. Both effects and the cross-level interaction effect are significant. The cross-level interaction effect is negative.
- The higher the average ses of pupils in a school, the higher the probability of high test results (not controlled for individual values of ses)
- The higher the IQ, the higher the probability of high test results (not controlled for group-level values of IQ)
- If the average ses in a school is high, the effect of the individual IQ on the probability of high test results decreases, and vice versa
9. A cross-level interaction effect for $ses$ and $\overline{IQ}$
```{r}
m9 <- lmer(langPOST ~ IQ_mean #group level effect
+ ses #individual effect of ses
+ ses*IQ_mean #interaction effect of the IQ n a school on the individual ses
+ (ses|schoolnr), #group effect on the individual effect
data=DAT)
summary(m9)
```
There are positive, significant effects of the mean IQ of a school and the individual ses on the language test score, and a negative, significant cross-level interaction effect.
- The higher the average IQ of pupils in a school, the higher the probability of high test results (not controlled for individual values of IQ)
- The higher the individual ses, the higher the probability of high test results (not controlled for group-level values of ses)
- If the average IQ in a school is high, the effect of the individual ses on the probability of high test results decreases, and vice versa
10. a random intercept on school level and a random slope for IQ
See @sec-slope.ex Random Slope Model:
```{r}
m10 <- lmer(langPOST ~ IQ_mean + IQ_verb + (1+IQ_verb|schoolnr), # random intercept plus a random slope for IQ_verb
data=DAT)
summary(m10)
```
In this model not only the intercept is random, but also the slope; the slope is determined by the individual IQ, too.
This means that not only the individual IQ and the school-level IQ have an effect on the test results, but also that the effect of IQ differs between classes.
If I had to choose one of these models, I would go with the last one. To me, it seems a bit neglectful to measure cross-level interactions without taking the individual effects of the explanatory variables and the group-level effects into account. This is especially true for model 9. Model 6 gives hints that the individual effect of IQ is critical for better scores and is weakened by the mean IQ. Consequently, testing whether the mean IQ explains better language test results without regarding the individual level appears a bit strange and contradictory. What assumptions should be derived from that? That an overall higher IQ level in a school causes better language tests for the individuals? It is more far-fetched to assume that pupils with higher verbal IQs help the others out, and that the language test results in those schools are therefore better, than to assume that a school with many high-IQ pupils will have better test results than a school with fewer high-IQ pupils because of the individual IQ levels.
The last model also has the lowest variance in the random-effect residuals.
Another option is to compute random intercept models and random slope models combining SES and IQ as explanatory variables. As always, it depends on theory which model makes sense, and then one tests for statistical inference. For example, behind model 4 lies the assumption that if you have a higher ses, your IQ is less important for test results.
## Statistical Testing
For the fixed parts, the $\gamma$'s, use the t-test with the following test statistic (comparing the mean between groups):
The estimates come from the sample data and are consequently not the true scores → estimates depend on the data and are not taken as true.
$$
T(\gamma_h) = \frac{\hat{\gamma}_h}{S.E.(\hat{\gamma}_h)}
$$
(Use the F-test (a test for variance between samples or groups) or the Wald test for testing several parameters simultaneously.) The standard errors should be based on REML estimation → less biased and really recommended.
Degrees of freedom for the t-test / the denominator of the F-test
- For a level-1 variable: $N − r − 1$, where N = total number of level-1 units, r = number of tested level-1 variables
- For a level-2 variable: $M − q − 1$, where M = number of level-2 units, q = number of tested level-2 variables
- For a cross-level interaction: M − q − 1, where q = number of other level-2 variables interacting with the level-1 variable
- Note: If d.f. ≥ 40, the t-distribution can be replaced by a standard normal
→ the more variables I add, the less power my model is going to have
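In R, this t statistic is exactly what the summary of an lmerTest model reports for each fixed effect (with Satterthwaite degrees of freedom); a quick way to inspect just that table, using `mod4` fitted above:

```{r}
coef(summary(mod4)) # columns: Estimate, Std. Error, df, t value, Pr(>|t|)
```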
### Exercise
Extended model like in the R script:
```{r}
mod7 <- lmer(langPOST ~ IQ_verb + ses +
IQ_mean + SES_mean +
IQ_verb*ses +
IQ_mean*SES_mean +
IQ_mean*IQ_verb +
SES_mean*ses +
IQ_mean*ses +
SES_mean*IQ_verb +
(IQ_verb|schoolnr), data=DAT)
summary(mod7)
```
1. & 2. Based on a t-test: Which explanatory variables should stay in the model?, Based on a t-test: Which interaction effects should stay in the model?
Significance indicated by p can be taken out of the models → using the p-values from the t-test.
The `lmertest` test the linear models and give the p-values for the explanatory variables. It computes for fixed effects the significance with t-tests. Looking at the results displayed in the summary the following variables and interaction effects should stay in the model:
effects on the individual level:
- IQ_verb
- ses
effects on the school level:
- IQ_mean
- SES_mean
interaction effects:
- IQ_verb:ses
- IQ_mean:SES_mean
3. Reduce your model according to your findings.
```{r}
mod8 <- lmer(langPOST ~ IQ_verb + ses +
IQ_mean + SES_mean +
IQ_verb*ses +
IQ_mean*SES_mean +
(IQ_verb|schoolnr), data=DAT)
summary(mod8)
```
In the reduced model all variables are still significant. SES_mean has only a significance level of 0.055, so we could think about dropping it.
→ The individual IQ has a positive, significant effect on the score: if the IQ is one point higher, the language score will probably increase by 2.17. The ses also has a positive, significant effect on the score: if the ses increases by one point, the score increases by 0.17.
→ If the average IQ of a school is higher, the language test scores are probably higher, too. And if the average SES of a school is higher, the score will increase, too.
→ There is an interaction effect between IQ and ses on the individual and on the school level: the positive, significant effect of IQ is dampened by the ses, and vice versa. A low ses dampens the positive effect of a high IQ on the language test score. OR: a higher ses compensates for low levels of IQ.
→ There is also an interaction effect on the school level. This could be because of the strong correlation between IQ_mean and SES_mean.
Compared to all other models, model 8 has the smallest variance on the school level and in the residuals, and will consequently have a higher ICC than the other models.
## Testing random effects with the likelihood-ratio test (LRT)
The difference between linear regression and the HLM lies in the $U_{0j}$, i.e. the $\tau^2$.
- Assesses goodness of fit of two nested statistical models (model 1 and model 2) based on the ratio of their likelihoods
- Models are considered as [nested](https://joshuawiley.com/MonashHonoursStatistics/LMM_Comparison.html), when one model is a restricted or constrained version of another model
- The data are fixed; we can compute different models. The likelihood $\Lambda(\theta)$ gives the probability of the observed sample under the parameters $\theta$, allowing us to compare different models
- Model 2 has parameter space $\theta \in \Theta$, with $\theta$ unknown / to be estimated
- H0: $\theta$ lies in the subset $\Theta_0$ of $\Theta$\
- The simpler model 1 has parameter space $\Theta_0$, the more complex model 2 the parameter space $\Theta$
- Null hypothesis: model 2 reduces to the linear regression, so the $U_{0j}$ of model 2 are not necessary → the more complex model is equal to the simpler model
- The more complex model 2 can be transformed into the simpler model 1 by imposing constraints on model 2's parameters: model 1 is nested in model 2 → model 1 is the nested model and model 2 the tested model, because in model 1 some parameters are set to 0, making it simpler
- Usually: model 2 is found by maximization over the entire parameter space, and model 1 after imposing some constraints
→ key question: what is tested (model 2) and what is nested (model 1)?
Note: Model 1 doesn't have to be a linear regression, could also be a simpler HLM compared to a more complex model 2
Note: If two models are nested, then we have the most options in terms of comparing the two models. For example, we can evaluate whether Model A is a statistically significantly better fit than is Model B using a [**Likelihood Ratio Test (LRT)**](https://joshuawiley.com/MonashHonoursStatistics/LMM_Comparison.html)
- The LR test tests whether the related likelihood ratio is significantly different from one
- LR test statistic is the deviance difference:
$$
\lambda_{LR} = -2 \ln \frac{\sup_{\theta \in \Theta_0} \Lambda(\theta)}{\sup_{\theta \in \Theta} \Lambda(\theta)}
$$
$$
\lambda_{LR} = -2\left[l(\hat{\theta}_0) - l(\hat{\theta})\right]
$$
where $l(\hat{\theta})$ is the logarithm of the maximized likelihood function $\Lambda$ (under model 2), and $l(\hat{\theta}_0)$ is the maximal value if H0 is true (i.e. under model 1).
Higher likelihood means better fit of the data.
{width="300"}
The deviance difference is the difference between the highest points of the two likelihood curves; in this case it is 2. If the difference is zero, we don't need the more complex model and retain H0.
H0 being true means that model 2 (the tested model) does not fit the data significantly better than model 1 (the nested model).
Multiplication by $-2$ ensures that $\lambda_{LR}$ asymptotically converges to a $\chi^2$ distribution under H0.
A small p-value (from the $\chi^2$ distribution) indicates that H0 is wrong, i.e. evidence for model 2.
#### Deviances and $\lambda_{LR}$
We have (with $\hat{\theta}_s$ denoting the fitted parameters of the saturated model, which includes as many parameters as possible):
- Deviance of model 1: $D_1 = -2(l(\hat{\theta}_s) - l(\hat{\theta}_1))$, with $\hat{\theta}_1$ the fitted parameters of model 1 and $|\theta_1| = m_1$
- Deviance of model 2: $D_2 = -2(l(\hat{\theta}_s) - l(\hat{\theta}_2))$, with $\hat{\theta}_2$ the fitted parameters of model 2 and $|\theta_2| = m_2$, $m_1 < m_2$
Then the test statistic for testing model 2 against model 1 is:
$$
\hat{\lambda}_{LR} = -2(l(\hat{\theta}_1) - l(\hat{\theta}_2)) = D_1 - D_2
$$
When the deviance difference is zero or near zero, we don't need the multilevel model.
If the null hypothesis of no difference is true in the population, then $\lambda_{LR}$ will follow a chi-squared distribution with degrees of freedom equal to the number of parameters constrained to 0 in model 1 relative to model 2, i.e. the difference in the number of parameters used by each model:
$$
\lambda_{LR} \sim \chi^2(m_2 - m_1)
$$
Thus we often use a chi-square distribution in the LRT to look up the p-value. Finally, note that because LRTs are based on the log likelihoods (LL) from a model, we need true log likelihoods for the LRT to be valid. Therefore, we cannot use restricted maximum likelihood; we need to use maximum likelihood estimation.
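A minimal sketch of this test in R, comparing the random-intercept model `mod4` with the random-slope model `mod5` fitted earlier. `anova()` on lmer fits refits both models with ML automatically; note that because variance parameters are tested at the boundary of their parameter space, the $\chi^2$ p-value is conservative:

```{r}
anova(mod4, mod5) # LRT: random intercept (nested model 1) vs. random slope (tested model 2)

# Equivalent by hand: refit with ML and compute D1 - D2
m_ri <- update(mod4, REML = FALSE)
m_rs <- update(mod5, REML = FALSE)
lr <- -2 * (logLik(m_ri) - logLik(m_rs)) # deviance difference
pchisq(as.numeric(lr), df = 2, lower.tail = FALSE) # 2 extra parameters: slope variance + covariance
```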