-
Notifications
You must be signed in to change notification settings - Fork 6
/
hierarchical_modelling.qmd
226 lines (160 loc) · 5.7 KB
/
hierarchical_modelling.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
---
title: "Hierarchical Bayesian modelling"
subtitle: ""
author:
- name: "Elizaveta Semenova"
email: "elizaveta.p.semenova@gmail.com"
institute: "Computer Science Department, University of Oxford"
date: 2024-08-21
date-format: medium
title-slide-attributes:
data-background-color: "#f3f4f4"
data-background-image: "../../assets/bmeh_normal.png"
data-background-size: 80%
data-background-position: 60% 120%
format:
revealjs:
slide-number: true
incremental: true
chalkboard:
buttons: false
preview-links: auto
logo: "../../assets/bmeh_normal.png"
theme: [default, ../../assets/style.scss]
---
# Outline
- Assumptions of linear models
- Why are linear models not enough?
- Two extremes of hierarchy: <ins>complete pooling</ins> and <ins>no pooling</ins>
- The golden middle: <ins>partial pooling</ins>
- Random effects
- Types of nestedness
## Assumptions of linear models
- Homoskedasticity
<center>![](assets/dogs1.jpeg){width=40%}
## Assumptions of linear models
- Homoskedasticity
- equal or similar variances in different groups
- sometimes it is appropriate
- ![](assets/dogs1.jpeg){width=30%}
## Assumptions of linear models
- Homoskedasticity?
<center>
![](assets/dogs1.jpeg){width=40%}
![](assets/dogs4.jpeg){width=40%}
## Assumptions of linear models
- Homoskedasticity: equal or similar variances in different groups
- No error in predictors *x*
- No missing data
- Normally distributed errors
- Observations *$y_i$* are independent
## We want
- To model variance
- Capture errors in variables
- Missing data models
- Use generalised linear models (GLMs)
- Use spatial and/or temporal error structure
# Hierarchical models
Exist on the continuum of two extreme cases:
- <ins>complete pooling</ins>
- <ins>no pooling</ins>
## Complete pooling
<center>
![](assets/pooled_model.png){width=60%}
- A *pooled* model implies that the data are sampled from the same model.
- This ignores all variation among the units being sampled.
- All observations share common parameter $\theta.$
<font size="1"> Image: courtesy of Chris Fonnesbeck</font>
## Excercise
- Complete pooling?
<center>
![](assets/dogs1.jpeg){width=30%}
![](assets/dogs4.jpeg){width=30%}
## Excercise
- Complete pooling:
<center>
![](assets/dogs1.jpeg){width=30%}
![](assets/dogs4.jpeg){width=30%}
- It ignores any differences between measurement blocks and does not acknowledge variability block to block.
- Observations within each unit are more likely to be like each other.
## No pooling
<center>
![](assets/unpooled_model.png){width=60%}
- An *unpooled* model implies that the data are sampled from independent parameters.
- Parameter $\theta_i$ is different for each observation.
<font size="1"> Image: courtesy of Chris Fonnesbeck</font>
## Excercise
- No pooling?
<center>
![](assets/dogs1.jpeg){width=30%}
![](assets/dogs4.jpeg){width=30%}
- Captures variability, but not the overall pattern.
- Assumes that there is no relationship between the models.
- Does not provide ways to extrapolate.
## Partial pooling
<center>
![](assets/partial_pooled_model.png){width=60%}
- In a *partially pooled* or *multilevel* or *hierarchical* model, parameters are viewed as a sample from a distribution of parameters.
## Hierarchical models
- Hierarchical models fill in the continuum between the two extremes.
- They allow us to estimate models for each measurement block where each dataset is being fit to its own model. - There is a higher level model, a *hierarchical model*, describing variability in parameters of those sub-models.
## Excercise
- Partial pooling?
<center>
![](assets/dogs1.jpeg){width=30%}
![](assets/dogs4.jpeg){width=30%}
</center>
- Variability can be captured at different scales.
- We can make predictions.
## Example
Normal model: common mean
$$y_i \sim N(\mu, \sigma^2), \quad i=1, ..., N$$
## Example
Normal model: common variance
$$y_i \sim N(\mu_i, \sigma^2) $$
$$\mu_i \sim N(\mu, \tau^2) $$
$$\sigma^2 \sim IG(\alpha, \beta) $$
Instead of assuming that $\mu$ and $\tau$ are *fixed known values*, we assume they are *uknown parameters* which need to estimate.
## Example
Normal model: common variance
$$y_i \sim N(\mu_i, \sigma^2) $$
$$\mu_i \sim N(\mu, \tau^2) $$
$$\sigma^2 \sim IG(\alpha, \beta) $$
Hyperpriors
$$ \mu \sim N(\mu_0, v_0), \\
\tau^2 \sim IG(t_1, t_2)$$
## Levels of hierarchy
- Data model
- Process model
- Parameter model
- Hyperparameters
## Key points about hierarchical models
- They allow to write down models that explain *variability in the parameters* of a model.
- *Partition variability* more explicitly into multiple terms.
- *Borrow strength* across data sets.
- 'Hierarchy' is with respect to parameters.
## Excercise
- How would you model these observations?
<center>
![](assets/dogs1.jpeg){width=30%}
![](assets/dogs4.jpeg){width=30%}
![](assets/cats1.jpeg){width=28%}
</center>
## Model comparison
- Frequentist way:
- optimise different models on the <ins>training set</ins>,
- and compare them on the <ins>test set</ins>.
- Bayesian way:
- use <ins>model evidence</ins> $$p(Data | Model_i)$$.
- It tends to choose models which are just complex enough to model the data, but not overly complex (by which it avoids <ins>overfitting</ins>).
## Information criteria
*Information criteria* are popular tools for model selection that provide a quantitative measure of the trade-off between model fit and complexity. The *lower* the value of IC, the better.
## Examples of Information Criteria:
- AIC (Akaike Information Criterion):
$$\text{AIC} = -2* \ln(L) + 2k,\\
L - \text{maximum value of likelihood},\\
k - \text{number of parameters}$$
- BIC (Bayesian Information Criterion):
$$\text{BIC} = -2 \ln(L) + k \ln(n)$$
- Widely Applicable Information Criterion (WAIC)
# Questions?