Corrections (#13)

* basic setup for corrections
* avoid the phrase "true decline"
* figure 5.1 caption states 2019
* 29 30 31: life table derivation
* 32 kannisto thatcher
* 21 22 33: Arriaga
* 26 cancer atlases
* 1: tube map lives on the line
* 2 hierarchical models for areal data
* 3: when not to smooth
* 3: life table issues
* 5 28 migration and epi transition
* 12: deaths of despair
* 18 19 20 25: cause of death coding
* 24: lung cancer -> inequality. 25: justification of cancer chapter
* 13 postcodes
* 14: beta-binomial moments
* 11 15 16: uncertainty
* 6: model adequacy and consistency
* 7: sensitivity analysis
* 8: INLA
* 9: alternatives for modelling age
* 17: stronger spatial effects?
* other optional corrections
* proofread and add page numbers
theorashid · Jan 23, 2024 · 31f6e4a
1 parent d62d573
commit 31f6e4a
Showing 26 changed files with 914 additions and 271 deletions.
diff --git a/thesis/Appendices/AppendixA.qmd b/thesis/Appendices/AppendixA.qmd
@@ -9,56 +9,83 @@ Instead, demographers use period (or "current") life tables, which consider what
 Life tables can be constructed using discrete age bands starting at age $x$ and ending at age $x+n$.
 We supply the age-specific death rates, ${}_{n}m_{x}$, and the average person-years lived by those dying in the interval, ${}_{n}a_{x}$, and the life table calculates the mean age at death – the life expectancy, $e_x$.
 
-We start with a hypothetical cohort of size $l_0 = 100,000$ and sequentially apply the probability of dying in each age group, calculated as
+The probability of dying, ${}_{n}q_{x}$, is defined as the ratio of the number of people who died in the age interval, ${}_{n}d_{x}$, to the number who survived to age $x$, $l_x$:
+$$
+{}_{n}q_{x} = \frac{{}_{n}d_{x}}{l_x}.
+$$ {#eq-app-a-prob-dying-deaths}
+
+The age-specific death rate is defined as the ratio of the number of people who died in the age interval to the total number of person-years lived in that interval, ${}_{n}L_{x}$.
+The person-years lived are the sum of two contributions: the $n$ years lived by each person who survived the interval ($l_x - {}_{n}d_{x}$ people), and the ${}_{n}a_{x}$ years lived on average by each of the ${}_{n}d_{x}$ people who died:
+$$
+{}_{n}m_{x} = \frac{{}_{n}d_{x}}{n \cdot (l_x - {}_{n}d_{x}) + {}_{n}a_{x} \cdot {}_{n}d_{x}}.
+$$ {#eq-app-a-death-rate}
+We assume the denominator of @eq-app-a-death-rate can be approximated by the mid-year population, ${}_{n}P_{x}$, which leads us to recover the expression for the cross-sectional, empirical death rate in @eq-death-rate.
+By setting the denominator equal to ${}_{n}P_{x}$ and rearranging to make the number of survivors the subject, we obtain
+$$
+l_x = \frac{1}{n} \left({}_{n}P_{x} + (n - {}_{n}a_{x}) \cdot {}_{n}d_{x}\right).
+$$ {#eq-app-a-survivors}
+We can substitute this expression into @eq-app-a-prob-dying-deaths and divide the numerator and denominator by ${}_{n}P_{x}$ to obtain
 $$
 {}_{n}q_{x} = \frac{n \cdot {}_{n}m_{x}}{1 + (n - {}_{n}a_{x}) {}_{n}m_{x}}.
 $$ {#eq-app-a-prob-dying}
-The open interval ${}_{\infty}q_{x} = 1$, as nobody is immortal.
-Using the probability of surviving in each age group, ${}_{n}p_{x} = 1 - {}_{n}q_{x}$, the number of survivors is given by
+This expression, although unintuitive, allows us to convert from ${}_{n}m_{x}$ to ${}_{n}q_{x}$ with only the parameter ${}_{n}a_{x}$.
+
+In the period life table, we start with a hypothetical cohort of size $l_0 = 100,000$ and sequentially apply the probability of surviving in each age group, ${}_{n}p_{x} = 1 - {}_{n}q_{x}$, to calculate the number of survivors as
 $$
 l_{x+n} = l_x \cdot {}_{n}p_{x}.
 $$ {#eq-app-a-life-table-1}
 
-The number of person-years lived is the sum of the number of survivors weighted by the band width and number of people who died weighted by ${}_{n}a_{x}$
+The number of person-years lived is the sum of the number of survivors to the end of the interval, $l_{x+n}$, weighted by the band width and the number of people who died (${}_{n}d_{x} = l_{x} \cdot {}_{n}q_{x}$) weighted by ${}_{n}a_{x}$
 $$
-{}_{n}L_{x} = n \cdot l_x + {}_{n}a_{x} \cdot l_{x} \cdot {}_{n}q_{x} \quad {}_{\infty}L_{x} = \frac{l_x}{{}_{\infty}m_{x}},
+{}_{n}L_{x} = n \cdot l_{x+n} + {}_{n}a_{x} \cdot l_{x} \cdot {}_{n}q_{x}.
 $$ {#eq-app-a-life-table-2}
 
-and the total number of person-years lived above $x$ is
+The open interval ${}_{\infty}q_{x} = 1$, as nobody is immortal.
+Using @eq-app-a-prob-dying-deaths, it follows that the number of deaths in this interval is equal to the number who survived to the final age group, i.e. ${}_{\infty}d_{x} = l_x$.
+Writing the death rate from @eq-app-a-death-rate with the number of person-years lived, ${}_{n}L_{x}$, as the denominator, and substituting the number of deaths with the number surviving to the final age group, gives the number of person-years lived in the open-ended age interval
+$$
+{}_{\infty}L_{x} = \frac{{}_{\infty}d_{x}}{{}_{\infty}m_{x}} = \frac{l_x}{{}_{\infty}m_{x}}.
+$$ {#eq-app-a-life-table-close}
+
+The total number of person-years lived above $x$ is
 $$
 T_{x} = \sum^{\infty}_{a = x} {}_{n}L_{a}.
-$$ {#eq-app-a-life-table-2}
+$$ {#eq-app-a-life-table-3}
 
 Then, life expectancy is given by dividing the number of person-years lived by the number of people who will live them
 $$
 e_x = \frac{T_x}{l_x}.
-$$ {#eq-app-a-life-table-2}
+$$ {#eq-app-a-life-table-4}
 
 Throughout the thesis, I only consider life expectancy at birth.
 
 ### The very young ages and the very old ages
 
-On average, it is a good approximation to assume deaths occur halfway through the age interval: ${}_{n}a_{x} = n /2$.
+On average, it is a good approximation to assume deaths occur halfway through the age interval: ${}_{n}a_{x} = n / 2$.
 But for younger ages, particularly at lower levels of mortality, the majority of infant deaths lie further towards the earliest stages of infancy.
 Coale and Demeny used regression on a series of international datasets to recommend suitable values for ${}_{1}a_{0}$ and ${}_{4}a_{1}$ instead of the midpoint [@coaleRegionalModelLife1983].
 
-The start of the open age group can be many years away from some of the ages at death, particularly in ageing populations.
-In order to produce reliable estimates of death rates at high ages, I used the Kannisto-Thatcher method to expand the terminal age group ($\geq 85$ years) of the life table and adjust ${}_{n}a_{x}$ above 70 years [@thatcherSurvivorRatioMethod2002].
+The start of the open-ended age group can be many years away from some of the ages at death, particularly in ageing populations.
+In order to produce reliable estimates of death rates at older ages, I used the Kannisto-Thatcher method to expand the terminal age group ($\geq 85$ years) of the life table and adjust ${}_{n}a_{x}$ above 70 years [@thatcherSurvivorRatioMethod2002].
+The Kannisto-Thatcher method assumes the probability of dying is a logistic function of age.
+The logit-transformed probability of dying above 70 years is regressed upon age.
+The fitted curve is extrapolated through to 129 years, and the adjusted probabilities of dying are applied to the cohort to calculate the number of survivors at each age, from which ${}_{n}a_{x}$ above 70 years is estimated.
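
The steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: it assumes single-year probabilities of dying are available for ages 70 to 84, fits the logit line by ordinary least squares, and assumes deaths fall at the middle of each single year of age when averaging the years lived. The function names and the toy input values are hypothetical.

```python
import math


def fit_logit_line(ages, q):
    """Ordinary least squares fit of logit(q) on age; returns (intercept, slope)."""
    y = [math.log(p / (1 - p)) for p in q]
    k = len(ages)
    x_bar, y_bar = sum(ages) / k, sum(y) / k
    sxx = sum((x - x_bar) ** 2 for x in ages)
    sxy = sum((x - x_bar) * (v - y_bar) for x, v in zip(ages, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def extrapolated_survivors(intercept, slope, start=70, end=129):
    """Single-year q from the fitted logistic curve and the implied survivors.

    l[j] is the proportion of the cohort alive at age start + j.
    """
    ages = list(range(start, end + 1))
    q = [1 / (1 + math.exp(-(intercept + slope * x))) for x in ages]
    l = [1.0]
    for p in q:
        l.append(l[-1] * (1 - p))
    return ages, q, l


def mean_years_lived(l, start, band_start, width):
    """Average years lived within a band by those dying in it.

    Assumption: deaths occur halfway through each single year of age.
    """
    i = band_start - start
    deaths = [l[j] - l[j + 1] for j in range(i, i + width)]
    years = [(j - i + 0.5) * d for j, d in zip(range(i, i + width), deaths)]
    return sum(years) / sum(deaths)


# Toy input: probabilities of dying generated from a known logistic curve.
obs_ages = list(range(70, 85))
obs_q = [1 / (1 + math.exp(-(-12.0 + 0.12 * x))) for x in obs_ages]

intercept, slope = fit_logit_line(obs_ages, obs_q)
ages, q, l = extrapolated_survivors(intercept, slope)
# Terminal band: ages 85 to 129 inclusive, i.e. 45 single years.
a_85_plus = mean_years_lived(l, start=70, band_start=85, width=45)
```

Because the toy input lies exactly on a logistic curve, the fit recovers the generating intercept and slope; with real data the regression would smooth over noise in the observed probabilities.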
 
 ## Probability of dying
 
 The probability of dying from a specific cause of death, $i$, is calculated as in @eq-app-a-prob-dying.
-Equally, we can subtract the probability of surviving to that age group, $1 - \prod_x {}_{n}p^i_{x}$.
+Equally, we can calculate the probability of dying by subtracting the probability of surviving in each age group through to that age from unity, i.e. $1 - \prod_x {}_{n}p^i_{x}$.
 Note, even for the smallest death rates, ${}_{\infty}q^i_{x} = 1$ – if you live to infinity, you'll die of it eventually.
 
 ## Cause-specific decomposition of differences in life expectancy
 
-@arriagaMeasuringExplainingChange1984 proposed a method to calculate the age-specific contributions to the difference in life expectancy between two populations as
+Using quantities generated from the life tables of two populations as above, @arriagaMeasuringExplainingChange1984 proposed a method to calculate the age-specific contributions to the difference in life expectancy between these populations as
 $$
 {}_{n}\Delta_{x} = \frac{l^1_x}{l^1_0} \left( \frac{{}_{n}L^2_{x}}{l^2_x} - \frac{{}_{n}L^1_{x}}{l^1_x} \right) + \frac{T^2_{x+n}}{l^1_0} \left( \frac{l^1_x}{l^2_x} - \frac{l^1_{x+n}}{l^2_{x+n}} \right).
 $$ {#eq-app-a-arriaga-age}
+The first term on the right-hand side is the "direct effect": the contribution to the life expectancy difference from the change between the two populations in the average number of person-years lived within the age group by those surviving to age $x$ (${}_{n}L_{x} / l_x$).
+The second term is the "indirect effect": the contribution from the change in the number of survivors to later ages caused by the mortality changes within the age group.
 
-We then assume the age- and cause-specific contributions are proportional to the difference in cause-specific death rates:
+We then assume the age- and cause-specific contributions are proportional to the difference in cause-specific death rates between the two populations:
 $$
 {}_{n}\Delta^i_{x} = {}_{n}\Delta_{x} \cdot \frac{{}_{n}m^i_{x}(2) - {}_{n}m^i_{x}(1)}{{}_{n}m_{x}(2) - {}_{n}m_{x}(1)}
 $$ {#eq-app-a-arriaga-cause}
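
As a rough illustration of how the appendix equations fit together, the Python sketch below builds the life table columns for two populations and applies the Arriaga decomposition. It is a sketch under stated assumptions rather than the thesis code: the function names, band widths and death rates are hypothetical toy values. Person-years are computed as $n \cdot l_{x+n} + {}_{n}a_{x} \cdot {}_{n}d_{x}$, matching the denominator of @eq-app-a-death-rate, and the indirect term is set to zero in the open-ended band, where there are no later ages.

```python
def life_table(m, a, n, radix=100_000.0):
    """Period life table columns (l, L, T) from age-specific death rates m.

    m, a and n are lists over age bands. The last band is open-ended,
    so a[-1] and n[-1] are ignored.
    """
    k = len(m)
    l, L = [radix], []
    for i in range(k - 1):
        q = n[i] * m[i] / (1 + (n[i] - a[i]) * m[i])  # probability of dying
        d = l[i] * q                                   # deaths in the band
        l.append(l[i] - d)                             # survivors to x + n
        L.append(n[i] * l[i + 1] + a[i] * d)           # person-years lived
    L.append(l[-1] / m[-1])                            # open-ended band
    T, running = [0.0] * k, 0.0
    for i in reversed(range(k)):                       # person-years above x
        running += L[i]
        T[i] = running
    return l, L, T


def arriaga_age(table1, table2):
    """Age-specific contributions to e0(2) - e0(1)."""
    (l1, L1, _), (l2, L2, T2) = table1, table2
    k = len(l1)
    delta = []
    for i in range(k):
        direct = l1[i] / l1[0] * (L2[i] / l2[i] - L1[i] / l1[i])
        if i < k - 1:
            indirect = T2[i + 1] / l1[0] * (l1[i] / l2[i] - l1[i + 1] / l2[i + 1])
        else:
            indirect = 0.0  # no ages beyond the open-ended band
        delta.append(direct + indirect)
    return delta


def arriaga_cause(delta_x, m_cause1, m_cause2, m_all1, m_all2):
    """Share of one band's contribution attributed to a single cause.

    Undefined when the all-cause death rates of the two populations are equal.
    """
    return delta_x * (m_cause2 - m_cause1) / (m_all2 - m_all1)


# Toy example with four bands (0-4, 5-9, 10-14, 15+) and hypothetical rates.
n = [5, 5, 5, None]
a = [2.5, 2.5, 2.5, None]
m1 = [0.002, 0.001, 0.005, 0.10]
m2 = [0.001, 0.0008, 0.004, 0.08]

t1, t2 = life_table(m1, a, n), life_table(m2, a, n)
delta = arriaga_age(t1, t2)
e0_1 = t1[2][0] / t1[0][0]
e0_2 = t2[2][0] / t2[0][0]
print(round(e0_2 - e0_1, 6), round(sum(delta), 6))
```

The two printed numbers agree because the age-specific contributions sum to the difference in life expectancy at birth. Likewise, when the cause-specific death rates in a band sum to the all-cause rate, the cause-specific contributions sum to that band's contribution.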

diff --git a/thesis/Chapters/Chapter2.qmd b/thesis/Chapters/Chapter2.qmd
@@ -22,9 +22,9 @@ To overcome these issues, we can use statistical smoothing techniques to obtain
 In small-area studies, it is common to smooth data using models with explicit spatial dependence, which are designed to give more weight to nearby areas than those further away.
 There are three main categories for modelling spatial effects.
 First, we can treat space as a continuous surface using Gaussian processes or splines.
-Second, we can use areal models, which make use of the spatial neighbourhood structure of the units.
-Third, we can build models that exploit a nested hierarchy of geographical units, for example between state, county and census tract in the US.
-Each of these methods rely on assumptions which may make them more or less appropriate in different applications.
+Second, we can use hierarchical models for areal data, which make use of the spatial neighbourhood structure of the units.
+Third, we can use hierarchical models for areal data that instead exploit a nested hierarchy of geographical units, for example between state, county and census tract in the US.
+Each of these methods, which can be used separately or in combination if the context of the problem allows, relies on assumptions which may make it more or less appropriate in different applications.
 
 #### Space as a continuous process {-}
 
@@ -105,6 +105,7 @@ There might be true variability in the data which a smoothing model would concea
 For example, certain spatial units might contain isolated populations with high mortality over a sustained period, such as counties with Native American reservations in the USA [@dwyer-lindgrenInequalitiesLifeExpectancy2017].
 There can also be spatially- and temporally-specific events that cause a spike in mortality such as the Grenfell Tower fire in 2017.
 Without accounting for these events, the models described above would either attenuate their effect on mortality, or a spike in deaths would cause estimates of mortality in nearby spatial units or years to be erroneously high.
+Beyond consulting subject matter experts, posterior predictive checks and plots of modelled death rates against the observed data can help to identify outlier spikes in mortality which are specific to a particular time or place, and which we do not want our model to smooth.
 
 ### Applications of disease mapping methods
 
@@ -116,6 +117,8 @@ Directly standardised methods, in contrast, require knowledge of the full age st
 Age-standardised death rates, however, suffer the same interpretability issue as the standardised mortality ratio, and are only comparable between studies if the same reference population is used.
 An alternative choice is _life expectancy_.
 @silcocksLifeExpectancySummary2001 explain that life expectancy is a "more intuitive and immediate measure of the mortality experience of a population, [and] is likely to have greater impact... than other measures that are incomprehensible to most people."
+However, although the metric appears more interpretable, life expectancy at birth constructed from a period life table is often misinterpreted as the mean length of life of the cohort into which the newborn is born.
+In fact, it measures the expectation of life assuming that the newborn will be exposed throughout their life to age-specific mortality conditions that are exactly the same as those of the current population.
 
 The estimation of death rates requires two data sources: death counts and populations.
 Modern death registration systems, such as that of the UK, are almost entirely complete and accurate.
@@ -256,7 +259,7 @@ In 2015, the GBD study released its first subnational estimates of mortality, st
 @steelChangesHealthCountries2018 assessed these data, which divided the UK into 150 regions, finding mortality from all-causes varied twofold across the country, with the highest years of life lost in Blackpool and the lowest in Wokingham.
 In a study on forecasting subnational life expectancy in England and Wales, @bennettFutureLifeExpectancy2015 estimated an 8.2-year range in life expectancy for men and a 7.1-year range for women in 2012 between 375 districts.
 The lowest life expectancies were seen in urban northern England, and the highest in the south and London's affluent districts.
-Within London itself, male and female life expectancy showed 5-6 years of variation.
+Within London itself, @cheshireFeaturedGraphicLives2012 visualised the heterogeneity of mortality by assigning tube stops the life expectancy of the nearest ward, revealing that 10 years are lost between two consecutive stops on the Jubilee line, Canary Wharf and North Greenwich.
 
 #### Deprivation {-}
 

diff --git a/thesis/Chapters/Chapter4.qmd b/thesis/Chapters/Chapter4.qmd
@@ -80,16 +80,33 @@ For the MSOA analysis, MSOAs were nested in districts, which were, in turn, nest
 For the LSOA analysis, LSOAs were nested in MSOAs, which were nested in districts.
 The terms for the largest spatial unit were centred on zero to allow the spatial effects to be identifiable.
 
-All standard deviation parameters of the random effects had $\sigma \sim \mathcal{U}(0, 2)$ priors.
+All standard deviation parameters of the random effects had $\sigma \sim \mathcal{U}(0, 2)$ priors, which were used for a previous mortality modelling study by the group [@bennettFutureLifeExpectancy2015].
+I performed a sensitivity analysis using the less informative $\sigma \sim \mathcal{U}(0, 100)$ prior, to which the model was robust (the largest inferred standard deviation parameter was for the age group intercept with a mean around 0.9).
 For the global intercept and slope, we used the diffuse prior $\mathcal{N}(0, \sigma^2=10^5)$.
 The overdispersion parameter $r$ had the prior $\mathcal{U}(0, 50)$.
 
 @tbl-ap-ch4-model shows all model parameters, their priors and dimensions for the MSOA-level model in @sec-Chapter5.
+@tbl-ch-4-checks summarises the model adequacy and consistency checks performed for the analyses.
+
+| Type of check              | Checks performed |
+| -------------------------- | ------------------------------------------------ |
+| Model adequacy             | Check all posterior death rates are between 0 and 1; scatter plots of posterior predictions of death rates against observed data by age group and year; inspection of residuals by age group and year |
+| Model bias                 | Compare aggregated posterior predictions of deaths with uncertainty to national number of deaths from data each year; evaluate model shrinkage by inspecting the range of life expectancy between the top and bottom percentiles (aggregating 67 or 68 MSOAs) in 2002 and 2019 estimated using the model and the data |
+| Consistency between models | Aggregate posterior predictions of deaths from MSOA-level model to district-level and compare to posterior predictions of deaths from the same model run at district level (and same checks for LSOA- and national-level where appropriate) |
+
+: Summary of model posterior checks. {#tbl-ch-4-checks}
+
+Although a random walk approach has been used here to model the J-shape age-mortality association, there are a number of alternatives.
+For example, @gonzagaEstimatingAgeSexspecific2016 use a series of linear splines over the age dimension.
+@alexanderFlexibleBayesianModel2017 describe an approach using the first three principal components of standard mortality curves, where the first component represents baseline mortality, and the second and third components allow offsets for higher child mortality and higher adult mortality.
+However, both these approaches require the modeller to manually specify either the number of basis splines and position of the knots or the number of principal components required to accurately describe the age-mortality relationship.
+This becomes more difficult when modelling several different diseases, which might not follow a J-shape, particularly those with a skew towards older ages such as prostate cancer.
+Random walks are more flexible and require less tuning in this respect, and are also used here to model age-specific slopes over time, for which we have no such prior demographic knowledge.
 
 ## Inference
 
 The decision was made early in my PhD research to use Markov chain Monte Carlo (MCMC) sampling methods for inference, as this is the "gold standard" with guarantees that, under mild conditions, the sequence of samples will asymptotically converge to the true posterior distribution [@robertsGeneralStateSpace2004].
-Furthermore, the state-of-the-art approximate inference package for spatial models, `INLA`, scales badly with the number of hyperparameters, and hence would struggle with the high dimensionality of the models in this thesis.
+Although sampling approaches are the focus here, the `R-INLA` package, which uses approximate inference for latent Gaussian fields and has implementations of common spatial models, could also have been used.
 
 Bayesian models can be specified in a probabilistic programming language.
 The starting point for this project was the `NIMBLE` package [@devalpineNIMBLEMCMCParticle2022; @devalpineProgrammingModelsWriting2017].