forked from jrnold/bugs-examples-in-stan
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathengines.Rmd
93 lines (78 loc) · 2.71 KB
/
engines.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
# Engines: right-censored failure times
```{r engines_setup,message=FALSE}
library("tidyverse")
library("rstan")
```
## Data
The data are 40 engines tested at various operating temperatures, with the the failure time if the engine failed, or the last time of the observational period if it had not [@Tanner1996a].[^engines-src]
Of the 40 engines, 23 did not fail in their observational periods.
```{r engines}
data("engines", package = "bayesjackman")
glimpse(engines)
```
## Model
Let $y^*$ be the failure time of engine $i$.
The failure times are modeled as a regression with normal errors,
$$
\begin{aligned}[t]
y^*_i &\sim \mathsf{Normal}(\mu_i, \sigma) , \\
\mu_i &= \alpha + \beta x_i .
\end{aligned}
$$
However, the failure times are not always observed.
In some cases, only the last observation time is known, meaning that all is known is $y^*_i > y_i$.
Let $L$ be the set of censored observation.
$$
\begin{aligned}[t]
y_i &\sim \mathsf{Normal}(\mu_i, \sigma) & i \notin L, \\
y^*_i &\sim \mathsf{Normal}(\mu_i, \sigma) U(y_i, \infty) & i \in L, \\
\mu_i &= \alpha + \beta x_i .
\end{aligned}
$$
$$
\begin{aligned}[t]
\log L(y_i, \dots, y_N | \alpha, \beta, \sigma) &= \sum_{i \notin L} \log \mathsf{Normal}(y_i; \mu_i, \Sigma) \\
&\quad + \sum_{i \in L} \log \int_{y_i}^{\infty} \mathsf{Normal}(y^*; \mu_i, \Sigma) d\,y^* ,
\end{aligned}
$$
where
$$
\mu_i = \alpha + \beta x .
$$
```{r mod_engines,results='hide',cache.extra=tools::md5sum("data/engines.stan")}
mod_engines <- stan_model("stan/engines.stan")
```
```{r results='asis'}
mod_engines
```
## Estimation
For the input data to the Stan model, the observations that are observed and censored have to be provided in separate vectors.
```{r }
X <- scale(engines$x)
engines_data <- within(list(), {
N <- nrow(engines)
# observed obs
y_obs <- engines$y[!engines$censored]
N_obs <- length(y_obs)
X_obs <- X[!engines$censored, , drop = FALSE]
K <- ncol(X_obs)
# censored obs
y_cens <- engines$y[engines$censored]
N_cens <- length(y_cens)
X_cens <- X[engines$censored, , drop = FALSE]
# priors
# use the mean and sd of y to roughly scale the weakly informative
# priors -- these don't account for need to exact
alpha_loc <- mean(engines$y)
alpha_scale <- 10 * sd(engines$y)
beta_loc <- array(0)
beta_scale <- array(2.5 * sd(engines$y))
sigma_scale <- 5 * sd(y_obs)
})
```
```{r}
sampling(mod_engines, data = engines_data,
chains = 1, init = list(list(alpha = mean(engines$y))))
```
[^engines-src]: This example is derived from Simon Jackman, "Engines: right-censored failure times - the I(,) construct contrasted with other approaches", 2007-07-24,
[URL](https://web-beta.archive.org/web/20070724034205/http://jackman.stanford.edu:80/mcmc/engines.odc)