---
output:
  md_document:
    variant: markdown_github
---
<!-- badges: start -->
[![R-CMD-check](https://github.com/MansMeg/ada_pop/workflows/R-CMD-check/badge.svg)](https://github.com/MansMeg/ada_code/actions)
<!-- badges: end -->
<!-- README.md is generated from README.Rmd. Please edit that file -->
# The Ada model
This repository contains code to run the Ada Poll of Polls model using Stan.
To run the models and (re)produce the output:
1. You need to have Stan and RStan installed. To install RStan (and Stan), see the [RStan installation information](https://mc-stan.org/users/interfaces/rstan.html).
2. Install the SwedishPolls R package. See the install instructions [here](https://github.com/MansMeg/SwedishPolls).
3. Install the ``ada`` R package (see below).
4. Run either [run_ada/run_ada.R](https://github.com/MansMeg/ada_code/blob/main/run_ada/run_ada.R) in R or [run_ada/run_ada_bash.sh](https://github.com/MansMeg/ada_code/blob/main/run_ada/run_ada_bash.sh).
5. Play around with the resulting model object in R.
Steps 1-4 can be seen in full in the GitHub Actions workflow [here](https://github.com/MansMeg/ada_code/actions).
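Before running step 4, it can help to confirm that the prerequisites from steps 1-3 are in place. The following is a minimal sketch using only base R. The helper `missing_packages()` is hypothetical and not part of `ada`; the package names are the ones listed above.

```r
# Hypothetical helper (not part of the ada package): return the names of
# the packages in `pkgs` that cannot be loaded from the current R library.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!installed])
}

# Packages named in steps 1-3 above.
required <- c("rstan", "SwedishPolls", "ada")
not_installed <- missing_packages(required)

if (length(not_installed) > 0) {
  message("Missing packages: ", paste(not_installed, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```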
The model used
==============
We continuously develop and improve the model. The model currently in
use is set by the `model` argument in
[run\_ada/ada_config.yml](https://github.com/MansMeg/ada_code/blob/main/run_ada/ada_config.yml).
The same model exists as a Stan file in the R package, under
[rpackage/inst/stan\_models/](https://github.com/MansMeg/ada_code/blob/main/rpackage/inst/stan_models/).
The hyperparameters are either set in the config file
([run\_ada/ada_config.yml](https://github.com/MansMeg/ada_code/blob/main/run_ada/ada_config.yml))
or left at their default values. The default values are printed when
running the model in R.
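The precedence described above (a value from the config file is used when present, otherwise the default applies) can be sketched with base R's `modifyList()`. This is only an illustration: the hyperparameter names `hyper_a` and `hyper_b` and the helper `resolve_settings()` are hypothetical, the config is written as an R list rather than read from `ada_config.yml`, and the package may resolve its settings differently.

```r
# Hypothetical defaults; the real names and values are printed by the
# package when the model is run.
default_settings <- list(hyper_a = 1, hyper_b = 0.5)

# Hypothetical content of a config file, written here as an R list.
config <- list(model = "model8h3", hyper_b = 0.1)

# Values present in the config replace the defaults; everything else
# falls back to its default value.
resolve_settings <- function(defaults, config) {
  modifyList(defaults, config)
}

settings <- resolve_settings(default_settings, config)
str(settings)
```

Here `hyper_b` takes the config value, `hyper_a` keeps its default, and `model` is added from the config.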
Unfortunately, we do not have a better description of the model right
now. We know that the model code can be cumbersome to read, so if you
have any questions, feel free to reach out on Twitter or open an issue
[here](https://github.com/MansMeg/ada_code/issues) on GitHub.
# R package
All functionality, together with its tests, is implemented in the R package `ada`.
## Installation
To install, just build the local package (if the repo is cloned):
```{r, eval=FALSE}
devtools::install_local("rpackage")
```
### Access polling data
We can access two types of data through the `ada` package: real polling data and simulated polling data. The real polling data comes from Sweden (as well as Spain and Germany).
To access the polls data from the R package, use:
```{r}
library(ada)
data("swedish_polls_curated")
data("swedish_elections")
```
Based on the data, we create a `polls_data` object:
```{r}
pd <- polls_data(y = swedish_polls_curated[, 3:10],
                 house = swedish_polls_curated$Company,
                 publish_date = swedish_polls_curated$PublDate,
                 start_date = swedish_polls_curated$collectPeriodFrom,
                 end_date = swedish_polls_curated$collectPeriodTo,
                 n = swedish_polls_curated$n)
pd
```
To see all available datasets, use:
```{r, eval=FALSE}
data(package = "ada")
```
The real polling data comes both as an original dataset and as a curated dataset that has been cleaned to be internally consistent. See `rpackage/data-raw` for details on how the curation was done. Every file there without the `_functions` suffix creates datasets stored in the R package.
Documentation on each dataset can be found in the R package docs or in `rpackage/R/docs_data.R`.
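As an illustration of what an internal-consistency check could look like, the sketch below flags polls whose dates are out of order. This is an assumption made for illustration only: the actual curation rules live in `rpackage/data-raw` and may be quite different. The toy data is made up; only the column names are taken from `swedish_polls_curated` as used above.

```r
# Toy data (made up) using column names from swedish_polls_curated.
polls <- data.frame(
  collectPeriodFrom = as.Date(c("2020-01-01", "2020-02-10")),
  collectPeriodTo   = as.Date(c("2020-01-07", "2020-02-05")),
  PublDate          = as.Date(c("2020-01-09", "2020-02-12"))
)

# A poll is flagged when its collection period ends before it starts, or
# when it is published before collection has finished.
inconsistent <- with(
  polls,
  collectPeriodFrom > collectPeriodTo | PublDate < collectPeriodTo
)

# Show the flagged rows.
polls[inconsistent, ]
```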
To simplify modelling, we can also simulate polls data. Simulation is done using `simulate_polls()` as follows.
```{r}
library(ada)
data(x_test)
set.seed(4711)
spd <- simulate_polls(x = x_test[[1]],
                      pd = pd_test,
                      npolls = 150,
                      time_scale = "week",
                      start_date = "2010-01-01")
```
We can also plot the simulated polls with:
```{r, eval=FALSE}
plot(spd, y = "x")
```
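To give a feel for what simulated polls data looks like conceptually, the sketch below draws polls around a latent weekly series using base R only. It is not the implementation of `simulate_polls()`: the random-walk latent state, the binomial sampling, the sample size of 1000, and all object names are assumptions made for illustration.

```r
set.seed(4711)

n_weeks <- 100   # length of the latent series (assumed)
n_polls <- 150   # number of simulated polls
n_resp  <- 1000  # respondents per poll (assumed)

# Latent support for one party: a random walk on the logit scale,
# mapped back to a proportion in (0, 1).
latent <- plogis(qlogis(0.3) + cumsum(rnorm(n_weeks, sd = 0.05)))

# Each poll is taken in a randomly chosen week and observes the latent
# state with binomial sampling noise.
poll_week <- sort(sample(seq_len(n_weeks), n_polls, replace = TRUE))
poll_x    <- rbinom(n_polls, size = n_resp, prob = latent[poll_week]) / n_resp

sim <- data.frame(
  date = as.Date("2010-01-01") + 7 * (poll_week - 1),
  x    = poll_x,
  n    = n_resp
)
head(sim)
```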
### Accessing true state (election) data
In the next step, we add information on the true underlying latent state. In the case of real data, this is election data (see above). Below we create a dataset in which the state is known for weeks 3, 44, and 72.
```{r}
true_idx <- c(3, 44, 72)
known_state <- tibble::tibble(date = as.Date("2010-01-01") + lubridate::weeks(true_idx), x = x_test[[1]][true_idx])
known_state
```
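The date arithmetic above maps each week index to a calendar date by adding whole weeks to the start date. A minimal base-R equivalent, which adds seven days per week, makes the mapping explicit; the variable names `start_date` and `known_dates` are illustrative.

```r
start_date <- as.Date("2010-01-01")
true_idx   <- c(3, 44, 72)

# Adding 7 days per week index gives the date of each known state.
known_dates <- start_date + 7 * true_idx
format(known_dates)
# [1] "2010-01-22" "2010-11-05" "2011-05-20"
```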
### Running a poll of polls model
We only need to use the `poll_of_polls()` function to fit the model.
```{r}
output <- capture.output(suppressWarnings(
  pop <- poll_of_polls(y = "x",
                       model = "model8h3",
                       polls_data = spd,
                       time_scale = "week",
                       known_state = known_state,
                       warmup = 1000,
                       iter = 2000,
                       chains = 4)
))
```
Printing the resulting object shows some basic information and the results.
```{r}
pop
```
The Stan fit object is stored in `pop$stan_fit`:
```{r}
head(rstan::summary(pop$stan_fit)$summary)[,1:3]
```
We can also quickly visualize the latent series with:
```{r, eval=FALSE}
plot(pop, "x")
```
## Updating/adding new Stan models
All Stan code can be found in `rpackage/inst/stan_models`. The Stan files are kept inside the R package so that they can be tested together with it.
Local Stan models can also be tested directly by passing a file path as `model`:
```
pop <- poll_of_polls(...,
                     model = "path/to/my/stan/model.stan",
                     ...)
```
In this way, we can edit a model quickly without needing to rebuild the R package.
The file name must match the name of one of the available models, so that the package can identify how to parse the data.
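One way to read the naming rule above: the model name is the file name without its directory and `.stan` extension, and it must be one of the known models. The sketch below illustrates that idea with base R. `model_name_from_path()` and the `available_models` vector are hypothetical, and the package's actual lookup may differ.

```r
# Hypothetical list of model names shipped with the package.
available_models <- c("model8h3")

# Derive the model name from a path and check it against the known models.
model_name_from_path <- function(path, available = available_models) {
  name <- tools::file_path_sans_ext(basename(path))
  if (!name %in% available) {
    stop("Unknown model name: ", name)
  }
  name
}

model_name_from_path("path/to/my/stan/model8h3.stan")
# [1] "model8h3"
```

Under this reading, a local copy of a model would keep its original file name (for example `model8h3.stan`) while being edited.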