---
output: github_document
---
```{r setup, include = FALSE}
suppressPackageStartupMessages(library(dplyr))
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# rfacts
[![cran](https://www.r-pkg.org/badges/version/rfacts)](https://cran.r-project.org/package=rfacts)
[![active](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
[![check](https://github.com/EliLillyCo/rfacts/workflows/check/badge.svg)](https://github.com/EliLillyCo/rfacts/actions?query=workflow%3Acheck)
[![lint](https://github.com/EliLillyCo/rfacts/workflows/lint/badge.svg)](https://github.com/EliLillyCo/rfacts/actions?query=workflow%3Alint)
The rfacts package is an R interface to the [Fixed and Adaptive Clinical Trial Simulator (FACTS)](https://www.berryconsultants.com/software/) on Unix-like systems. It programmatically invokes [FACTS](https://www.berryconsultants.com/software/) to run clinical trial simulations, and it aggregates simulation output data into tidy data frames. These capabilities provide end-to-end automation for large-scale simulation workflows, and they enhance computational reproducibility. For more information, please visit the [documentation website](https://elilillyco.github.io/rfacts/).
## Disclaimer
`rfacts` is neither a product of nor supported by [Berry Consultants](https://www.berryconsultants.com/). The code base of `rfacts` is completely independent from that of [FACTS](https://www.berryconsultants.com/software/), and `rfacts` only invokes FACTS through dynamic system calls.
## Limitations
* FACTS files prior to version 6.2.4 are unsupported.
* `rfacts` only works on Unix-like systems.
* `rfacts` requires paths to pre-compiled versions of Mono, FLFLL, and the FACTS Linux engines. See the installation instructions below and the [configuration guide](https://elilillyco.github.io/rfacts/articles/config.html).
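
Since FACTS files older than 6.2.4 are unsupported, it can help to screen version numbers before running anything. The sketch below uses base R's `package_version()`, which compares dotted versions component by component rather than as plain text. The helper `is_supported_facts_version()` is hypothetical (not part of `rfacts`) and assumes you already have the FACTS version as a dotted string such as `"6.2.4"`.

```{r}
# Hypothetical helper (not part of rfacts): TRUE if a FACTS version
# string meets the documented minimum of 6.2.4.
is_supported_facts_version <- function(version, minimum = "6.2.4") {
  # package_version() compares numerically per component,
  # so "6.10.0" correctly sorts after "6.2.4".
  package_version(version) >= package_version(minimum)
}
is_supported_facts_version(c("6.2.3", "6.2.4", "6.10.0"))
```
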
## Installation
To install the latest release from CRAN, open R and run the following.
```{r, eval = FALSE}
install.packages("rfacts")
```
To install the latest development version:
```{r, eval = FALSE}
install.packages("remotes")
remotes::install_github("EliLillyCo/rfacts")
```
Next, set the `RFACTS_PATHS` environment variable appropriately. For instructions, please see the [configuration guide](https://elilillyco.github.io/rfacts/articles/config.html).
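
A quick way to confirm that the variable is visible to R is a check like the one below. It is a sketch: `check_rfacts_paths()` is a hypothetical helper, not an `rfacts` function, and it assumes only that `RFACTS_PATHS` should hold the path to an existing file. The configuration guide describes what that file must contain.

```{r}
# Hypothetical helper (not part of rfacts): fail early if RFACTS_PATHS
# is unset or does not point to an existing file.
check_rfacts_paths <- function() {
  path <- Sys.getenv("RFACTS_PATHS")
  if (!nzchar(path)) {
    stop("RFACTS_PATHS is not set. See the configuration guide.", call. = FALSE)
  }
  if (!file.exists(path)) {
    stop("RFACTS_PATHS points to a missing file: ", path, call. = FALSE)
  }
  invisible(path)
}

# For a one-off session, you could set the variable from R first
# (the path below is a placeholder):
# Sys.setenv(RFACTS_PATHS = "/path/to/your/paths/file")
```

Call `check_rfacts_paths()` at the top of a script to fail fast. To make the setting persist across R sessions, you can add a line such as `RFACTS_PATHS=/path/to/your/paths/file` (again a placeholder) to `~/.Renviron`.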
## Run FACTS simulations
First, create a `*.facts` XML file using the [FACTS](https://www.berryconsultants.com/software/) GUI. The `rfacts` package has several built-in examples, included with permission from Berry Consultants LLC.
```{r}
library(rfacts)
# get_facts_file_example() returns the path to
# an example FACTS file bundled with rfacts.
# For FACTS files you create yourself in the FACTS GUI,
# skip get_facts_file_example() and use your own file path.
facts_file <- get_facts_file_example("contin.facts")
basename(facts_file)
```
Then, run trial simulations with `run_facts()`. By default, the results are written to a temporary directory. Set the `output_path` argument to customize the path.
```{r}
out <- run_facts(
  facts_file,
  n_sims = 2,
  verbose = FALSE
)
out
head(get_csv_files(out))
```
Use `read_patients()` to read and aggregate all the `patients*.csv` files. `rfacts` has several such functions, including `read_weeks()` and `read_mcmc()`.
```{r}
read_patients(out)
```
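
Conceptually, reading and aggregating output means finding every matching CSV file under the output directory, reading each one, and stacking the rows. The base R sketch below illustrates that idea on toy files it creates itself. It is not how `read_patients()` is implemented: the helper `read_csv_group()` and its `source_file` column are hypothetical, and real FACTS output files have many more columns.

```{r}
# Hypothetical helper (not part of rfacts): stack every CSV whose file
# name matches `pattern` anywhere under `dir`, recording its origin.
read_csv_group <- function(dir, pattern = "^patients.*\\.csv$") {
  files <- list.files(dir, pattern = pattern, recursive = TRUE, full.names = TRUE)
  tables <- lapply(files, function(file) {
    data <- utils::read.csv(file)
    data$source_file <- basename(file)
    data
  })
  do.call(rbind, tables)
}

# Toy output directory: two fake patients files and one unrelated file.
toy_dir <- file.path(tempdir(), "toy_facts_output")
dir.create(toy_dir, showWarnings = FALSE)
utils::write.csv(data.frame(patient = 1:2, response = c(0.5, 1.2)),
  file.path(toy_dir, "patients00001.csv"), row.names = FALSE)
utils::write.csv(data.frame(patient = 1:3, response = c(0.1, 0.9, 2.0)),
  file.path(toy_dir, "patients00002.csv"), row.names = FALSE)
utils::write.csv(data.frame(week = 1),
  file.path(toy_dir, "weeks00001.csv"), row.names = FALSE)

# Only the two patients files are stacked; weeks00001.csv is skipped.
read_csv_group(toy_dir)
```
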
## The simulation process
`run_facts()` has two sequential stages:
1. `run_flfll()`: generate the `*.param` files and the folder structure for the FACTS Linux engines.
2. `run_engine()`: execute the instructions in the `*.param` files to conduct trial simulations and produce CSV output.
```{r}
out <- run_flfll(facts_file, verbose = FALSE)
run_engine(facts_file, param_files = out, n_sims = 4, verbose = FALSE)
read_patients(out)
```
`run_engine()` automatically detects the Linux engine required for your FACTS file. If you know the engine in advance, you can use a specific engine function such as `run_engine_contin()` or `run_engine_dichot()`.
```{r}
out <- run_flfll(facts_file, verbose = FALSE)
run_engine_contin(param_files = out, n_sims = 4, verbose = FALSE)
read_patients(out)
```
If you are unsure which engine function to use, call `get_facts_engine()`.
```{r}
get_facts_engine(facts_file)
```
## Run a single scenario
If we take control of the simulation process, we can pick and choose which FACTS simulation scenarios to run and read.
```{r}
# Example FACTS file built into rfacts.
facts_file <- get_facts_file_example("contin.facts")
# Set up the files for the scenarios.
param_files <- run_flfll(facts_file, verbose = FALSE)
# Each scenario has its own folder with internal parameter files.
scenarios <- get_param_dirs(param_files) # not available in rfacts <= 1.0.0
scenarios
# Let's pick one of those scenarios and run the simulations.
scenario <- scenarios[1]
run_engine_contin(scenario, n_sims = 2, verbose = FALSE)
read_patients(scenario)
```
## Parallel computing
rfacts makes it straightforward to parallelize across simulations. First, use `run_flfll()` to create a directory of param files. The example below stores the param files (the `output_path` of `run_flfll()`) inside a `tempfile()` directory. However, for distributed computing on traditional HPC clusters, `output_path` should be a directory that all nodes can access.
```{r}
library(rfacts)
facts_file <- get_facts_file_example("contin.facts")
# On traditional HPC clusters, this should be a shared directory
# instead of a temp directory:
tmp <- fs::dir_create(tempfile())
param_files <- file.path(tmp, "param_files")
run_flfll(facts_file, param_files)
```
Next, write a custom function that accepts the param files, runs a single simulation for each param file, and returns the important data in memory. Be sure to set a unique seed for each simulation iteration.
```{r}
sim_once <- function(iter, param_files) {
  # Copy the param files to a temporary directory in order to
  # (1) avoid race conditions in parallel processing, and
  # (2) make things run faster: temp files are on local node storage.
  out <- tempfile()
  fs::dir_copy(path = param_files, new_path = out)
  # Run one simulation per param file, seeded by the iteration number.
  run_engine_contin(out, n_sims = 1L, seed = iter)
  # Return aggregated patients files.
  read_patients(out) # Reads fast because `out` is a tempfile().
}
```
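
Why a unique seed matters: if every parallel task started from the same seed, the supposedly independent simulations would produce identical random draws. The toy base R sketch below uses `rnorm()` as a stand-in for a FACTS engine run and a hypothetical `toy_sim()` function. It shows that seeding by iteration number makes each iteration reproducible yet distinct from the others.

```{r}
# Toy stand-in for a simulation (hypothetical; no FACTS involved):
# seed the random number generator with the iteration number, then draw data.
toy_sim <- function(iter) {
  set.seed(iter)
  data.frame(iter = iter, value = rnorm(3))
}

first <- toy_sim(1)
again <- toy_sim(1)
second <- toy_sim(2)
identical(first$value, again$value) # same seed, same draws
identical(first$value, second$value) # different seed, different draws
```

Note that `set.seed()` resets the random number stream of whichever R process calls it, which is harmless on parallel workers but worth remembering when testing interactively.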
At this point, we should test this function locally without parallel computing.
```{r}
library(dplyr)
# Each call to sim_once() runs a single simulation, so every
# patients file is named patients00001.csv and the facts_sim
# column does not distinguish iterations.
# For data post-processing, use the facts_id column instead.
lapply(seq_len(4), sim_once, param_files = param_files) %>%
bind_rows()
```
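
Another option is to tag each result with its iteration number before stacking. Below is a minimal base R sketch, assuming the per-iteration results are a list of data frames. The helper `tag_and_bind()` and the toy list are hypothetical; with real output you would pass the list returned by `lapply(seq_len(4), sim_once, param_files = param_files)`. `dplyr::bind_rows()` offers similar behavior through its `.id` argument, though it stores the identifiers as character strings.

```{r}
# Hypothetical helper: prepend an `iter` column to each data frame in a
# list (element i gets iter = i), then stack the results with base R.
tag_and_bind <- function(results) {
  tagged <- Map(
    function(data, iter) cbind(iter = iter, data),
    results,
    seq_along(results)
  )
  do.call(rbind, unname(tagged))
}

# Toy per-iteration results standing in for sim_once() output.
toy_results <- list(
  data.frame(facts_sim = 1, response = c(0.2, 0.4)),
  data.frame(facts_sim = 1, response = 1.1)
)
tag_and_bind(toy_results)
```
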
Parallel computing happens when we call `sim_once()` repeatedly over several parallel workers. A powerful and convenient parallel computing solution is [`clustermq`](https://mschubert.github.io/clustermq/). Here is a sketch of how to use it with `rfacts`.
```{r, eval = FALSE}
# Configure clustermq to use your grid scheduler and template file.
# If you are using a scheduler like SGE, you need to write a template file
# like clustermq.tmpl. To learn how, visit
# https://mschubert.github.io/clustermq/articles/userguide.html#configuration-1
options(clustermq.scheduler = "sge", clustermq.template = "clustermq.tmpl")
# Run the computation.
library(clustermq)
patients <- Q(
  fun = sim_once,
  iter = seq_len(50),
  # Pass param_files to every call of sim_once() as a constant argument.
  const = list(param_files = param_files),
  pkgs = c("fs", "rfacts"),
  n_jobs = 4
) %>%
  bind_rows()
# Show aggregated patient data.
patients
```
Alternatives to `clustermq` include `parallel::mclapply()`, `furrr::future_map()`, and `future.apply::future_lapply()`.
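
As a sketch of the simplest alternative, `parallel::mclapply()` can stand in for the local `lapply()` test above, because it forks the current R session. The example below uses a hypothetical toy function, `toy_draw()`, so that it runs without FACTS; with real simulations you would pass `sim_once` and `param_files = param_files` instead. The value `mc.cores = 2` is an arbitrary choice.

```{r}
library(parallel)

# Toy stand-in for sim_once() (hypothetical), seeded per iteration.
toy_draw <- function(iter) {
  set.seed(iter)
  data.frame(iter = iter, value = rnorm(2))
}

# Fork 2 worker processes and spread 4 iterations across them.
results <- mclapply(seq_len(4), toy_draw, mc.cores = 2)
combined <- do.call(rbind, results)
combined
```

Because `mclapply()` relies on forking, it cannot run in parallel on Windows. That does not restrict anything here, since `rfacts` already requires a Unix-like system.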
## Helpers
Various `get_facts_*()` functions interrogate FACTS files.
```{r}
get_facts_scenarios(facts_file)
get_facts_version(facts_file)
get_facts_versions()
```