move over parallelization part of old cloudDemo vig
see24 committed Jan 5, 2024
1 parent 6022e93 commit 2a8be24
Showing 3 changed files with 64 additions and 0 deletions.
15 changes: 15 additions & 0 deletions analyses/scripts/02_run_model_furrr.R
@@ -0,0 +1,15 @@
library(ggplot2)  # ggplot(), aes(), geom_col(), ggsave()
future::plan("multisession")
tictoc::tic()
# which variable in mtcars is the best predictor of mpg?
# (do_mod() is not defined in this file and must already exist in the session)
r2_tab <- furrr::future_map_dfr(names(mtcars)[-1], do_mod) |>
  dplyr::mutate(variable = reorder(variable, r.squared))

ggplot(r2_tab, aes(variable, r.squared)) +
  geom_col()

ggsave("figures/r2_mpg_variables.png")

tictoc::toc()
future::plan("sequential")
21 changes: 21 additions & 0 deletions analyses/scripts/03_run_model_foreach.R
@@ -0,0 +1,21 @@
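library(ggplot2)   # ggplot(), aes(), geom_col(), ggsave()
library(doFuture)  # provides the %dofuture% operator used below
future::plan("multisession")

tictoc::tic()
# which variable in mtcars is the best predictor of mpg?
# (do_mod() is not defined in this file and must already exist in the session)
r2_tab <- foreach::foreach(x = names(mtcars)[-1], .combine = rbind) %dofuture%
  do_mod(x)

r2_tab <- r2_tab |>
  dplyr::mutate(variable = reorder(variable, r.squared))

ggplot(r2_tab, aes(variable, r.squared)) +
  geom_col()

ggsave("figures/r2_mpg_variables.png")

tictoc::toc()

future::plan("sequential")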
28 changes: 28 additions & 0 deletions vignettes/parallelization.Rmd
@@ -0,0 +1,28 @@
---
title: "Parallelization"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Parallelization}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE,
                      comment = "#>")
```

This tutorial explains a few different ways to run an analysis in parallel so that it can take advantage of the multiple cores on a cloud machine.

## Running the analysis in parallel
So far our analysis runs sequentially, so it cannot take advantage of the multiple cores available on a cloud machine. To make use of parallel computing, we can set up our script to fit each model on a separate core. Because this example is so simple, I have added a delay to the function so that the benefit of running in parallel is visible. There are many ways to do this, and I will explain a few options.

One option is to create a parallel backend in R using something like the future package. This can be quite straightforward, and there are packages that connect this backend to familiar ways of iterating (e.g. for loops, `lapply()`, or purrr). It can get a bit tricky, though: because you are typically running only parts of the script on the extra cores, you need to make sure the right dependencies (functions, data, and packages) are available on the workers. future manages this for the most part, but see https://future.futureverse.org/articles/future-4-issues.html for tips when that fails.
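
The scripts call a helper, `do_mod()`, whose definition is not part of this commit. As a rough sketch, assuming it fits a one-predictor linear model for `mpg` and returns the R², with `Sys.sleep()` standing in for the added delay, it might look like the code below. The body and the `delay` argument are hypothetical; the point is the sequential baseline that the parallel versions try to beat.

```r
# Hypothetical stand-in for the do_mod() helper called by the scripts; the real
# definition is not part of this commit.
do_mod <- function(var, delay = 0.5) {
  # Artificial delay so the benefit of parallel execution is visible
  Sys.sleep(delay)
  # Regress mpg on a single predictor, e.g. mpg ~ wt
  fit <- stats::lm(stats::reformulate(var, response = "mpg"),
                   data = datasets::mtcars)
  # One-row result: predictor name and its R-squared
  data.frame(variable = var, r.squared = summary(fit)$r.squared)
}

# Sequential baseline: ten models, one after another, on a single core
vars <- names(datasets::mtcars)[-1]
system.time(
  r2_seq <- do.call(rbind, lapply(vars, do_mod))
)
r2_seq[order(-r2_seq$r.squared), ]
```

Run this way, the elapsed time is roughly the number of predictors times `delay` (about 5 seconds here), because every model waits for the previous one to finish.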

### Using the future and furrr packages

The future package sets up the infrastructure needed to run code in the background. To initialize it, run `future::plan("multisession")`, which by default starts one background R session for each core reported by `future::availableCores()`. See the script `analyses/scripts/02_run_model_furrr.R` for an example that uses `furrr::future_map_dfr()` to fit the models in parallel.
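
A minimal self-contained sketch of the same pattern, assuming the future and furrr packages are installed. It defines its own hypothetical `do_mod()` (a one-predictor `lm()` with a `Sys.sleep()` delay) so that it runs on its own, uses `furrr::future_map()` with `do.call(rbind, ...)` rather than `future_map_dfr()` to avoid needing dplyr, and restores the previous plan at the end instead of hard-coding `"sequential"`.

```r
library(future)
library(furrr)

# Hypothetical do_mod(): one-predictor lm() with an artificial delay
do_mod <- function(var, delay = 0.5) {
  Sys.sleep(delay)
  fit <- stats::lm(stats::reformulate(var, response = "mpg"),
                   data = datasets::mtcars)
  data.frame(variable = var, r.squared = summary(fit)$r.squared)
}

# Start background R sessions; by default one per future::availableCores().
# plan() invisibly returns the previous plan so it can be restored later.
oplan <- plan(multisession)
nbrOfWorkers()  # number of workers future will use

vars <- names(datasets::mtcars)[-1]
system.time(
  # do_mod is passed as .f, so furrr ships it to each worker automatically
  r2_par <- do.call(rbind, future_map(vars, do_mod))
)

# Go back to whatever plan was active before (usually sequential)
plan(oplan)
```

With several workers the elapsed time should drop well below the sequential run, although the first call also pays the cost of starting the background R sessions.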

### Using future and foreach

If you usually write for loops, the foreach package might be easier to use. foreach can use future as its parallel backend through the doFuture package and its `%dofuture%` operator. See the script `analyses/scripts/03_run_model_foreach.R` for an example.
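
A minimal sketch of the foreach version, assuming foreach and doFuture (version 1.0.0 or later, which provides `%dofuture%`) are installed. As before, `do_mod()` here is a hypothetical one-predictor `lm()` with an artificial delay, defined inline so the chunk is self-contained.

```r
library(foreach)
library(doFuture)  # provides the %dofuture% operator

# Hypothetical do_mod(): one-predictor lm() with an artificial delay
do_mod <- function(var, delay = 0.5) {
  Sys.sleep(delay)
  fit <- stats::lm(stats::reformulate(var, response = "mpg"),
                   data = datasets::mtcars)
  data.frame(variable = var, r.squared = summary(fit)$r.squared)
}

# The future plan decides where each foreach iteration runs
oplan <- future::plan(future::multisession)

vars <- names(datasets::mtcars)[-1]
system.time(
  # Each iteration is evaluated as a future; results are stacked with rbind()
  r2_fe <- foreach(x = vars, .combine = rbind) %dofuture% {
    do_mod(x)
  }
)

future::plan(oplan)
```

Older code often registers the backend with `doFuture::registerDoFuture()` and then uses `%dopar%`; `%dofuture%` does the same job without setting a global foreach backend.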

