Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multispecies / multiresponse vignette #361

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
247 changes: 247 additions & 0 deletions vignettes/web_only/multispecies.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,247 @@
---
title: "Fitting multispecies models with sdmTMB"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Fitting multispecies models with sdmTMB}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---

**If the code in this vignette has not been evaluated, a rendered version is available on the [documentation site](https://pbs-assess.github.io/sdmTMB/index.html) under 'Articles'.**

```{r setup, include = FALSE, cache=FALSE}
dplyr_installed <- require("dplyr", quietly = TRUE)
ggplot_installed <- require("ggplot2", quietly = TRUE)
pkgs <- dplyr_installed && ggplot_installed
EVAL <- identical(Sys.getenv("NOT_CRAN"), "true") && pkgs
knitr::opts_chunk$set(
collapse = TRUE,
warning=FALSE,
message=FALSE,
comment = "#>",
fig.width = 7,
fig.asp = 0.618,
eval = EVAL,
purl = EVAL
)
```

```{r packages, message=FALSE, warning=TRUE}
library(ggplot2)
library(dplyr)
library(sdmTMB)
```

For some applications, we might be interested in fitting a model that includes multiple responses (such as 2+ species). The most important step in fitting these models is understanding which parameters are shared, and which parameters are species specific.

Below, we illustrate a series of models that may be used to answer different questions. We'll start by simulating a 2-species dataset. Each species is allowed to have unique spatial ranges and variances, as well as different year effects.

```{r sim_dat}
set.seed(123)

# make fake predictor(s) (a1) and sampling locations:
predictor_dat <- data.frame(
X = runif(500), Y = runif(500),
year = rep(1:5, each = 100)
)
predictor_dat$fyear <- as.factor(predictor_dat$year)
mesh <- make_mesh(predictor_dat, xy_cols = c("X", "Y"), cutoff = 0.1)

sim_dat_A <- sdmTMB_simulate(
formula = ~ 0 + fyear,
data = predictor_dat,
time = "year",
mesh = mesh,
range = 0.5,
family = tweedie(),
seed = 42,
sigma_O = 0.2,
phi = 0.1,
sigma_E = 0.1,
B = runif(5, min = 5, max = 8) # 5 random year effects
)
sim_dat_A$species <- "A"

sim_dat_B <- sdmTMB_simulate(
formula = ~ 0 + fyear,
data = predictor_dat,
time = "year",
mesh = mesh,
range = 0.2,
family = gaussian(),
seed = 42,
sigma_O = 0.3,
phi = 0.1,
sigma_E = 0.1,
B = runif(5, min = 5, max = 8) # 5 random year effects
)
sim_dat_B$species <- "B"

sim_dat <- rbind(sim_dat_A, sim_dat_B)
sim_dat$fyear <- as.factor(sim_dat$year)
```

We'll start by making the mesh across the full dataset

```{r mesh_fig, fig.asp=1}
spde <- make_mesh(sim_dat, c("X", "Y"), cutoff = 0.1)
plot(spde)
```

### Model 1: species specific intercepts

As a first model, we can include species specific year effects. This can be done in a couple ways -- one option would be to estimate the `species*year` interaction, letting the year effects for each species be independent. All other parameters (range, spatial field, spatial variance, spatiotemporal fields, spatiotemporal variances) are shared.

```{r}
fit <- sdmTMB(
formula = observed ~ fyear*species,
data = sim_dat,
time = "year",
spatiotemporal = "iid",
mesh = spde, family = gaussian())
```

This model doesn't appear to converge, and the results from `sanity(fit)` indicate problems with the spatial and spatiotemporal variances. This is likely a sign that our assumptions in the estimation model are wrong for this particular dataset (not surprising, as our simulated data has different parameters by species).

Our simulated dataset only has 5 years of data -- if we had a longer time series, an alternative formulation for our year effects could be to fit smooth terms, like in a GAM with `mgcv`.

```{r eval=FALSE}
fit <- sdmTMB(
formula = observed ~ s(year,by="species"),
data = sim_dat,
time = "year",
spatiotemporal = "iid",
mesh = spde, family = gaussian())
```

### Model 2: species specific fields

We may be interested in fitting a model that lets the spatial patterning differ by species. These kinds of models can be expressed using spatially varying coefficients. Note that we use `spatial = off` because this represents a global spatial intercept -- turning this off is equivalent to using a `-1` of `0` in a main formula including a factor.

```{r}
fit <- sdmTMB(
observed ~ 0 + fyear, # shared year effects
data = sim_dat,
mesh = spde,
family = gaussian(),
spatial = "off",
time = "year",
spatiotemporal = "iid",
spatial_varying = ~ 0 + as.factor(species)
)
```

This model also appears to be problematic. If we look at the estimated parameters, you'll notice that there are two row entries for `sigma_Z`. This means that our model is trying to estimate separate species specific variance terms for the species specific spatial fields.

```{r}
fit$sd_report
```

If we wanted to estimate species specific spatial fields with a single shared variance (meaning the magnitude of the peaks and valleys in the field was similar), we could do that by specifying a custom map argument and passing it into `sdmTMBcontrol()`

```{r}
map_list = list(ln_tau_Z = factor(c(1, 1)))

fit <- sdmTMB(
observed ~ 0 + fyear, # shared year effects
data = sim_dat,
mesh = spde,
family = gaussian(),
spatial = "off",
time = "year",
spatiotemporal = "iid",
spatial_varying = ~ 0 + as.factor(species),
control = sdmTMBcontrol(map = map_list)
)
```

### Model 3: species specific spatiotemporal fields

As a last example, we can set up a model that also includes spatiotemporal fields unique to each species. In all of the examples above, spatiotemporal fields are included -- but shared among species.

One approach to including spatiotemporal fields is creating a new variable combining species and year. We can then include this spatiotemporal variation by changing the `time` argument to be `time = "species_year"`

```{r}
# This needs to be a factor
sim_dat$species_year <- as.factor(paste(sim_dat$species, sim_dat$year))

map_list = list(ln_tau_Z = factor(c(1, 1)))

fit <- sdmTMB(
observed ~ 0 + fyear, # shared year effects
data = sim_dat,
mesh = spde,
family = gaussian(),
spatial = "off",
time = "species_year",
spatiotemporal = "iid",
spatial_varying = ~ 0 + as.factor(species),
control = sdmTMBcontrol(map = map_list)
)

```

This is getting a little better -- there is a gradient warning, but the Hessian appears to be ok.

### Putting it all together

Our spatiotemporal model above came close to passing all sanity() checks, and could be modified to also include the species-specific year effects that we started with

```{r}
fit <- sdmTMB(
observed ~ 0 + fyear*species, # shared year effects
data = sim_dat,
mesh = spde,
family = gaussian(),
spatial = "off",
time = "species_year",
spatiotemporal = "iid",
spatial_varying = ~ 0 + as.factor(species),
control = sdmTMBcontrol(map = map_list)
)
```

These examples illustrate a number of ways that species - specific effects can be included in `sdmTMB` models, and can be extended to other types of groupings. A brief summary of these can be summarized as:

```{r echo=FALSE}
desc <- data.frame("Form" = c("Main effects", "Spatial effects", "Spatial effects w/shared variance", "Spatiotemporal effects"), "Implementation" = c("Year by species interactions, or smooths", "Spatially varying coefficients", "Spatially varying coefficients + map argument", "Species-year factor as time variable"))
knitr::kable(desc)
```

### Extensions

Including spatiotemporal effects like species-year combinations in the `time` argument is relatively straightforward, but these models could get more complicated if we wanted to also include other grouping factors (such as cohort effects for each species). There may be some identifiability issues that would have to be resolved, but even if the `time` argument is being used, it's possible to add other factors through the `spatial_varying` formula. As an example, if our dataframe included a cohort variable for each species, we could first create a new combination

```{r eval=FALSE}
sim_dat$species_cohort <- as.factor(paste(sim_dat$species, sim_dat$cohort))
```

If there were ~ 10 levels of `species_cohort`, and we were including our original spatial effects in `spatial_varying`, we could create a new map list that would indicate we wanted all species-cohort fields to have the same spatial variance,

```{r eval=FALSE}
sim_dat$species_year <- as.factor(paste(sim_dat$species, sim_dat$year))

# The first 2 elements correspond to species, remaining 10 to species-cohort
map_list = list(ln_tau_Z = factor(c(1, 1, rep(2, 10))))

fit <- sdmTMB(
observed ~ 0 + fyear*species,
data = sim_dat,
mesh = spde,
family = gaussian(),
spatial = "off",
time = "species_year",
spatiotemporal = "iid",
spatial_varying = ~ 0 + as.factor(species) + species_cohort,
control = sdmTMBcontrol(map = map_list)
)
```








Loading