How do I combine 3 surveys worth of data? Only two of them are overlapping spatially #177

ericward-noaa (Collaborator) · 2023-02-06T23:19:29Z

Question via email:
I am currently dealing with three different bottom survey time series, i.e., survey 1, survey 2, and survey 3. Surveys 2 and 3 are partially overlapping spatially (but different vessels / gears) - and Survey 1 overlaps neither. In this case, I thought I could use survey 3 time series to impute missing years of data in survey 2 using sdmTMB (or also use survey 1 data to impute missing years of data in survey 2). Can I try forecasting with sdmTMB to impute survey 2 missing data using either survey 3 or survey 1 data?

Response: There are several ways to approach this and generate an index across the three surveys. Survey 1 should probably be handled separately: fit an sdmTMB model to its data and make predictions to a grid specific to that survey. You can then fit a second model to the combined data from Surveys 2 and 3, with a few ways to account for differences between surveys (vessels / gears). Some options:

- As a null model, don't include Survey as a covariate, and treat the data as if they come from similar vessels / gears.
- Include Survey as a fixed effect (factor) in the main formula. If you have data for many vessels and they are thought to differ, vessel ID could be included as a random grouping factor in the main formula, e.g. by adding + (1 | vessel_id).
- Include Survey as a covariate on the dispersion. Note: this formula option is only available on a branch (dispformula2); the syntax is the same as for the main formula, e.g. dispformula = ~ 0 + Survey.

Once the model is fit, you can predict to a grid covering Surveys 2 and 3, and then combine the predictions across the 2 areas (the Survey 1 grid and the Survey 2/3 grid) to generate a total index that includes Survey 1.
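The two-model workflow described above could look roughly like the sketch below. It is not a definitive implementation: the column names (X, Y, year, density, survey), the mesh cutoff, the Tweedie family, and the helper names fit_and_index() and combine_indices() are all assumptions made for illustration. Only point estimates are summed, because the intervals from two separately fitted models cannot simply be added.

```r
# Sketch only. Column names (X, Y, year, density, survey) and the mesh
# cutoff are assumptions, not taken from the original question.

# Fit one spatiotemporal Tweedie model and return an annual index
# for the supplied prediction grid.
fit_and_index <- function(dat, grid, formula, cutoff = 20) {
  mesh <- sdmTMB::make_mesh(dat, xy_cols = c("X", "Y"), cutoff = cutoff)
  fit <- sdmTMB::sdmTMB(
    formula,
    data = dat,
    mesh = mesh,
    time = "year",
    family = sdmTMB::tweedie(link = "log"),
    spatial = "on",
    spatiotemporal = "iid"
  )
  pred <- predict(fit, newdata = grid, return_tmb_object = TRUE)
  sdmTMB::get_index(pred, bias_correct = FALSE)
}

# Add two index data frames (each with columns `year` and `est`) by year.
# Only years present in both are kept; uncertainty is not propagated.
combine_indices <- function(idx_a, idx_b) {
  both <- merge(idx_a[, c("year", "est")], idx_b[, c("year", "est")],
                by = "year", suffixes = c("_a", "_b"))
  both$est_total <- both$est_a + both$est_b
  both
}

# Hypothetical usage (data frames and grids are placeholders):
# idx_1  <- fit_and_index(dat_s1,  grid_s1,  density ~ 0 + as.factor(year))
# idx_23 <- fit_and_index(dat_s23, grid_s23,
#                         density ~ 0 + as.factor(year) + survey)
# total  <- combine_indices(idx_1, idx_23)
```

When Survey is a fixed effect, the Survey 2/3 prediction grid needs a survey column fixed to a single level, and that choice determines which survey's catchability the index is expressed in.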

There are subtle differences in how these models may be interpreted. When covariates such as Survey enter the dispersion parameter, they let each Survey have a different variance -- so you can imagine one being down-weighted relative to the other. When covariates such as Survey are included in the main formula (for example with a Tweedie response distribution), they affect the estimated latent log biomass available to be sampled (it is shifted up or down by an intercept for each survey). For delta models, Survey could be a covariate in either the presence-absence submodel (probability of species occurrence differs by survey region) or the positive submodel (catch rates vary by survey). In this particular application, Surveys 2 and 3 overlap spatially -- so it may be important to consider which of these is reasonable (e.g. it may be unlikely that occurrence rates differ by Survey).
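For the delta-model case, one way to put Survey in only the positive submodel is to supply a separate formula for each component. The sketch below assumes an sdmTMB version that accepts a list of two formulas for delta families. The column names (density, year, survey), the delta-gamma family, and the function name fit_delta_by_survey() are illustrative assumptions.

```r
# Sketch only: `density`, `year`, and `survey` are assumed column names,
# and `mesh` is assumed to come from sdmTMB::make_mesh().
fit_delta_by_survey <- function(dat, mesh) {
  sdmTMB::sdmTMB(
    formula = list(
      density ~ 0 + as.factor(year),          # occurrence: no survey effect
      density ~ 0 + as.factor(year) + survey  # positive catch: survey effect
    ),
    data = dat,
    mesh = mesh,
    time = "year",
    family = sdmTMB::delta_gamma(),
    spatial = "on",
    spatiotemporal = "iid"
  )
}
```

Swapping the two formulas would instead let occurrence differ by survey while holding positive catch rates common, so the choice encodes the assumption discussed above.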

Thom-Teears · 2023-05-14T22:22:01Z

Hello,
Thank you for providing this excellent post. I have been trying to add vessel ID to an sdmTMB spatiotemporal model as you advised above and this model is extremely slow. I suspect it is due to having several thousand vessels in the dataset. I simply added vessel ID as a random effect to my main formula as:

Response_variable ~ Flag + s(HBF.ratio, k = 3) + (1 | Vessel_ID).

Does it matter if Vessel_ID is input as a factor or a character string? Also, there is a portion of vessels with no ID that I have listed as "missing". I am wondering whether I have specified this correctly, whether there is a more efficient way to configure it, or whether it is unrealistic to include a random effect with so many different vessels?

Many thanks,
Thom

ericward-noaa (Collaborator, Author) · 2023-05-15T09:02:42Z

Great questions. Currently, Vessel_ID should be passed in as a factor -- though we are working on the internals so that character vectors are accepted as well.

If you have a "missing" label on some vessels, it will be estimated as a separate level of the factor rather than treated as NA. In general, missing (NA) values in random effect groupings aren't allowed -- so those rows could be filtered out prior to fitting. With sdmTMB, you can still make predictions to data with missing group identifiers (so you could use the fitted model to make predictions for the vessels you don't have IDs for).
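A minimal base R sketch of that preparation step follows. It assumes the ID column is called Vessel_ID and that unknown vessels carry the literal label "missing", as in the question. The helper name prepare_vessel_ids() is hypothetical.

```r
# Drop rows whose vessel ID is NA or the placeholder label, then store the
# remaining IDs as a factor so they can be used in + (1 | Vessel_ID).
prepare_vessel_ids <- function(dat, id_col = "Vessel_ID",
                               missing_label = "missing") {
  ids <- as.character(dat[[id_col]])
  keep <- !is.na(ids) & ids != missing_label
  out <- dat[keep, , drop = FALSE]
  out[[id_col]] <- factor(ids[keep])  # only levels actually present remain
  out
}

# Hypothetical usage:
# dat_fit <- prepare_vessel_ids(dat)
# fit <- sdmTMB::sdmTMB(..., data = dat_fit)  # model details omitted
# Population-level predictions for the rows that were dropped, ignoring the
# vessel random intercepts (assumes predict() supports `re_form_iid`):
# p <- predict(fit, newdata = dat_unknown, re_form_iid = NA)
```

Filtering first keeps the placeholder group from being estimated as if it were a single real vessel.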

In terms of speed, several thousand random effects may make things a bit slow, but the model will still work. A few things to check: first, are there any vessels with just a few observations? If so, it might make sense to remove them. Second, look at the estimated random effect deviations -- are they approximately normally distributed, or is there some weirdness (e.g. multiple modes in the estimates) that may be affecting estimation?

Try this to get the vessel-level random effects:
tidy(fit, effects = "ran_vals")
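Both checks can be sketched in a few lines of base R. The threshold of 5 observations and the helper name rare_vessels() are arbitrary choices for illustration. The plotting lines assume that the tidy() output has a numeric column named estimate, which may differ between sdmTMB versions.

```r
# Return the IDs of vessels with fewer than `min_n` observations.
rare_vessels <- function(vessel_id, min_n = 5) {
  counts <- table(vessel_id)
  names(counts)[counts < min_n]
}

# Hypothetical usage:
# drop_ids <- rare_vessels(dat$Vessel_ID, min_n = 5)
# dat_fit  <- dat[!dat$Vessel_ID %in% drop_ids, ]
# dat_fit$Vessel_ID <- droplevels(factor(dat_fit$Vessel_ID))
#
# Eyeball the random-intercept deviations for skew or multiple modes:
# re <- tidy(fit, effects = "ran_vals")
# hist(re$estimate, breaks = 50)
# qqnorm(re$estimate); qqline(re$estimate)
```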

Thom-Teears · 2023-05-15T21:41:55Z

Thank you so much for your excellent response and for the suggestions. I will give them a try.
