Replies: 4 comments 1 reply
-
Hi Olivier,
You need to specify the
I'm a bit confused about the setup. What is the unit of time in your index calculation? Day? Something longer? I don't think you'd want this as an offset, unless this is your measure of effort and you're summing your survey observations. The offset would be the (log) measure of effort for a given observation, e.g., (log) hours surveyed, assuming your response is the count and not already the ratio of count to effort. It is possible you would want a random intercept or something time-varying to account for observations from the same survey being correlated with each other, but again, I'm not sure I understand the setup. It's fine for some sites to have more observations than others (technically, as long as your choice of where to sample isn't driven by the underlying abundance dynamics themselves).
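For concreteness, here is a minimal sketch of what a (log) effort offset could look like in sdmTMB; the column names (`count`, `hours_surveyed`, `X`, `Y`) and the mesh `cutoff` are invented for illustration:

```r
library(sdmTMB)

# Hypothetical data: one row per observation (a count from one survey visit),
# with `hours_surveyed` as the effort for that observation.
dat$log_effort <- log(dat$hours_surveyed)

mesh <- make_mesh(dat, xy_cols = c("X", "Y"), cutoff = 10)

fit <- sdmTMB(
  count ~ 1,
  data = dat,
  mesh = mesh,
  family = nbinom2(link = "log"),
  offset = dat$log_effort  # log effort per observation, not a time index
)
```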
As defined by the 'grid' (or similar) supplied to
Did you set the
-
Hi Sean, Thank you for your reply. Let's see if I can be a bit more precise about the setup below...
This is indeed useful to know.
The study was initially set up as "point transects" for distance sampling. Each point is surveyed 6 times at 2-hour intervals, measuring the distance and bearing to each animal group and the group size, within a ~5 km radius plot centered on the survey point. Observations from the same survey might be spatially correlated, but within a single survey, each group was observed only once. However, because of this design with 6 surveys, many of the animal groups were counted repeatedly across surveys throughout the day, within the same point transect.
Right. Now I understand better what was going on.
I guess I was doing something wrong... The estimates I get now seem to be much more "stable" across different grid sizes, which makes more sense. Thank you again for your help. Olivier
-
What is a row of data (observation) that you are fitting? Is it a count of individuals at one point in space and time from one survey?
Here, what would the 'offset' column in your data look like?
-
@seananderson Coming back to this after a break... I'm still a bit confused. Please allow me to try to clarify a bit further, as I would really appreciate your input. Most plots were surveyed 6 times at 2-hour intervals, but a few were surveyed only 2 or 5 times. This is coded in the data set as a factor column called facnumhour with levels 1 to 6; i.e., if a plot was surveyed only 2 times, then only levels 1 and 2 appear. The data also include a column numhour, which is the numeric equivalent of facnumhour. There is also a column effort, which gives the total number of surveys for each plot; i.e., if a plot was surveyed 6 times, then effort contains 6. I currently fit the model with:
My main question right now is about whether I need the
Thank you in advance for your help.
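To make the column description above concrete, a few hypothetical rows of the data set might look like this (plot names and count values are invented for illustration):

```r
# Sketch of the data structure described above: one row per survey visit.
dat <- data.frame(
  plot       = c("A", "A", "B", "B"),
  facnumhour = factor(c(1, 2, 1, 2), levels = 1:6),  # which repeat survey
  numhour    = c(1, 2, 1, 2),  # numeric equivalent of facnumhour
  effort     = c(6, 6, 2, 2),  # total number of surveys for that plot
  count      = c(12, 8, 3, 5)  # hypothetical group counts
)
```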
-
Hi everyone,
We are trying to estimate the abundance of a group-living species in a fairly large (~13,000 km2) area, with repeated surveys (i.e., 6 counts in a day) at 101 points.
There aren't that many observations to work with, though: roughly 50 to 120 groups (~300 to 750 individuals) observed at each survey.
The model is fairly simple: group_size is modeled with a negative binomial distribution and, for now, an intercept-only formula with a spatial component to account for the location of observations and a temporal component to account for the repeated surveys (the latter also being more or less required by the get_index() function).
The model seems to fit reasonably well, and the sanity and residual checks are OK.
I have two questions (maybe just these for now):
Not all points were surveyed 6 times. Is this already properly accounted for through the varying intercept by survey (i.e., through the temporal component of the model)? Or should I also use the actual number of surveys per point as an offset?
As far as I understand, in this context the estimate provided by get_index() for each survey (time slice) is the abundance over the 'study area' as defined by the mesh. Before today, this estimate seemed to be extremely dependent on the cell size of the prediction grid. I had results ranging from small but sensical to nonsensically high. Why would this estimate be so dependent on grid size, and what would be the best way to find the "correct" grid size?
Disclaimer: as I am trying again now, this estimate seems much more stable across different grid sizes. I was likely doing something wrong, and this second question may not be relevant anymore. Please let me know if this is the case...
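For reference, one common source of grid-size sensitivity is the `area` argument of get_index(): the index sums predictions over the grid cells, so if `area` doesn't match the cell size, a finer grid simply contributes more cells and inflates the total. A hedged sketch (the object name `fit`, the time column `survey`, and the 2 km spacing are assumptions):

```r
# Prediction grid at 2 km resolution, replicated for each time slice.
grid <- expand.grid(X = seq(0, 100, by = 2), Y = seq(0, 100, by = 2))
grid <- merge(grid, data.frame(survey = 1:6))  # one copy per survey

p <- predict(fit, newdata = grid, return_tmb_object = TRUE)

# Pass the per-cell area so the summed index is grid-size invariant.
idx <- get_index(p, area = 2 * 2, bias_correct = TRUE)  # km^2 per cell
```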
Thank you in advance for your help.
Olivier