Speeding up a model with a lot of data #125
Question via email:
Replies: 1 comment
`silent = FALSE` just gives you progress info during fitting, which can be helpful for monitoring big models, but it won't speed things up.

The biggest thing to speed things up is to use a coarser mesh. Also, some random field structures will be faster than others: adding spatiotemporal fields will be much slower than just spatial fields. Depending on what you're doing, `reml = TRUE` is sometimes faster, but I wouldn't use that for index standardization, if that's what you're doing.
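As a rough sketch of the mesh point (not code from the original reply), assuming a data frame `dat` with UTM coordinate columns `X` and `Y` and a 0/1 `present` column (all placeholder names), mesh resolution can be controlled with the `cutoff` argument of `make_mesh()`:

```r
library(sdmTMB)

# A larger cutoff (minimum allowed triangle edge length, in the units of X/Y)
# gives fewer mesh vertices and a much faster fit:
mesh_coarse <- make_mesh(dat, xy_cols = c("X", "Y"), cutoff = 20)
mesh_fine <- make_mesh(dat, xy_cols = c("X", "Y"), cutoff = 5)
mesh_coarse$mesh$n # number of mesh vertices; fewer = faster
mesh_fine$mesh$n

# Spatial-only model (no `time` argument, so no spatiotemporal fields):
fit <- sdmTMB(
  present ~ 1,
  data = dat,
  mesh = mesh_coarse,
  family = binomial(),
  spatial = "on"
  # reml = TRUE # sometimes faster, but avoid for index standardization
)
```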
A large number of `extra_time` slices will also slow things down.

With > 500k observations, you could also consider gridding your data and changing the family accordingly. E.g., instead of binomial, grid the data, use the cell centroids, and model the count per cell as negative binomial. Or do something similar for biomass etc. Or downsample for model experimentation (see the gridding sketch below).

Yes, you could certainly compare models with coarser meshes and then increase the resolution. You can often get away with surprisingly coarse meshes. At some point the models become overfit if the mesh becomes too fine, which isn't always appreciated, and is only seen through cross validation (we're working on a paper looking at this; see the cross-validation sketch below).

You can also try setting up parallel processing, but check that it's actually doing anything and is actually faster. I'm not sure if that works on Windows, and at some point more cores can definitely be slower.
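A minimal sketch of the gridding idea, again using the placeholder `dat`, `X`, `Y`, and `present` names: aggregate presences to counts per grid cell (here an arbitrary 10 x 10 cell size), use the cell centroids as coordinates, and switch to `nbinom2()`:

```r
library(dplyr)
library(sdmTMB)

cell_size <- 10 # grid resolution, in the units of X and Y

gridded <- dat |>
  mutate(
    cell_x = floor(X / cell_size) * cell_size + cell_size / 2, # cell centroid
    cell_y = floor(Y / cell_size) * cell_size + cell_size / 2
  ) |>
  group_by(cell_x, cell_y) |>
  summarise(count = sum(present), .groups = "drop")

mesh_grid <- make_mesh(gridded, xy_cols = c("cell_x", "cell_y"), cutoff = 20)

# Count per cell modelled as negative binomial instead of binomial per observation:
fit_grid <- sdmTMB(
  count ~ 1,
  data = gridded,
  mesh = mesh_grid,
  family = nbinom2(),
  spatial = "on"
)
```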
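And because overfitting from a too-fine mesh only shows up out of sample, here is a sketch of comparing the two meshes from the first example with `sdmTMB_cv()` (same placeholder names; `sum_loglik` is the summed held-out log likelihood):

```r
library(sdmTMB)

set.seed(1)
cv_coarse <- sdmTMB_cv(
  present ~ 1,
  data = dat,
  mesh = mesh_coarse,
  family = binomial(),
  k_folds = 5
)

set.seed(1) # same seed so both runs use the same folds
cv_fine <- sdmTMB_cv(
  present ~ 1,
  data = dat,
  mesh = mesh_fine,
  family = binomial(),
  k_folds = 5
)

# Higher summed held-out log likelihood = better out-of-sample performance;
# if the coarse mesh does as well or better, the finer mesh is likely overfitting:
cv_coarse$sum_loglik
cv_fine$sum_loglik
```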