generate_cv_index_spt
#394
Comments
@mitchellmanware I'd say this is indeed a valid CV method. A couple of questions and comments:
becomes
@sigmafelix @kyle-messier
@mitchellmanware If we don't have to fix the number of folds, I think it would be better to define the folds by a semantic (or "common sense") temporal period, for example one or two years. FYI, as you might have noticed, the initial version of
This is good input, thank you @sigmafelix. When you say "semantic (or 'common sense') temporal period to make folds, for example, one year or two years", do you mean deriving the number of folds from the years/dates available in
For cut points at
@mitchellmanware I think that data from a new 6-month period can be appended to the last fold. Our initial period is five years and there are datasets for the two following years (2023-2024), which may result in five folds (18 months * 4 + 12 months) or seven folds (12 months * 7).
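The fold arithmetic in the comment above can be checked with a small sketch. This is a language-neutral illustration in Python, not the R function discussed in this issue; `fold_lengths` and `append_to_last` are hypothetical helper names, and the 84-month span assumes the five initial years plus 2023-2024.

```python
def fold_lengths(total_months, period_months):
    """Split a span of months into consecutive folds of `period_months`.

    Any leftover months form a final, shorter fold.
    """
    full, rest = divmod(total_months, period_months)
    lengths = [period_months] * full
    if rest:
        lengths.append(rest)
    return lengths


def append_to_last(lengths, new_months):
    """Add newly acquired months to the last fold instead of opening a new fold."""
    return lengths[:-1] + [lengths[-1] + new_months]


# Assumption: five initial years plus two following years = 84 months.
seven_years = 7 * 12

eighteen_month_folds = fold_lengths(seven_years, 18)  # [18, 18, 18, 18, 12]
twelve_month_folds = fold_lengths(seven_years, 12)    # [12, 12, 12, 12, 12, 12, 12]
after_new_half_year = append_to_last(eighteen_month_folds, 6)  # [18, 18, 18, 18, 18]

print(eighteen_month_folds, twelve_month_folds, after_new_half_year)
```

With 18-month periods the last fold is shorter (12 months), so appending a new 6-month batch to it keeps the fold count at five rather than creating a sixth fold.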
True, but by deriving from the dates in
think that also gets into the question of whether the models are re-trained at each new interval or whether the first-run models are fit to the new temporal points. The introduction of new AQS sites will also introduce new spatial points, though.
Then I will modify the function to convert date objects to integers to get temporal blocks for spatiotemporal block CV. I think that the model needs to be retrained with the newly acquired datasets. The CV folds are generated fairly fast because the CV indexing function is lightweight (i.e., it returns only integers).
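One way the date-to-integer conversion could work is sketched below. This is an illustrative Python sketch, not the package's R code: `temporal_block` is a hypothetical helper, and the 2018-01-01 origin and 12-month block length are assumptions chosen to match a five-year initial period ending in 2022.

```python
from datetime import date


def temporal_block(d, origin, months_per_block=12):
    """Return a 1-based temporal block index for date `d`.

    The date is first reduced to an integer (whole months elapsed since
    `origin`), then integer-divided by the block length.
    """
    months_since_origin = (d.year - origin.year) * 12 + (d.month - origin.month)
    return months_since_origin // months_per_block + 1


# Assumed origin: first day of the (assumed) initial five-year period.
origin = date(2018, 1, 1)
dates = [date(2018, 1, 1), date(2018, 12, 31), date(2019, 7, 4), date(2022, 12, 31)]

blocks = [temporal_block(d, origin) for d in dates]
print(blocks)  # [1, 1, 2, 5]
```

Because the function returns only integers, the index stays cheap to recompute whenever new dates arrive; changing `months_per_block` to 18 would give the 18-month folds discussed earlier.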
@sigmafelix @kyle-messier
I am trying to update the generate_cv_index_spt function to create a spatiotemporal cross-validation method based on a combination of leave-one-location-out and leave-one-time-out, where one "location" is a spatialsample::spatial_block_cv block and one "time" is a year's worth of data. It would look something like this
I have developed a version on beethoven_dev at https://github.com/mitchellmanware/beethoven_dev/blob/mm-0203/function.R. The function, as expected, creates 25 (5 spatial x 5 temporal) folds where the testing (aka assessment) data for each set is one spatial block for one year. This is seen by checking the unique years available in the training and testing sets for each split.
Is this a valid CV method? I based it on our custom functions, but I don't know if manually making the CV splits like this is OK.
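The 5 x 5 crossing described above can be sketched as follows. This is a language-neutral Python illustration, not the code in the linked function.R: `spatiotemporal_splits` and its `exclude_shared` flag are hypothetical, and the thread does not state whether the training set keeps rows that share the test block or the test year, so both variants are shown.

```python
from itertools import product


def spatiotemporal_splits(spatial_block, year, exclude_shared=False):
    """Yield one split per (spatial block, year) pair.

    Test rows belong to that block AND that year. By default every other
    row is used for training; with exclude_shared=True, rows sharing either
    the test block or the test year are dropped from training as well.
    """
    rows = list(zip(spatial_block, year))
    for b, y in product(sorted(set(spatial_block)), sorted(set(year))):
        test = [i for i, (sb, yr) in enumerate(rows) if sb == b and yr == y]
        if exclude_shared:
            train = [i for i, (sb, yr) in enumerate(rows) if sb != b and yr != y]
        else:
            train = [i for i, (sb, yr) in enumerate(rows) if not (sb == b and yr == y)]
        if test:  # skip block-year pairs with no observations
            yield {"block": b, "year": y, "train": train, "test": test}


# Toy data (assumed): one row for each of 5 blocks x 5 years.
toy = [(b, y) for b in range(1, 6) for y in range(2018, 2023)]
toy_blocks = [b for b, _ in toy]
toy_years = [y for _, y in toy]

splits = list(spatiotemporal_splits(toy_blocks, toy_years))
strict_splits = list(spatiotemporal_splits(toy_blocks, toy_years, exclude_shared=True))

print(len(splits))  # 25
# The check from the post: each test set should contain exactly one year.
print(all(len({toy_years[i] for i in s["test"]}) == 1 for s in splits))  # True
```

The stricter variant trades training data for less potential leakage between neighbouring years or within a block; which behaviour is appropriate here is a modelling decision the thread leaves open.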