

@camillegiuliano
Contributor

This adds a new function called gcLocatorCreate.
This function builds the gcLocator raster layer. It was built around the data we have for Saskatchewan, where we already had productivity class and spatial unit raster layers, along with a growth curve lookup CSV that assigns the correct gcid to each combination of productivity class, spatial unit ID, and leading species in the province. Whenever I tried to run a new data source, I had to rebuild the gcLocator file, so I figured a quick function would be the simplest approach moving forward.
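A minimal sketch of the lookup step such a function performs, written in R on plain vectors of cell values. In practice these vectors might come from `terra::values()` on the three aligned layers, with the result written back via `terra::setValues()`. The function name `lookupGcid`, the lookup column names (`prodClass`, `spatial_unit_id`, `speciesId`, `gcid`), and the toy values are assumptions for illustration, not the PR's actual code.

```r
# Hypothetical sketch: map each cell's (prodClass, spatial unit, leading species)
# combination to a gcid using a lookup table. Names are illustrative assumptions.
lookupGcid <- function(prodClass, spuId, leadSpecies, gcLookup) {
  # All three inputs must describe the same cells, in the same order
  stopifnot(length(prodClass) == length(spuId),
            length(spuId) == length(leadSpecies))

  # Build one composite key per cell and one per lookup row
  cellKey <- paste(prodClass, spuId, leadSpecies, sep = "_")
  lookupKey <- paste(gcLookup$prodClass, gcLookup$spatial_unit_id,
                     gcLookup$speciesId, sep = "_")
  if (anyDuplicated(lookupKey)) {
    stop("gcLookup assigns more than one gcid to the same combination")
  }

  # match() returns NA for combinations absent from the lookup table
  gcid <- gcLookup$gcid[match(cellKey, lookupKey)]
  # Cells with any missing input value get no growth curve
  gcid[is.na(prodClass) | is.na(spuId) | is.na(leadSpecies)] <- NA
  gcid
}

# Toy lookup table and four toy cells
gcLookup <- data.frame(
  prodClass       = c(1, 1, 2),
  spatial_unit_id = c(28, 28, 28),
  speciesId       = c(1, 2, 1),
  gcid            = c(101, 102, 201)
)
gc <- lookupGcid(prodClass   = c(1, 2, 1, NA),
                 spuId       = c(28, 28, 28, 28),
                 leadSpecies = c(2, 1, 1, 1),
                 gcLookup    = gcLookup)
print(gc)  # 102 201 101 NA
```

Keying on a pasted string keeps the sketch dependency-free; a multi-column `merge()` would work too but does not preserve cell order without extra bookkeeping.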

Currently the function only works if all 3 raster layers are in the same CRS and have the same extent. I can either add a check that stops the function and explicitly tells the user that the layers don't match, or add an if/else branch that reprojects the layers so they all match. I'm leaning towards just the check and letting users adjust their own files, rather than forcing a projection change.
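A sketch of the check-and-stop option, assuming the inputs are terra `SpatRaster` objects. It uses `terra::compareGeom()` to compare each layer's CRS, extent, and rows/columns against the first layer, and does not attempt any reprojection. `checkLayerAlignment` and the toy layers are hypothetical.

```r
library(terra)

# Hypothetical helper: stop with an explicit message if any layer differs from
# the first in CRS, extent, or number of rows/columns. No reprojection is done.
checkLayerAlignment <- function(...) {
  layers <- list(...)
  ref <- layers[[1]]
  for (i in seq_along(layers)[-1]) {
    same <- compareGeom(ref, layers[[i]], crs = TRUE, ext = TRUE,
                        rowcol = TRUE, stopOnError = FALSE)
    if (!isTRUE(same)) {
      stop("Input layer ", i, " does not match layer 1 in CRS, extent, ",
           "or dimensions; please align the rasters before building the gcLocator.")
    }
  }
  invisible(TRUE)
}

# Toy layers: lyrB copies lyrA's geometry, lyrShifted has a different extent
lyrA <- rast(nrows = 10, ncols = 10, xmin = 0, xmax = 10,
             ymin = 0, ymax = 10, crs = "EPSG:3979")
lyrB <- rast(lyrA)
lyrShifted <- rast(nrows = 10, ncols = 10, xmin = 5, xmax = 15,
                   ymin = 0, ymax = 10, crs = "EPSG:3979")

checkLayerAlignment(lyrA, lyrB)                   # passes silently
try(checkLayerAlignment(lyrA, lyrB, lyrShifted))  # stops, naming layer 3
```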

With this function, you can build a gcLocator layer as long as you have a growth curve lookup table and raster layers for productivity class, spatial unit IDs, and leading species. This should make using new data sources in traditional (non-LandR) spadesCBM simpler.

I'm setting this as a draft for now so that I can settle on what to do when extents/CRS don't match and get the documentation in order, and also because the package version numbers will definitely be quite different once some of the larger plotting PRs go through, so I haven't changed them on my end.

@suz-estella
Contributor

I think there's a chance we don't need this function, since CBM_dataPrep and CBM_vol2biomass_SK can accept multiple columns for curveID. For example (with most arguments omitted):

```r
setupProject(

  ## omit the gcIndexLocator argument

  curveID = c("speciesId", "prodclass"),
  cohortLocators = list(
    speciesId = leadSpeciesRaster,
    prodclass = siteProductivityRaster
  )
)
```

This will give you a cohortDT table that includes the columns spatial_unit_id, speciesId, and prodclass. As long as userGcMeta and userGcM3 also have the speciesId and prodclass columns, CBM_vol2biomass_SK will generate gcMeta and growth_increments tables with a gcids column that is unique for every spatial_unit_id, speciesId, and prodclass combination defined in the userGcSPU input object (created by CBM_dataPrep). This new gcids column is also added to cohortDT.
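To illustrate the idea of one gcids value per unique combination, here is a simplified base-R sketch on a toy data.frame. It derives the combinations directly from the cohort table rather than from userGcSPU, and the toy values and `keyCols` name are made up, so treat it as an illustration of the concept rather than what CBM_vol2biomass_SK does internally.

```r
# Toy cohort table (a data.frame here; spadesCBM itself uses data.table)
cohortDT <- data.frame(
  spatial_unit_id = c(28, 28, 28, 28),
  speciesId       = c(1, 2, 1, 1),
  prodclass       = c(1, 1, 2, 1)
)
keyCols <- c("spatial_unit_id", "speciesId", "prodclass")

# One row per distinct combination, each given its own gcids value
combos <- unique(cohortDT[, keyCols, drop = FALSE])
combos$gcids <- seq_len(nrow(combos))

# Attach gcids back to every cohort; rows sharing a combination share an id
cohortDT <- merge(cohortDT, combos, by = keyCols, all.x = TRUE, sort = FALSE)
print(cohortDT)
```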

More info here: PredictiveEcology/CBM_vol2biomass_SK#27

@camillegiuliano
Contributor Author

That's a good point; it could work like this too, and I could get rid of this function entirely, unless we want a tangible gcLocator raster for some reason (in which case we could probably go about it a different way). I'll try a run this way with the SCANFI files.

Currently our userGcMeta and userGcM3 tables don't have prodClass (in fact, I don't think productivity class is present anywhere in current spadesCBM runs), but I could update them to include that column. I've been wanting to update those SK tables to include all the SK gcid options for a while, so that we don't run into missing growth curves whenever we use a new data source, or hit the error Dominique ran into last week where the two files didn't match. Once those updates are made, we could rely on the raw productivity class and leading species files when running any data source in SK, and they should all run without much effort. At that point we could also drop the gcIndex file we currently use for CASFRI, since that file was built the same way as the function here.
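Before relying on expanded tables, one way to catch missing growth curves early might be to list the curveID combinations present in the cohorts that have no match in userGcMeta. This is a hypothetical base-R sketch on data.frames (`missingCurves` and the toy tables are made up for illustration); with data.table the column selection syntax would differ.

```r
# Hypothetical check: which curveID combinations have no growth curve?
missingCurves <- function(cohorts, gcMeta, curveID) {
  combos <- unique(cohorts[, curveID, drop = FALSE])
  known <- unique(gcMeta[, curveID, drop = FALSE])
  known$.found <- TRUE
  # Left join: combinations absent from gcMeta end up with .found = NA
  joined <- merge(combos, known, by = curveID, all.x = TRUE)
  joined[is.na(joined$.found), curveID, drop = FALSE]
}

cohorts    <- data.frame(speciesId = c(1, 2, 3), prodclass = c(1, 1, 2))
userGcMeta <- data.frame(speciesId = c(1, 2), prodclass = c(1, 1), gcids = c(1, 2))

miss <- missingCurves(cohorts, userGcMeta, c("speciesId", "prodclass"))
print(miss)  # the speciesId 3 / prodclass 2 combination has no curve
```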

@suz-estella
Contributor

Definitely could still be a nice function to have in our repertoire regardless!

One nice thing about using multiple columns from "raw" sources is that it leads to more reproducible code: there's no need to ask "how was this gcIndexLocator raster made?" It also makes it easy to swap just one source (e.g., speciesID) to see how it changes things.

@camillegiuliano
Contributor Author

So... weird update here: I ran the SCANFI data with the edited curveID rather than the gcIndex raster I'd built, and it runs with no real issues, EXCEPT that the results differ between the two approaches. I'm investigating what is happening.
I did find out that the SCANFI age raster was using decimal values for ages at some point, so it's possible I'm looking at old results from the decimal version.
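A quick guard against the decimal-age issue could be to check the age values before a run. This hypothetical helper takes a vector of ages (for example, from `terra::values()` on the age raster), ignores missing cells, and warns if any value is not a whole number.

```r
# Hypothetical helper: warn if any non-missing age is not a whole number
checkIntegerAges <- function(ages) {
  frac <- !is.na(ages) & ages != round(ages)
  if (any(frac)) {
    warning(sum(frac), " age value(s) are not whole numbers")
  }
  invisible(!any(frac))
}

checkIntegerAges(c(10, 80, NA))  # no warning
checkIntegerAges(c(10, 25.4))    # warns: 1 age value(s) are not whole numbers
```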

@cboisvenue
Collaborator

Good work here.
Here is my input: I think we absolutely need to use curveID, because what defines which curve gets used where will be different for each study area. As we get better remote sensing information, the columns in curveID may change to things I can't even think of yet.
Note: changes in results are expected if we change which curve is used where. I would check that CBM_vol2biomass is running correctly and that the corrections to the curves make sense. I am happy to help with this if needed.

@suz-estella
Contributor

Looking into this a bit: I put a temporary stop here to ensure that curveID is just 1 column: https://github.com/PredictiveEcology/CBM_vol2biomass_SK/blob/23d767afc9c221bc9ef582d0f79b1060b173e882/CBM_vol2biomass_SK.R#L151

I added this because a fair amount of the code that follows in the Init event treats curveID as a length-1 vector. However, it would likely not take much time to update the event to allow more than one column, and I think it would be worthwhile.
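One possible way to make that code length-agnostic is to build a single key from however many curveID columns are supplied, then match on that key wherever the event currently indexes a single column. `curveKey` is a hypothetical helper, not part of CBM_vol2biomass_SK; it extracts columns with `[[` so it should behave the same on a data.frame or a data.table.

```r
# Hypothetical helper: one string key per row from any number of curveID columns
curveKey <- function(dt, curveID) {
  missingCols <- setdiff(curveID, names(dt))
  if (length(missingCols) > 0) {
    stop("curveID column(s) not found: ", paste(missingCols, collapse = ", "))
  }
  # Pull each column with [[ so this works on data.frames and data.tables alike
  cols <- lapply(curveID, function(col) dt[[col]])
  do.call(paste, c(cols, sep = "_"))
}

cohorts <- data.frame(speciesId = c(1, 2), prodclass = c(1, 1))
curveKey(cohorts, "speciesId")                  # "1" "2"
curveKey(cohorts, c("speciesId", "prodclass"))  # "1_1" "2_1"
```

Pasting with a separator is the simplest option; a multi-column `merge()` or data.table join on all `curveID` columns would avoid string keys altogether.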

3 participants