-
Notifications
You must be signed in to change notification settings - Fork 24
Add ClassificationLayer
for Raster Classification
#668
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Thanks, this PR will be used as support for 28/03/2025 and debate with the all team in person :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its not a real review, I just look a it for a minute, thanks for work again
1a10549
to
d687665
Compare
86a9747
to
9c59796
Compare
77fe41a
to
bf7fd44
Compare
:raise ValueError: if req_stats_classes are not class names. | ||
""" | ||
if self.classification_masks is None or self.class_names is None: | ||
raise ValueError("Classification has not been applied yet. Call apply_classification() first.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace "apply_classification" -> apply
Thanks @vschaffn for the implementation. This will certainly be useful for mainy users!
|
I was about to write my feedback in the issue following previous comments there (GlacioHack/xdem#692 (comment)), but seeing @adehecq's comment I'll follow up here as well. Taking stock of existing tools: So I think the big question is: What's different in our case that would justify developping a new functionality? There are two aspects: So, in my view, it's only 2 that is relevant. But then, I don't think we want to create a completely new API when there is already a good abstraction covering all of our use cases that exists in Pandas/Xarray. So I'd be in favour of adding a stats, masks = dem.groupby(slope, slope_bins, return_masks=True)
stats.to_csv(out_csv)
masks.save(out_tif) If that API is still deemed too difficult by CNES, we could maintain the classes above to wrap up the operations, but it would duplicate things into a second API. |
Thinking a bit more about this, a couple additional points:
So, to simplify the API for a Raster and avoid the more complex logic of Pandas, we could immediately have a higher-level function class Raster
def grouped_statistic(self, band=1, vars, bins, apply_funcs):
df_self = pd.DataFrame([self.data[band-1].ravel(), *(v.ravel() for v in vars)])
return df_self.groupby([var1, pd.cut(var1, bins1, ...)], [...]).agg(apply_funcs) For this reason (and why it's like that in xDEM's |
So the question we need to answer to move forward is: What would be missing in the functionalities of such a For getting/saving the georeferenced masks, there could be an optional dictionary output using the same group name logic coded in this PR, with group key copied from the output dataframe, and all could be saved doing: stats, mask_dict = dem.grouped_statistics(slope, slope_bins, ...)
for name, mask in mask_dict.items():
mask.save("mask_"+name+".tif") But, if we wanted to have tailored plotting functions (to visualize the stats of 1D/2D groups), it'd be something hard to fit as an output of a function call (and we use it a lot in xDEM), and a class would be really useful. Finally, having a CLI functionality that wraps this (reading + binning + plotting + saving) all at once might also simplify things for end-users? These are all my thoughts! 😄 |
Just a note on my suggestion of adding a landcover classification. I think we could use the ESA worldcover. An example for accessing the data through the AWS bucket is shown here: https://github.com/ESA-WorldCover/esa-worldcover-datasets. |
Thank you @rhugonnet and @adehecq for your comments. It is nice to see that the topic is generating a lot of interest. However, I suggest that these ideas should correspond to new issues. It might be interesting to merge if there are no blocking points and propose new issues to be addressed later. |
@belletva Thanks for wrapping things up! 🙂 From my point of view, based on existing functionalities in SciPy/Pandas described above that already cover all use cases, I think we should have a wrapper method In short: If there is a need for a class structure, I haven't really grasped it yet. This rather fits a method-type functionality (which would be very fast to implement, as simply wrapping Pandas/SciPy). |
Resolves GlacioHack/xdem#692.
Context:
This PR introduces the
ClassificationLayer
functionality within the geoutils library. The goal is to provide tools for classifying raster data into discrete classes, whether based on numeric intervals (e.g., binning elevation ranges) or categorical classes (e.g., land cover types). This enhancement improves geospatial workflows by allowing users to easily classify raster data and compute statistics for each class.Features:
ClassificationLayer
Abstract Base Class:apply_classification
), computing statistics (get_stats
, based onRaster.get_stats
), and saving results (save
).Binning
Class:ClassificationLayer
for classifying continuous raster data using custom binning.Segmentation
Class:Fusion
Class:Save/Export Functionality:
Tests:
Unit tests for Binning to verify:
Unit tests for
Segmentation
.Unit tests for
Fusion
.Documentation:
Example: