Updated WorldCerealInferenceDataset #103

kvantricht · 2024-09-03T10:06:58Z

Extractions of test patches are now more in line with actual inference on openEO. WorldCerealInferenceDataset has been updated to cope with these new files.

kvantricht · 2024-09-04T07:46:34Z

There is scattered discussion of the above in an older PR (#65). Once we merge this one, we can close the other one.


gabrieltseng · 2024-09-05T08:59:16Z

presto/dataset.py

+            months,
+            target,
+            lon,
+            lat,


Do we need latlons and lon, lat here? If I understand correctly, lon == latlons[:, 1] and lat == latlons[:, 0], which means we don't need lon, lat?

It's not so simple, I'm afraid; I struggled with this a lot. After the meshgrid and the flattening of latlons, it becomes quite hard to get back to the original lon and lat that we need to properly reconstruct the DataArray.

latlons.shape is (2500, 2), while lon.shape is (50,).

So no, lon != latlons[:, 1]. I've been thinking about easier ways but haven't found any yet.
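To make the shape mismatch concrete, here is a minimal sketch assuming a hypothetical 50 x 50 tile with made-up, regularly spaced coordinates (only the shapes match the ones quoted above). It flattens the grid the same way as the meshgrid-based code in this thread and shows that latlons[:, 1] is the 1D lon array tiled once per latitude row, not lon itself.

```python
import numpy as np

# Hypothetical 1D coordinates for a 50 x 50 tile (values are illustrative only)
lat = np.linspace(51.0, 51.49, 50)
lon = np.linspace(4.0, 4.49, 50)

# Build the full grid and flatten it to one (lat, lon) row per pixel
lon_grid, lat_grid = np.meshgrid(lon, lat)
latlons = np.stack([lat_grid.ravel(), lon_grid.ravel()], axis=-1)

print(latlons.shape)  # (2500, 2)
print(lon.shape)      # (50,)

# latlons[:, 1] repeats the whole lon array once per latitude row ...
assert np.array_equal(latlons[:, 1], np.tile(lon, len(lat)))
# ... while latlons[:, 0] repeats each latitude len(lon) times
assert np.array_equal(latlons[:, 0], np.repeat(lat, len(lon)))
```

Because lon only reappears as a repeating pattern, recovering the 1D array requires knowing or inferring the grid dimensions, which is what the from_latlons helper further down in this thread does.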

hmm okay. I think this is probably (?) easier, especially considering we additionally apply the transformation. The latlons take up quite a bit of RAM, and for large tiles this might become an issue.
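As a rough back-of-the-envelope check of the RAM concern, the sketch below compares the size of the flat latlons array with the two 1D coordinate arrays. It assumes float64 coordinates and a hypothetical square tile of side n pixels; the dtype and tile sizes actually used in the pipeline may differ.

```python
import numpy as np


def coordinate_memory_bytes(n: int, dtype=np.float64) -> tuple[int, int]:
    """Return (flat_latlons_bytes, one_d_bytes) for a hypothetical n x n tile."""
    itemsize = np.dtype(dtype).itemsize
    # Flat latlons: one (lat, lon) pair per pixel -> n * n * 2 values
    flat_bytes = n * n * 2 * itemsize
    # 1D coordinates: n latitudes + n longitudes
    one_d_bytes = 2 * n * itemsize
    return flat_bytes, one_d_bytes


# Cross-check the formula against an actual flattened grid for a small tile
n = 50
lon_grid, lat_grid = np.meshgrid(np.arange(n, dtype=np.float64), np.arange(n, dtype=np.float64))
latlons = np.stack([lat_grid.ravel(), lon_grid.ravel()], axis=-1)
assert latlons.nbytes == coordinate_memory_bytes(n)[0]

for n in (50, 10_000):
    flat, one_d = coordinate_memory_bytes(n)
    print(f"n={n}: flat latlons {flat / 1e6:,.2f} MB vs 1D coords {one_d / 1e3:,.2f} kB")
```

For a hypothetical 10,000 x 10,000 tile this works out to 1.6 GB for the flat array versus 160 kB for the two 1D arrays, before any transformed copy is made.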

Just in case it's useful, here is some code to go from the flat latlons back to the original ones (but without the transformation):

```python
import numpy as np
from einops import rearrange

org_lat = np.array([1, 2, 3])
org_lon = np.array([4, 5, 6, 7])


def to_flat_latlons(lat, lon):
    # Build the 2D grid, then flatten it to one (lat, lon) row per pixel
    lon, lat = np.meshgrid(lon, lat)
    latlons = rearrange(np.stack([lat, lon]), "c x y -> (x y) c")
    return latlons


def from_latlons(latlons):
    # Infer the grid size from the number of unique lats (x) and lons (y)
    x = len(np.unique(latlons[:, 0]))
    y = len(np.unique(latlons[:, 1]))
    latlons = rearrange(latlons, "(x y) c -> c x y", x=x, y=y)
    lats, lons = latlons[0], latlons[1]
    return lats[:, 0], lons[0, :]


output_lat, output_lon = from_latlons(to_flat_latlons(org_lat, org_lon))
assert np.equal(output_lon, org_lon).all()
assert np.equal(output_lat, org_lat).all()
```
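For environments without einops, the same round trip can be written with plain numpy reshapes. This is an equivalent sketch, not code from the PR; like the einops version, it assumes a regular grid in which every latitude row and every longitude column has a distinct value.

```python
import numpy as np


def to_flat_latlons(lat: np.ndarray, lon: np.ndarray) -> np.ndarray:
    # Same row order as "c x y -> (x y) c": row-major over (lat, lon)
    lon_grid, lat_grid = np.meshgrid(lon, lat)
    return np.stack([lat_grid, lon_grid]).reshape(2, -1).T


def from_latlons(latlons: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    # Assumes a regular grid: one unique value per row (lat) and per column (lon)
    n_lat = len(np.unique(latlons[:, 0]))
    n_lon = len(np.unique(latlons[:, 1]))
    grid = latlons.T.reshape(2, n_lat, n_lon)
    return grid[0][:, 0], grid[1][0, :]


out_lat, out_lon = from_latlons(to_flat_latlons(np.array([1, 2, 3]), np.array([4, 5, 6, 7])))
print(out_lat, out_lon)
```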

I wonder if it's worth passing around the original lats and lons and only applying that transformation right before the model ingests the values.
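One way to read this suggestion, sketched under assumptions: keep only the 1D coordinates around and build the flat, transformed coordinates in a single helper right before the model call. The helper name build_model_latlons and the transform argument are hypothetical; transform stands in for whatever coordinate transformation the pipeline actually applies.

```python
from typing import Callable, Tuple

import numpy as np

# Hypothetical signature: maps flat lat/lon arrays to transformed flat arrays
Transform = Callable[[np.ndarray, np.ndarray], Tuple[np.ndarray, np.ndarray]]


def build_model_latlons(lat: np.ndarray, lon: np.ndarray, transform: Transform) -> np.ndarray:
    """Expand 1D lat/lon into flat (n_pixels, 2) latlons and apply `transform`.

    Meant to be called only right before the model ingests the values, so the
    1D arrays remain available for reconstructing the output grid afterwards.
    """
    lon_grid, lat_grid = np.meshgrid(lon, lat)
    flat_lat, flat_lon = transform(lat_grid.ravel(), lon_grid.ravel())
    return np.stack([flat_lat, flat_lon], axis=-1)


def identity(lat: np.ndarray, lon: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    # Placeholder transform for the example; a real one would reproject
    return lat, lon


lat = np.array([1.0, 2.0, 3.0])
lon = np.array([4.0, 5.0, 6.0, 7.0])
model_latlons = build_model_latlons(lat, lon, identity)
print(model_latlons.shape)  # (12, 2)
```

The trade-off is that the grid is rebuilt on every call, but the large flat array then only lives as long as the model input does.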

I hope I understood this correctly. Could you check whether my latest commit addresses this the way you suggested? Feel free to propose a different approach. Unfortunately I'm away for the rest of the day; I'll catch up tomorrow morning.

kvantricht · 2024-09-05T10:21:57Z

@gabrieltseng I can't get the checks to pass on this one :-( flake8 complains (not locally for me) about missing whitespace, but when I add it, black wants to split the line, and this goes around in an endless loop. Not sure how to fix it.

gabrieltseng · 2024-09-05T10:23:02Z

Hmm, strange. I will try fixing it on my end and, if it works, merge it in.

gabrieltseng · 2024-09-05T11:29:23Z

Hmm, actually, @kvantricht, would you mind undoing 8ac58df? I think I prefer the previous solution.

Perhaps in dataset.py you could make the variable names more descriptive, to make it clear why we need to pass latlons twice?

Perhaps flat_latlons and lat_for_reconstruction, lon_for_reconstruction (although that's quite a mouthful, at least it's very explicit).


kvantricht · 2024-09-05T18:57:29Z


@gabrieltseng so this all turns out to be a bit nastier than anticipated. I reverted the commit and started implementing your suggested name changes, but it turns out that, with all the changes that were (and had to be) made, by the time we arrive at combine_predictions we're not necessarily dealing with lat and lon anymore, at least not when the original CRS of the .nc file differs from WGS84. We reproject the coordinates to lat/lon for Presto, but the resulting coordinates no longer form a regular grid in which one coordinate varies only along the x axis and the other only along the y axis. After transforming to lat/lon, every single pixel gets a unique lat/lon combination, from which we can no longer infer the 1D lat and lon arrays needed for reconstruction.
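The effect can be reproduced without a real CRS library. The sketch below uses a small rotation of a regular x/y grid as a toy stand-in for an actual reprojection (for example from a UTM CRS to WGS84, where grid convergence rotates the projected grid relative to the meridians); the grid and the 1.5 degree angle are made up. After the transform, every pixel has its own unique "lon" and "lat", so counting unique values, as the from_latlons helper earlier in this thread does, no longer recovers the grid dimensions.

```python
import numpy as np

# Regular grid in a projected CRS: x varies only along columns, y only along rows
x_coord = np.arange(0.0, 50.0)  # 50 columns (hypothetical values)
y_coord = np.arange(0.0, 40.0)  # 40 rows (hypothetical values)
x_grid, y_grid = np.meshgrid(x_coord, y_coord)

# Toy stand-in for a CRS transform: rotate the grid by a small angle
theta = np.deg2rad(1.5)
lon_like = np.cos(theta) * x_grid - np.sin(theta) * y_grid
lat_like = np.sin(theta) * x_grid + np.cos(theta) * y_grid

# Before the transform: 50 unique x values and 40 unique y values
print(len(np.unique(x_grid)), len(np.unique(y_grid)))
# After the transform: as many unique values as there are pixels
print(len(np.unique(lon_like)), len(np.unique(lat_like)), x_grid.size)
```

With 2000 unique values on each axis, a reshape to (2000, 2000) can no longer match the 2000 flat rows, which is why the reconstruction has to fall back on the original 1D x and y coordinates.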

Not sure if I'm making myself clear, but the only way the code works is by using the DataArray's original x and y coordinates in the reconstruction, and these are not necessarily lat and lon. So for now I've called them what they are: x_coord and y_coord. Only one test of combine_predictions was checking whether the final level names in the DataFrame were lat and lon; they are now x and y.
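A minimal numpy-only sketch of that reconstruction, assuming the flat predictions are ordered row-major over (y, x), i.e. the same order as the meshgrid flattening used elsewhere in this thread. The helper reconstruct_grid and the coordinate values are hypothetical; in the PR itself, combine_predictions produces a DataFrame whose level names are now x and y.

```python
import numpy as np


def reconstruct_grid(flat_preds: np.ndarray, x_coord: np.ndarray, y_coord: np.ndarray) -> np.ndarray:
    """Reshape per-pixel predictions back onto the original (y, x) grid.

    Hypothetical helper: relies only on the original 1D x/y coordinates of the
    DataArray, never on the reprojected lat/lon values.
    """
    expected = len(y_coord) * len(x_coord)
    if flat_preds.shape[0] != expected:
        raise ValueError(f"expected {expected} predictions, got {flat_preds.shape[0]}")
    return flat_preds.reshape(len(y_coord), len(x_coord), *flat_preds.shape[1:])


x_coord = np.array([500000.0, 500010.0, 500020.0])  # hypothetical projected x
y_coord = np.array([5600020.0, 5600010.0])          # hypothetical projected y
flat_preds = np.arange(6)                           # one prediction per pixel, row-major (y, x)
grid = reconstruct_grid(flat_preds, x_coord, y_coord)
print(grid)
```

Since x_coord and y_coord come straight from the source file, this works regardless of which CRS that file was in.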

Looking forward to what you make of all this ...

gabrieltseng · 2024-09-09T07:44:36Z

Cool, this looks good to me. Thanks @kvantricht!

gabrieltseng

lgtm

kvantricht added 14 commits September 3, 2024 12:03

Updated inference dataset · 47034ac

Updated WorldCerealInferenceDataset for new files · d7d541e

Remove unused import · 5d6fe88

Fix typing · f0b1c8c

Ignore this type check · ce52f9c

Black fixes · 1ac7ea2

More line length fixes · 906652e

:facepalm · 1a19ec1

🤦 · 8ee5b47

Added ground truth labels · 3f75632

Only subset of the file for faster tests · 9155516

Fix gt selection and rearranged t subset · 5f6b58d

test update · 794065e

More consistent handling of inference datasets · 817da45

kvantricht mentioned this pull request Sep 4, 2024

fix: latlon handling in inference dataset #65 (Closed)

kvantricht requested a review from gabrieltseng September 4, 2024 07:45

kvantricht added 4 commits September 4, 2024 10:38

use h5netcdf instead of rioxarray · 666d05f

Dont import at the top to avoid dependency · 7f14c8c

Add rioxarray and relax xarray version · 70015af

Relax rioxarray library version · eab0569

gabrieltseng reviewed Sep 5, 2024

presto/dataset.py (outdated)

gabrieltseng reviewed Sep 5, 2024

kvantricht added 4 commits September 5, 2024 11:00

Test file update · 3eea359

Clarified comment · 4f9a18f

Black fix · 6547114

Only create the grid before feeding to presto · 8ac58df

Formatting · fab066c

kvantricht added 3 commits September 5, 2024 19:59

Revert "Only create the grid before feeding to presto" · 6079001

This reverts commit 8ac58df.

Merge branch 'updated-inferencedatasets' of github.com:WorldCereal/pr…

2575d39

…esto-worldcereal into updated-inferencedatasets

Clarified variable naming · be7a31b

Black fix · 13f8433

gabrieltseng approved these changes Sep 9, 2024


kvantricht merged commit 547c908 into main Sep 9, 2024
1 check passed

kvantricht deleted the updated-inferencedatasets branch September 9, 2024 09:59
