Check for undersampling of certain geographic regions due to cloud cover filters #93

Closed
weiji14 opened this issue Dec 19, 2023 · 1 comment
Labels: bug (Something isn't working), data-pipeline (Pull Requests about the data pipeline)

weiji14 (Contributor) commented Dec 19, 2023

In #28 and #80, we've developed a geographic sampling scheme based on WorldCover that is supposed to sample a diverse set of regions based on landcover types.

However, in #60/#68, we've applied a NoData filter that removes some of those sampled regions. We'll need to double check if those filters are undersampling certain geographic regions that have high cloud cover, or areas where high surface reflectance can lead to false positive cloud cover values (e.g. over polar regions).

For example, we should have 40+ MGRS tiles over Greenland with the sampling procedure from #81:

[image: MGRS tiles sampled over Greenland]

But I looked at the S3 bucket, searching over MGRS grid zones 20X-26X, 21W-26W, and 22V-24V, and couldn't find a single tile, even in the coastal areas that are not pure white!

So we'll need to check whether there are data gaps in certain regions, and potentially raise the cloud cover threshold or relax the filter in some other way.
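One way to make this check repeatable is to compare the tiles the sampling scheme was supposed to produce against the tiles actually present in the bucket, grouped by MGRS grid zone. The sketch below is only an illustration: the function names are hypothetical, it assumes tile IDs are 5-character, zero-padded strings whose first three characters are the grid zone, and it takes plain lists of tile IDs as input rather than reading from the sampling output or from S3.

```python
from collections import Counter


def expand_zone_range(spec):
    """Expand a range such as '20X-26X' into ['20X', '21X', ..., '26X']."""
    start, end = spec.split("-")
    band = start[-1]
    if end[-1] != band:
        raise ValueError(f"latitude bands differ in {spec!r}")
    first, last = int(start[:-1]), int(end[:-1])
    return [f"{zone:02d}{band}" for zone in range(first, last + 1)]


def coverage_by_zone(expected_tiles, found_tiles):
    """Return {grid_zone: (found_count, expected_count)} for the expected tiles.

    Assumes 5-character tile IDs, so tile[:3] is the grid zone (hypothetical
    convention for this sketch).
    """
    expected = set(expected_tiles)
    found = expected & set(found_tiles)
    expected_counts = Counter(tile[:3] for tile in expected)
    found_counts = Counter(tile[:3] for tile in found)
    return {
        zone: (found_counts[zone], total)
        for zone, total in sorted(expected_counts.items())
    }


# Grid zones searched over Greenland in this issue.
greenland_zones = [
    zone
    for spec in ("20X-26X", "21W-26W", "22V-24V")
    for zone in expand_zone_range(spec)
]
```

Any grid zone whose found count is zero, or far below its expected count, is a candidate for undersampling by the NoData or cloud cover filters.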

@weiji14 weiji14 added bug Something isn't working data-pipeline Pull Requests about the data pipeline labels Dec 19, 2023
@weiji14 weiji14 changed the title Check for undersampling of certain geographic regions due to cloud filters Check for undersampling of certain geographic regions due to cloud cover filters Dec 19, 2023
@weiji14 weiji14 added this to the v1 Release milestone Dec 19, 2023
yellowcap (Member) commented

We addressed this by using the least cloudy image for each season in an area, which increases the chances of getting enough imagery in each region. The bias might still be there, though, since in very cloudy areas even that approach can yield fewer samples.
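As a rough illustration of the idea (not the pipeline's actual code), picking the least cloudy image per season could look like the sketch below. The `date` and `cloud_cover` field names, the function name, and the grouping of months into meteorological seasons are all assumptions made for this example.

```python
from datetime import date

# Assumed season definition: meteorological seasons keyed by calendar month.
SEASON_BY_MONTH = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}


def least_cloudy_per_season(scenes):
    """Return {season: scene} keeping the lowest cloud cover scene per season.

    Each scene is a dict with hypothetical keys 'date' (datetime.date) and
    'cloud_cover' (percent). Seasons with no scenes are absent from the result.
    """
    best = {}
    for scene in scenes:
        season = SEASON_BY_MONTH[scene["date"].month]
        current = best.get(season)
        if current is None or scene["cloud_cover"] < current["cloud_cover"]:
            best[season] = scene
    return best


scenes = [
    {"date": date(2022, 1, 10), "cloud_cover": 80.0},
    {"date": date(2022, 2, 3), "cloud_cover": 35.0},
    {"date": date(2022, 7, 1), "cloud_cover": 10.0},
]
selected = least_cloudy_per_season(scenes)
```

A season with no usable scene simply has no entry in the result, which is how a residual bias can persist in very cloudy areas.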

We can revisit this at a later stage.
