Grids are not fully random #59

Open
juangallegozamorano opened this issue Sep 19, 2024 · 4 comments

@juangallegozamorano

Hi @chenyangkang,

I'm revising (again) the results of the model and I realised that the initial spatial division is not fully random. If you look carefully at the grids in the attached image, or at the gridding plot in your demo, you can see a pattern. I think the problem is in the rotation: when I load the geofile in a GIS, I can see that every new grid rotates in the same direction (a different angle, but the same direction), while I think it should be completely random, as Fink and colleagues do it.
Would that be easy to solve? I hope it is not much trouble.

[Attached image: grids]

Best,

Juan

@chenyangkang
Owner

chenyangkang commented Sep 19, 2024

Hi @juangallegozamorano,

Glad to hear your questions. Yes, the current rotation angles are not randomly generated, but they become effectively random as the ensemble count increases.

The current setting works like this: if you generate, say, 10 ensembles, the model evenly divides the 90-degree range into 10 rotation angles at 9-degree intervals. In my experiments, this was much more effective and robust than completely random sampling for small numbers of ensembles (e.g., < 20). The eBird team's S&T product used 200 ensembles, so they would not encounter this problem.

This line of code generates the rotation angle intervals.

https://github.com/chenyangkang/stemflow/blob/8fa5e6e2fd1322e7341932caff5baa78b2b8964a/stemflow/utils/quadtree.py#L158C5-L158C50
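A minimal sketch of the evenly spaced scheme described above, in plain Python. This is an illustration only, not the code at the linked line: the helper name `evenly_spaced_angles` is hypothetical, and starting the sequence at 0 degrees is an assumption.

```python
def evenly_spaced_angles(n_ensembles, max_angle=90.0):
    """Hypothetical helper: split [0, max_angle) into n_ensembles equal steps.

    Assumes the first angle is 0; the real implementation may start elsewhere.
    """
    step = max_angle / n_ensembles
    return [k * step for k in range(n_ensembles)]


print(evenly_spaced_angles(10))
# → [0.0, 9.0, 18.0, 27.0, 36.0, 45.0, 54.0, 63.0, 72.0, 81.0]
```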

If you want completely random rotation angles, as Fink et al. used, that would be easy to implement. But I would like to hear why you think it is necessary, as that will also determine whether we make "completely random angles" the default (I tend not to) or an option.
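For comparison, a sketch of completely random angles, assuming uniform sampling over [0, 90) (the exact distribution Fink et al. used is not stated in this thread). The hypothetical `largest_gap` helper measures the widest angular range left uncovered. With n angles the largest gap can never be smaller than 90/n, which evenly spaced angles achieve exactly; random angles usually leave a wider gap when n is small, which is one way to read the point about small ensemble counts.

```python
import random


def random_angles(n_ensembles, max_angle=90.0, seed=None):
    """Hypothetical helper: draw rotation angles uniformly from [0, max_angle]."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, max_angle) for _ in range(n_ensembles)]


def largest_gap(angles, period=90.0):
    """Largest gap between consecutive angles, wrapping around the period."""
    a = sorted(x % period for x in angles)
    gaps = [nxt - cur for cur, nxt in zip(a, a[1:])]
    gaps.append(a[0] + period - a[-1])  # gap that wraps from the last angle back to the first
    return max(gaps)


n = 10
print("random:", largest_gap(random_angles(n, seed=1)), "evenly spaced:", 90.0 / n)
```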

As for the rotation direction, I don't think it matters whether each rotation starts from the right or the left, as long as the rotation angles are random enough to cover the 360-degree circle.
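Why a 0 to 90 degree range is enough to cover the full circle: a square grid maps onto itself under a 90-degree rotation, so angles θ and θ + 90 give the same cell orientation. The sketch below uses hypothetical helper names, ignores any shift of the grid origin, and checks this on the integer nodes of a square grid.

```python
import math


def rotate(x, y, angle_deg):
    """Rotate point (x, y) counterclockwise about the origin by angle_deg degrees."""
    a = math.radians(angle_deg)
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a))


def square_lattice(half_width):
    """Integer grid nodes in a square window centred on the origin."""
    r = range(-half_width, half_width + 1)
    return {(i, j) for i in r for j in r}


def rotated(points, angle_deg, ndigits=9):
    """Rotate every point, rounding away floating-point noise such as cos(90°) ≈ 6e-17."""
    return {tuple(round(c, ndigits) for c in rotate(x, y, angle_deg)) for x, y in points}


grid = square_lattice(3)
for angle in (30, 90, 180, 270):
    # Only multiples of 90 degrees map the grid back onto itself.
    print(angle, rotated(grid, angle) == grid)
```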

Hope this can help.

@juangallegozamorano
Author

Hi @chenyangkang,

As always, thanks a lot for the quick and informative response!
The reason I pointed it out is that I observed some artifacts in my maps that are probably the result of the spatial division (see the hard edge I highlighted in red on the map). So when I saw that the grids were not randomized, I thought that could be the cause. This map is the result of 30 models (they do cover the 360-degree circle), with a grid lower threshold of 200,000 m and an upper threshold of 2,000,000 m.
Your point is completely fair, and you have probably tested more things than I have, but I wonder why this is happening in my maps. Could it be that the thresholds are not yet properly tuned? In any case, I will test other thresholds and see if I can get rid of those artifacts.
[Attached image: map with the hard edge highlighted in red]

Best,

Juan

@chenyangkang
Owner

chenyangkang commented Sep 19, 2024

@juangallegozamorano Thanks for the clarification, Juan!

From my experience, that artifact often appears when the number of ensembles is not large enough. So if I were you, the first thing I would try is increasing the ensemble count to, e.g., 60, although that also depends on your computational resources. I think that with ensemble fold = 100 or 200 it is unlikely to be a problem.
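A toy 1D analogue of why more ensembles soften hard edges. Each hypothetical "model" predicts the cell mean of f(x) = x on a grid with cell width 0.2, and each ensemble's grid is shifted by an evenly spaced offset (shifts stand in for rotations here; this is an assumption for illustration, not how stemflow builds its grids). Averaging the piecewise-constant predictions splits each cell-boundary jump across ensembles, so the largest jump between neighbouring points shrinks roughly as 1/n.

```python
import math


def cell_means(xs, offset, width):
    """Piecewise-constant 'model': each x gets the mean of f(x) = x over its grid cell."""
    cells = {}
    for x in xs:
        cells.setdefault(math.floor((x - offset) / width), []).append(x)
    means = {k: sum(v) / len(v) for k, v in cells.items()}
    return [means[math.floor((x - offset) / width)] for x in xs]


def ensemble_prediction(xs, n_ensembles, width):
    """Average predictions from grids shifted by evenly spaced offsets in [0, width)."""
    preds = [cell_means(xs, k * width / n_ensembles, width) for k in range(n_ensembles)]
    return [sum(p[i] for p in preds) / n_ensembles for i in range(len(xs))]


def largest_jump(values):
    """Biggest difference between neighbouring predictions: the 'hard edge'."""
    return max(abs(b - a) for a, b in zip(values, values[1:]))


xs = [i / 1000 for i in range(1000)]
for n in (1, 10, 60):
    print(n, round(largest_jump(ensemble_prediction(xs, n, 0.2)), 4))
```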

Another option is to set aggregation = 'median' in your prediction method, which could be more robust. I remember that the eBird team used the median as their final method when they released the S&T product.
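A minimal illustration of mean versus median aggregation across ensembles, using only the standard library. The `aggregate` helper and the prediction values are made up for illustration and are not stemflow's `predict()` implementation: one ensemble whose grid cell fit poorly at these locations pulls the mean away, while the median stays close to the other members.

```python
from statistics import mean, median


def aggregate(predictions_by_ensemble, how="mean"):
    """Hypothetical helper: combine per-ensemble predictions location by location."""
    combine = {"mean": mean, "median": median}[how]
    # zip(*...) regroups the values so each tuple holds all ensembles' predictions for one location.
    return [combine(values) for values in zip(*predictions_by_ensemble)]


# Rows are ensembles, columns are three prediction locations (made-up values).
ensembles = [
    [0.10, 0.50, 0.90],
    [0.12, 0.48, 0.88],
    [0.11, 0.52, 0.91],
    [0.95, 0.05, 0.10],  # one ensemble whose grid cell fit poorly here
    [0.09, 0.51, 0.89],
]
print("mean:  ", aggregate(ensembles, "mean"))
print("median:", aggregate(ensembles, "median"))
```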

Best,
Yangkang

@juangallegozamorano
Author

Hi @chenyangkang
Great! I will try to increase the number of models (although my resources are limited). The aggregation = 'median' option is implemented within the predict() function, right? I will also try that :)
Thanks for the tips!
