
createWeights.R (for Frescalo) contains arguably inappropriate hard-coded distance function #240

Open
sacrevert opened this issue Jan 13, 2022 · 5 comments


@sacrevert

Just noting that this function, which creates neighbourhood land cover-based weights, uses dist() with default options. This is a Euclidean distance measure, which is potentially inappropriate for very sparse matrices (because many shared zeros between items can strongly influence the distance measure, an issue that often comes up in community ecology). A warning, and the option to use something like the cosine similarity measure, would be desirable. Efficient code for the latter is the second answer here: https://stats.stackexchange.com/questions/31565/compute-a-cosine-dissimilarity-matrix-in-r
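To make the concern concrete, here is a minimal sketch contrasting the Euclidean distance that R's `dist()` computes by default with a cosine dissimilarity (1 minus cosine similarity). It is written in Python purely for illustration and is not sparta's createWeights.R code. The function names, the three hypothetical grid squares and the hypothetical land-cover classes are all invented for this example. Squares A and B share no cover classes, while C has the same classes as A in the same ratio but larger amounts.

```python
import math


def euclidean_matrix(rows):
    """Pairwise Euclidean distances between rows (what R's dist() returns by default)."""
    n = len(rows)
    return [
        [math.sqrt(sum((a - b) ** 2 for a, b in zip(rows[i], rows[j]))) for j in range(n)]
        for i in range(n)
    ]


def cosine_dissimilarity_matrix(rows):
    """Pairwise 1 - cosine similarity between rows.

    Only the direction of each row matters, not its magnitude; shared zeros
    contribute nothing to the dot product or to the norms.
    """
    norms = [math.sqrt(sum(v * v for v in r)) for r in rows]
    n = len(rows)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # the diagonal stays at 0
            if norms[i] == 0 or norms[j] == 0:
                # Assumption: cosine similarity is undefined for an all-zero row,
                # so treat such a row as maximally dissimilar to every other row.
                out[i][j] = 1.0
                continue
            dot = sum(a * b for a, b in zip(rows[i], rows[j]))
            out[i][j] = 1.0 - dot / (norms[i] * norms[j])
    return out


if __name__ == "__main__":
    # Hypothetical land-cover amounts for three grid squares (toy values).
    squares = [
        [0, 1, 1],  # A: classes 2 and 3
        [1, 0, 0],  # B: class 1 only, nothing in common with A
        [0, 4, 4],  # C: same classes as A, same ratio, larger amounts
    ]
    eu = euclidean_matrix(squares)
    co = cosine_dissimilarity_matrix(squares)
    print("Euclidean A-B, A-C:", round(eu[0][1], 2), round(eu[0][2], 2))
    print("Cosine    A-B, A-C:", round(co[0][1], 2), round(co[0][2], 2))
```

On these toy rows, Euclidean distance puts A closer to B (sqrt(3), about 1.73) than to C (sqrt(18), about 4.24), even though A and B have no cover classes in common. Cosine dissimilarity gives 1 for A-B and essentially 0 for A-C. Whether differences this large appear in real land-cover data depends on how the inputs are scaled.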

@AugustT
Member

AugustT commented Jan 13, 2022

Thanks @sacrevert, not something I had considered.

@sacrevert
Author

Just noting here that I have created a quick tool for visualising Frescalo weights (https://github.com/sacrevert/visualiseFresNeighbours) and sets of new weights at various geographic scales (https://github.com/sacrevert/frescaloNeighbourhoods). It would be interesting to compare the existing approach in sparta with these new sets, which use newer land cover information, additional geological information, and the cosine similarity measure (rather than the Euclidean approach currently encoded in sparta).

@sacrevert
Author

sacrevert commented Feb 22, 2022

Quick comparison here between the sparta approach and what I did. In this case it doesn't actually make a great deal of difference: although some neighbourhoods differ (and are arguably slightly more coherent ecologically), the effect on trend estimates is probably negligible. Still, it might be wise to give the user an option, or a warning, regarding the dissimilarity measure, as it could have bigger effects in other cases. See https://github.com/sacrevert/frescaloNeighbourhoods/blob/main/spartaWeightsComparison.pdf

@AugustT
Member

AugustT commented Feb 22, 2022

@sacrevert thank you for doing the comparison and taking the time to put together the PDF. Realistically, given other priorities, I don't see any changes being made to sparta's frescalo functionality in the near future. I'd be happy to review and pull in any changes that you want to make, but realise you may well not have the time either.

@sacrevert
Author

No worries. No, I probably won't have time either :)
