hdbscan, distance matrix #35

Open
kmzapp opened this issue Aug 5, 2019 · 3 comments

kmzapp commented Aug 5, 2019

Currently the hdbscan function computes the complete distance matrix. Would it be possible to compute parts of it and use them sequentially for the mutual reachability distance, so that it could be stored in smaller objects? When I use the function on a large dataset, I get an error message saying that the vector size is too large.
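To give a sense of scale, here is a rough back-of-the-envelope sketch in Python. It assumes the pairwise distances are held as 8-byte doubles in condensed lower-triangle form (the layout of an R `dist` object), which may not match exactly what hdbscan allocates internally. The function name `condensed_dist_bytes` is made up for this illustration.

```python
def condensed_dist_bytes(n, bytes_per_value=8):
    """Bytes needed to hold the n*(n-1)/2 pairwise distances of n points.

    Assumption: one 8-byte double per unordered pair, nothing else stored.
    """
    return n * (n - 1) // 2 * bytes_per_value


if __name__ == "__main__":
    for n in (10_000, 100_000, 1_000_000):
        gigabytes = condensed_dist_bytes(n) / 1e9
        print(f"{n:>9,} points: {gigabytes:,.1f} GB")
```

Under that assumption the storage grows quadratically, so a dataset ten times larger needs roughly a hundred times the memory.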

Owner

mhahsler commented Aug 5, 2019

I think this would be a nice feature to have. I will refer this to Matt.

Collaborator

peekxc commented Aug 5, 2019

I would love to have this as well. One could probably precompute the core distances only, and then change the MST code to compute the mutual reachability distances on demand. I can't remember if there was a reason for not doing that in the first place.
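A minimal sketch of that idea, in plain Python rather than the package's own code: core distances are computed one row at a time, and Prim's algorithm then evaluates each mutual reachability distance only when it needs it, so no n-by-n object is ever stored. The function names, the choice of Prim's algorithm, Euclidean distance, and the convention that the core distance is the distance to the (min_pts - 1)-th nearest other point are all assumptions made for illustration, not a description of the existing implementation.

```python
import math


def core_distances(points, min_pts):
    """Core distance of each point, computed one distance-matrix row at a time.

    Assumed convention: the point counts as its own first neighbor, so the
    core distance is the distance to the (min_pts - 1)-th nearest other point.
    """
    core = []
    for p in points:
        row = sorted(math.dist(p, q) for q in points)  # row[0] is the self-distance 0
        core.append(row[min_pts - 1])
    return core


def mutual_reachability_mst(points, min_pts):
    """Prim's algorithm over mutual reachability distances computed on demand.

    Returns a list of (from_index, to_index, weight) edges. Extra memory is
    O(n); the full distance matrix is never materialized.
    """
    n = len(points)
    core = core_distances(points, min_pts)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge from the tree to each point
    parent = [-1] * n       # tree endpoint of that cheapest edge
    edges = []
    current = 0
    in_tree[current] = True
    for _ in range(n - 1):
        # Relax edges from the point just added; weights are computed here,
        # used once, and discarded.
        for j in range(n):
            if in_tree[j]:
                continue
            d = math.dist(points[current], points[j])
            mrd = max(core[current], core[j], d)
            if mrd < best[j]:
                best[j] = mrd
                parent[j] = current
        # Pick the cheapest point not yet in the tree.
        current = min((j for j in range(n) if not in_tree[j]),
                      key=best.__getitem__)
        in_tree[current] = True
        edges.append((parent[current], current, best[current]))
    return edges


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (0, 2), (5, 0), (5, 1)]
    for u, v, w in mutual_reachability_mst(pts, min_pts=3):
        print(u, v, w)
```

The trade-off in this sketch is time for memory: every distance is recomputed instead of looked up, and the core-distance pass alone is quadratic unless a nearest-neighbor index is used for it.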

But I'm open to suggestions; there's probably a better way. @kmzapp, did you have any other ideas on how to actually achieve this algorithmically?

Author

kmzapp commented Aug 6, 2019

Thank you for the quick reply. I was thinking it might be possible either to compute it on demand or to store it differently, so that it does not end up as a single object that is too large. But I do not have a precise idea of how to achieve that.
