max hypercube sample size #118
Comments
I like this idea and believe making a separate cover class would be a good approach. One thought I had was to adapt something like a Barnes-Hut tree to have an overlap parameter. What do you mean by "not until we make those proper classes"?
Adding it to mapper with parameters feels cluttery. Instead, I am thinking of (but haven't fleshed out yet) something like a separate cover class, and adding subcover parameters to it.

I really like the Barnes-Hut idea. If I understand correctly, the subdivision is automatically driven by the number of samples in the splits. That seems smarter than a simple sub-mapping or evenly spaced fat partitioning.
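To make the Barnes-Hut idea concrete, here is a rough, purely illustrative sketch of a split that recurses whenever a hypercube in lens space holds too many samples. None of this is KeplerMapper code; names like `subdivide` and `max_samples` are made up.

```python
import numpy as np

def subdivide(lens, indices, lo, hi, max_samples, perc_overlap=0.2,
              depth=0, max_depth=10):
    """Recursively split the box [lo, hi] in lens space until each leaf holds
    at most `max_samples` points (or `max_depth` is reached).

    Returns a list of (member_indices, lo, hi) leaves. Members are gathered
    with the box stretched by `perc_overlap`, so neighbouring leaves share points.
    """
    span = hi - lo
    pad = 0.5 * perc_overlap * span
    inside = np.all((lens[indices] >= lo - pad) & (lens[indices] <= hi + pad), axis=1)
    members = indices[inside]

    if len(members) <= max_samples or depth >= max_depth:
        return [(members, lo, hi)]

    # Split every lens dimension at its midpoint -> 2**d child boxes.
    mid = 0.5 * (lo + hi)
    d = len(lo)
    leaves = []
    for corner in range(2 ** d):
        bits = [(corner >> k) & 1 for k in range(d)]
        child_lo = np.where(bits, mid, lo)
        child_hi = np.where(bits, hi, mid)
        leaves.extend(subdivide(lens, members, child_lo, child_hi,
                                max_samples, perc_overlap, depth + 1, max_depth))
    return leaves

# Example: aim for at most ~1000 samples per leaf hypercube of a 2-D lens.
lens = np.random.rand(30000, 2)
leaves = subdivide(lens, np.arange(len(lens)),
                   lens.min(axis=0), lens.max(axis=0), max_samples=1000)
```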
Have you seen the newly supported interface for covers? I'd like to phase out the old interface eventually.

You're right about the Barnes-Hut. We would probably have to build a custom implementation so we can specify how many samples we want per hypercube. Also, in the example code you posted, are you setting a different number of cubes and overlap for each dimension of the lens? If so, that feature should be supported in the normal cover.
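For reference, this is roughly what using the cover interface with per-dimension settings could look like. The names (`km.Cover`, `n_cubes`, `perc_overlap`) follow a recent KeplerMapper release and may not match the version discussed in this thread; the per-dimension lists are an assumption based on the comment above.

```python
import numpy as np
import kmapper as km

data = np.random.rand(1000, 5)                         # toy point cloud
mapper = km.KeplerMapper()
lens = mapper.fit_transform(data, projection=[0, 1])   # 2-D lens

# One resolution/overlap value per lens dimension.
cover = km.Cover(n_cubes=[10, 5], perc_overlap=[0.2, 0.4])
graph = mapper.map(lens, data, cover=cover)
```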
Thanks, I totally missed KeplerMapper having support for that! So subtle, the implementation ;). So I made a `km.cover.RecursiveCubicalCover()`, and it works (kinda). The problem is the overlap: it quickly becomes an overlap of overlaps and an edge explosion (entire hypercubes may fall inside the overlap between two larger hypercubes). I think a solution may be to create subgraphs, and either attach them to the node that has met `max_samples_cluster` or show them separately as their own graphs. This would require being closer to `.map()` though.
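One way the subgraph idea could be sketched with the public API: re-run mapper on the members of any node that grew too large and keep the result as a separate graph. `max_samples_cluster` and `expand_oversized_nodes` are hypothetical names for illustration, not KeplerMapper API.

```python
import numpy as np
import kmapper as km

def expand_oversized_nodes(mapper, graph, data, max_samples_cluster=1000):
    """Re-run mapper on the members of every node that exceeds the threshold.

    Returns a dict of node_id -> subgraph; the subgraphs could be attached to
    the oversized node or rendered on their own.
    """
    subgraphs = {}
    for node_id, members in graph["nodes"].items():
        if len(members) > max_samples_cluster:
            sub_data = data[np.asarray(members)]
            sub_lens = mapper.fit_transform(sub_data, projection=[0, 1])
            subgraphs[node_id] = mapper.map(
                sub_lens, sub_data,
                cover=km.Cover(n_cubes=10, perc_overlap=0.2),
            )
    return subgraphs
```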
One problem is a large concentration of samples in a certain projection range. Graphs may look good for non-concentrated samples, but lack structure for the large concentrations. It can also slow down clustering algorithms when 25,000+ samples land in a single hypercube.
I would like to be able to set a `max_clustering_size`. When this size is met, that hypercube should be subdivided into smaller hypercubes. Preferably, this subdivision can be set in a custom manner.

The only way to currently overcome this is to set a finer resolution, or to add another dimension to the projection. These settings, however, drastically alter the output graphs. Another way to somewhat overcome this is to rank-order the projection (which creates an even number of samples in each hypercube), but this method also ignores legitimate clusters by stretching them out.
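The rank-order workaround mentioned above could be sketched like this (illustrative only; `rank_order_lens` is a made-up helper, not part of KeplerMapper):

```python
import numpy as np
from scipy.stats import rankdata

def rank_order_lens(lens):
    """Rank-transform each lens column independently, scaled to (0, 1].

    Every hypercube of an evenly spaced cover then holds roughly the same
    number of samples, at the cost of stretching out genuinely dense clusters.
    """
    lens = np.asarray(lens, dtype=float)
    ranked = np.column_stack([rankdata(col) for col in lens.T])
    return ranked / len(lens)
```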
The reason I'd like a custom manner is that you may use an overlap percentage of 100% or more. Recursing on that makes my brain hurt too much. Though I can think of ways to implement this, I can't think of solid API ways to add custom mapping on hypercubes. At least, not until we make those proper classes, like `kmapper.cover.MultiScaleCubical()`. Then `max_sample_cluster` and `subcover=kmapper.cover.MultiScaleCubical()` could be parameters for `MultiScaleCubical`.
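Purely hypothetical sketch of the shape such an API could take; `MultiScaleCubical`, `max_sample_cluster`, and `subcover` do not exist in KeplerMapper, and the other parameter names are assumptions.

```python
import numpy as np
import kmapper as km

data = np.random.rand(50000, 4)
mapper = km.KeplerMapper()
lens = mapper.fit_transform(data, projection=[0, 1])

cover = km.cover.MultiScaleCubical(          # hypothetical cover class
    n_cubes=10,
    perc_overlap=0.2,
    max_sample_cluster=1000,                 # subdivide any cube holding more samples
    subcover=km.cover.MultiScaleCubical(n_cubes=2, perc_overlap=0.2),
)
graph = mapper.map(lens, data, cover=cover)
```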