Minimum community size #53

Open
brilyantconsulting opened this issue Nov 21, 2020 · 8 comments
Labels
enhancement New feature or request

Comments

@brilyantconsulting

The maximum cluster size feature is great! Would it be possible to add a minimum cluster size constraint as well? In particular, high-resolution clustering often results in singletons, which are not particularly useful for downstream analysis.

@vtraag
Owner

vtraag commented Nov 22, 2020

There might be some possibilities in this direction. I will try to see what I can do, but it might take some time.

@vtraag vtraag self-assigned this Nov 22, 2020
@vtraag vtraag added enhancement New feature or request new feature labels Nov 22, 2020
@brilyantconsulting
Author

Thanks for looking into this! I can't imagine it is a trivial feature to add.

@vtraag
Owner

vtraag commented Nov 22, 2020

Well, we already do have some solutions in another context. The question is how to best adapt it to the more general framework of this package.

@davidfrosch

Hello Vincent, I just wanted to add that I was also looking for this feature! It would be great if it could be added at some point! Thanks for the great work.

@deklanw

deklanw commented Aug 10, 2021

This would also be greatly helpful for me.

As a dirty workaround, I was considering just merging singletons greedily into whatever group they connect to, but that doesn't resolve the duotons, tripletons, etc., which can be equally annoying.

Are there any other workarounds possible at this time?

> Well, we already do have some solutions in another context.

@vtraag Can you expand on this bit? Is there another algorithm which does have minimum size restrictions?

@vtraag
Owner

vtraag commented Oct 15, 2021

> @vtraag Can you expand on this bit? Is there another algorithm which does have minimum size restrictions?

In the Java version of the Leiden algorithm, we use an approach that progressively merges smaller communities. The approach used there may not generalise directly to the other quality functions that this Python package also supports. If you want to use the Java version, you can simply supply the minimum cluster size as an argument on the command line when you run it (see the documentation for more information).

For the details of how to do this, see the source code. The idea is that you progressively assign small communities to other communities, based on the relative density. The relative density here refers to w_ij / (n_i n_j), where w_ij is the total number of edges (or total edge weight) between communities i and j, and n_i is the number of nodes in community i. You can then assign any community i that is too small to the other community j with which it has the highest relative density. There is actually some argument for doing so in the context of CPM. If you start with the smallest community, you can iteratively assign the smallest cluster that is still too small to a larger cluster.
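
The merging idea above can be sketched in plain Python as a post-processing step. This is only an illustration under stated assumptions, not the Java implementation and not part of this package's API: the function name `merge_small_communities`, the `(u, v, weight)` edge format, and the handling of ties and of isolated communities are all hypothetical choices made for the sketch.

```python
from collections import Counter, defaultdict


def merge_small_communities(edges, membership, min_size):
    """Merge communities smaller than min_size into a neighbouring community.

    Hypothetical helper for illustration only.
    edges: list of (u, v, weight) with integer node indices.
    membership: list where membership[node] is an integer community id.
    Returns a new membership list.
    """
    membership = list(membership)
    unmergeable = set()  # small communities without any outside edge

    while True:
        sizes = Counter(membership)
        candidates = [c for c, n in sizes.items()
                      if n < min_size and c not in unmergeable]
        if not candidates:
            return membership

        # Start with the smallest community (lowest id on ties).
        small = min(candidates, key=lambda c: (sizes[c], c))

        # w_ij: total edge weight between `small` and each other community.
        weight_to = defaultdict(float)
        for u, v, w in edges:
            cu, cv = membership[u], membership[v]
            if cu == small and cv != small:
                weight_to[cv] += w
            elif cv == small and cu != small:
                weight_to[cu] += w

        if not weight_to:
            # Assumption: an isolated small community is left as it is.
            unmergeable.add(small)
            continue

        # Pick the community with the highest relative density
        # w_ij / (n_i * n_j); sorted() makes ties go to the lowest id.
        target = max(sorted(weight_to),
                     key=lambda j: weight_to[j] / (sizes[small] * sizes[j]))
        membership = [target if c == small else c for c in membership]


# Two triangles (communities 0 and 1) plus node 6 as a singleton (community 2).
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (6, 0, 1.0), (6, 3, 1.0), (6, 4, 1.0)]
print(merge_small_communities(edges, [0, 0, 0, 1, 1, 1, 2], min_size=2))
# prints [0, 0, 0, 1, 1, 1, 1]
```

With this package, the starting `membership` could be taken from the `membership` attribute of a partition after running the algorithm. Note that the sketch merges purely on relative density and never re-evaluates the quality function, so the result is not guaranteed to be a good partition under any particular quality function.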

@girwin1

girwin1 commented Jun 6, 2024

Did this progress?

@Jieran-S

Jieran-S commented Dec 3, 2024

Curious to see some update on this


6 participants