Support for Fixing the number of clusters #89
Hi @vtraag - thank you for this amazing repo. It has been immensely useful.

I have a feature request - can you add an additional argument that fixes the number of communities? Say, if I want 5 communities (even if that results in partitions with a lower modularity score), is there a way to enforce that?

While going through your implementation, I have thought of a possible way of doing this (more like a post-processing step, with the assumption that Leiden will always give more communities than I need). But I wanted to check with you first, before going there.
In principle, you can do this by optimising an existing partition, where you initially assign the partition a random initial membership, limited to the desired number of communities. For example:

```python
n_comms = 5
partition = la.CPMVertexPartition(
    G,
    initial_membership=np.random.choice(n_comms, 100),
    resolution_parameter=0.5,
)
```

If you then turn off the option to also consider empty communities when moving nodes, the optimiser only considers existing communities, i.e. it will never go beyond the indicated number of communities:

```python
opt = la.Optimiser()
opt.consider_empty_community = False
opt.optimise_partition(partition)
```

Of course, it is possible that you will end up having fewer communities than `n_comms`.
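For concreteness, here is a self-contained version of the snippet above; the Erdős–Rényi test graph is just an assumption for illustration, and any `igraph.Graph` works in its place:

```python
import igraph as ig
import leidenalg as la
import numpy as np

G = ig.Graph.Erdos_Renyi(n=100, p=0.05)  # example graph (assumption)
n_comms = 5

# Random initial membership restricted to n_comms communities.
partition = la.CPMVertexPartition(
    G,
    initial_membership=np.random.choice(n_comms, G.vcount()),
    resolution_parameter=0.5,
)

opt = la.Optimiser()
opt.consider_empty_community = False
opt.optimise_partition(partition)

# At most n_comms distinct community labels remain (possibly fewer).
print(len(set(partition.membership)))
```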
What was the idea you had in mind? I'm curious to hear about it!
@vtraag - thanks for your immediate response. I will consider your snippet as one of the possible ways to attack this. In regard to my idea, it goes roughly like this: run Leiden as usual (say it returns K clusters), then iteratively merge the smallest clusters until only the desired M clusters remain.

Do you see any problem with this approach?
This is also an interesting approach. It is doing something else though: it tries to deal with minimum community sizes. This is what is discussed in #53, with the approach that I outlined there being very similar to what you propose here.
@vtraag - I am not sure I understand why this would ensure a minimum community size. If K = 10 and M = 5 and the cluster sizes are [10, 9, 8, 6, 1, 1, 1, 1, 1, 1], then using the above approach, is it not possible that the new list of cluster sizes becomes [16, 9, 8, 6, 1]? That is, all the single nodes go into the cluster of size 10. If this is not possible, can you shed some light on why that wouldn't happen?
Yes, it is certainly possible that this would be an outcome. The way you formulated it, you were interested in simply keeping the largest communities.

All in all, it depends a bit on what you are trying to do. For example, if you are interested in just restricting the number of communities, it might be that a more even distribution of communities would be better than simply aggregating existing communities. For instance, it could be that the six communities of size 1 in your example are better off merged into a single community of size 6 than absorbed into the largest community.
@vtraag - I see the confusion now. It is not a requirement for me that the clusters are evenly distributed; I just need to make sure that there are exactly M clusters.

I coded up the routine I mentioned in my previous messages. This is the trend I observe: the Leiden algorithm outputs more than 20 clusters, which I iteratively decrease to 8. The initial cluster sizes are:

[136, 41, 86, 121, 54, 50, 48, 42, 28, 28, 14, 14, 8, 7, 5, 5, 4, 2, 2, 2, 2, 2]
OK, good to see you got something working that does the trick for you! If you post your code here, it might also help some other people who would like to do the same.

One final suggestion: on your final result consisting of 8 communities, you can still let the Leiden algorithm run while setting `consider_empty_community = False`, so the assignment is refined without introducing any new communities.
@vtraag - thanks for your suggestion. I have incorporated it in my code. Please take a look at the code below and, if possible, provide feedback.
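A minimal sketch of what such a routine could look like, assuming a greedy merge-smallest strategy; the helper name `reduce_to_k_communities` and the connection-strength heuristic are illustrative assumptions, not necessarily the exact script from this thread:

```python
import igraph as ig
import leidenalg as la

def reduce_to_k_communities(partition, k):
    """Greedily merge the smallest community into its most strongly
    connected neighbour until only k communities remain. Aggregates
    once, merges on the aggregate, and projects back once at the end."""
    agg = partition.aggregate_partition()  # node i starts in community i
    g = agg.graph                          # one aggregate node per community
    w = g.es['weight'] if 'weight' in g.es.attributes() else [1.0] * g.ecount()

    # Community label -> its aggregate nodes, and its size in the original graph.
    members = {c: [c] for c in range(g.vcount())}
    size = dict(enumerate(partition.sizes()))

    while len(members) > k:
        smallest = min(members, key=size.get)
        # Total edge weight from `smallest` to every other community.
        conn = {}
        for v in members[smallest]:
            for e in g.incident(v):
                edge = g.es[e]
                u = edge.source if edge.target == v else edge.target
                c = agg.membership[u]
                if c != smallest:
                    conn[c] = conn.get(c, 0.0) + w[e]
        # If `smallest` has no external edges, fall back to the next smallest.
        target = (max(conn, key=conn.get) if conn
                  else min((c for c in members if c != smallest), key=size.get))
        for v in members[smallest]:
            agg.move_node(v, target)
        members[target] += members.pop(smallest)
        size[target] += size.pop(smallest)

    partition.from_coarse_partition(agg)   # project all merges back in one go
    return partition

# Example usage on a test graph (the graph itself is an assumption).
G = ig.Graph.Famous('Zachary')
partition = la.find_partition(G, la.ModularityVertexPartition)
partition = reduce_to_k_communities(partition, 3)

# Refine the final assignment without introducing new communities.
opt = la.Optimiser()
opt.consider_empty_community = False
opt.optimise_partition(partition)
print(len(set(partition.membership)))
```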
I also have a small question regarding your suggestion. If I run the Leiden algorithm on my final output, is it possible that I will get fewer than M clusters?
@vtraag - I have also found the function linked below. This is to prevent the complicated logic I would otherwise need:
leidenalg/src/leidenalg/VertexPartition.py, line 256 (commit c4dcc64)
Can you please verify from your end that I can use this to remove the zero-sized clusters at the end?
Overall, the script looks good to me, thanks for sharing!
In principle this not only removes the last empty community, it also optimises the remaining assignments (without considering an empty community). It is possible that this already gets you fewer than M communities.
Yes, similar to the above.
Yes, this will remove empty communities. In principle, it would probably be better to indeed remove empty communities using that function.

Finally, one other comment: I don't think it's necessary to get the aggregate partition and update the original partition for each cluster that you change. You can simply get the aggregate partition once, perform all the merges there, and then update the original partition from it in one go.
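A short sketch of that aggregate-once pattern; the Zachary test graph and the single merge are assumptions for illustration, while `aggregate_partition` and `from_coarse_partition` are existing leidenalg methods:

```python
import igraph as ig
import leidenalg as la

G = ig.Graph.Famous('Zachary')  # example graph (assumption)
partition = la.find_partition(G, la.ModularityVertexPartition)

agg = partition.aggregate_partition()  # one aggregate node per community
agg.move_node(0, 1)                    # illustrative merge: community 0 into 1
partition.from_coarse_partition(agg)   # project the merge back in one go

print(len(set(partition.membership)))
```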
@vtraag - thanks for your approval. As of now, performance does not seem to be a concern.
@vtraag - do you think this would be a useful addition to your code repository? If you think so, I do not mind creating a pull request for this functionality.
Sorry for the late response @Alex-Mathai-98! In principle, the functionality could be interesting for others. However, I would prefer to have an implementation in the C++ layer that could then be exposed in Python, instead of doing it in Python altogether. Moreover, performance will become an issue at some point, so I prefer to make sure it is sufficiently performant before integrating it into the library. If you are open to doing this, I would be willing to work with you on a PR! If not, I'll try to see if I can find some time in the future to implement such a feature.
Sorry for bumping this issue, but I have a related question. I would be interested in using the `consider_empty_community` option directly from `find_partition`. A possible solution would be to add a new argument:

```diff
+ def find_partition(graph, partition_type, initial_membership=None, weights=None, n_iterations=2, max_comm_size=0, empty_community=True, seed=None, **kwargs):
      ...
      optimiser.max_comm_size = max_comm_size
+     optimiser.consider_empty_community = empty_community
```

Would such a change be possible?
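With the proposed `empty_community` argument in place, a call could look like the sketch below; note that this argument is only a proposal here and not part of the released leidenalg API, and the test graph is an assumption:

```python
import igraph as ig
import leidenalg as la
import numpy as np

G = ig.Graph.Erdos_Renyi(n=100, p=0.05)  # example graph (assumption)
n_comms = 5

# `empty_community` is the argument proposed above, not yet in leidenalg.
partition = la.find_partition(
    G,
    la.CPMVertexPartition,
    initial_membership=np.random.choice(n_comms, G.vcount()),
    resolution_parameter=0.5,
    empty_community=False,
)
```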