[FEA]: Multiplex leiden clustering #4828

Open
2 tasks done
Tracked by #3337
niklasmueboe opened this issue Dec 11, 2024 · 5 comments
Labels
feature request New feature or request

Comments

@niklasmueboe

Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request

Critical (currently preventing usage)

Please provide a clear description of problem this feature solves

The (most used) Leiden implementation in leidenalg supports multiplex clustering, where you can cluster multiple graphs with the same vertices jointly. In the field of single-cell transcriptomics and spatially resolved transcriptomics this can be used to cluster multi-modality data (as done in muon) or to jointly cluster cells based on their features and spatial neighborhoods (as done in SpatialLeiden).
As datasets grow (hundreds of thousands to millions of cells/vertices), the runtime of Leiden clustering on the CPU becomes a limiting factor when exploring different parameter combinations.
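To make "clustering multiple graphs jointly" concrete, here is a minimal pure-Python sketch of a multiplex objective: a single membership vector is scored on every layer, and the per-layer scores are combined as a weighted sum. It assumes normalised modularity with a resolution term as the per-layer quality; the function names are made up for illustration, and leidenalg's partition types may scale their quality differently.

```python
from collections import defaultdict


def modularity(edges, membership, resolution=1.0):
    """Modularity of one undirected, weighted layer.

    edges: iterable of (u, v, weight), each undirected edge listed once.
    membership: list mapping vertex id -> community id.
    """
    total = 0.0                    # total edge weight m of the layer
    internal = defaultdict(float)  # edge weight inside each community
    degree = defaultdict(float)    # summed weighted degree per community
    for u, v, w in edges:
        total += w
        degree[membership[u]] += w
        degree[membership[v]] += w
        if membership[u] == membership[v]:
            internal[membership[u]] += w
    if total == 0:
        return 0.0
    return sum(
        internal[c] / total - resolution * (degree[c] / (2 * total)) ** 2
        for c in degree
    )


def multiplex_quality(layers, membership, resolutions, layer_weights):
    """Weighted sum of per-layer modularity for one shared membership."""
    return sum(
        weight * modularity(edges, membership, res)
        for edges, res, weight in zip(layers, resolutions, layer_weights)
    )


# Two layers over the same six vertices (illustrative data only).
feature_layer = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
                 (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 1.0)]
spatial_layer = [(0, 3, 1.0), (1, 4, 1.0), (2, 5, 1.0)]
membership = [0, 0, 0, 1, 1, 1]

quality = multiplex_quality(
    [feature_layer, spatial_layer], membership,
    resolutions=[1.0, 1.0], layer_weights=[1.0, 0.5],
)
```

A multiplex Leiden run would search for the membership that maximises this combined score rather than the score of any single layer.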

Describe your ideal solution

The leiden function should accept a list (or similar collection) of graphs as input. The resolution parameter would then also need to be extended to accept one resolution per graph (layer). In addition, a new parameter would be needed that assigns each layer a weight corresponding to its "importance".
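As a rough illustration of the requested parameters, the sketch below expands a scalar or per-layer resolution and an optional list of layer weights into one value per graph. `normalize_multiplex_args` and its argument names are hypothetical and not part of cuGraph; the defaults (resolution 1.0, equal layer weights) are assumptions.

```python
from numbers import Real


def normalize_multiplex_args(graphs, resolution=1.0, layer_weights=None):
    """Hypothetical helper: return one resolution and one weight per layer."""
    n_layers = len(graphs)
    if n_layers == 0:
        raise ValueError("at least one graph (layer) is required")

    # A single number applies to every layer; a sequence is taken per layer.
    if isinstance(resolution, Real):
        resolutions = [float(resolution)] * n_layers
    else:
        resolutions = [float(r) for r in resolution]

    # Assumption: layers count equally unless weights are given.
    if layer_weights is None:
        weights = [1.0] * n_layers
    else:
        weights = [float(w) for w in layer_weights]

    if len(resolutions) != n_layers or len(weights) != n_layers:
        raise ValueError("need exactly one resolution and one weight per layer")
    return resolutions, weights
```

With this shape, existing single-graph calls that pass one resolution keep working, while multiplex callers can tune each layer separately.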

Describe any alternatives you have considered

No response

Additional context

No response

Code of Conduct

  • I agree to follow cuGraph's Code of Conduct
  • I have searched the open feature requests and have found no duplicates for this feature request
@niklasmueboe added the "? - Needs Triage" and "feature request" labels on Dec 11, 2024
@abs51295

abs51295 commented Dec 11, 2024

I would also consider adding support for directed weighted graphs, since scanpy.tl.leiden uses a directed weighted graph with the leidenalg package. (Never mind, since they are moving to igraph.) Also, support for fixing the membership labels for part of the graph is useful when merging two different datasets: https://www.nature.com/articles/s41598-020-71805-1.

@ChuckHastings
Collaborator

This is something we can explore. Within our current cugraph framework, we could potentially support this as follows:

  • Define edge types for each layer (number the layers from 0 to n)
  • Create a variation of the Leiden algorithm that considers the layers
  • Allow for different resolution values for each layer
  • Allow for certain layers to be ignored (so you don't have to recreate the graph in different scenarios)

Does this seem like a reasonable approach? We would need to determine when to address this in our road map.
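The edge-type idea above could look roughly like the following pure-Python sketch: one edge list in which every edge carries its layer number, plus a per-layer resolution table and a set of active layers. The function name and data layout are assumptions for illustration, not the cugraph API.

```python
def select_layers(edges, resolutions, active_layers):
    """Split a typed edge list into per-layer edge lists.

    edges: list of (src, dst, weight, edge_type), edge_type in 0..n-1.
    resolutions: dict mapping edge_type -> resolution for that layer.
    active_layers: layers to use in this run; all others are ignored.
    Returns a dict: edge_type -> (resolution, [(src, dst, weight), ...]).
    """
    selected = {layer: (resolutions[layer], []) for layer in active_layers}
    for src, dst, weight, edge_type in edges:
        if edge_type in selected:
            selected[edge_type][1].append((src, dst, weight))
    return selected


# One graph holding three layers (illustrative data only).
typed_edges = [
    (0, 1, 1.0, 0), (1, 2, 1.0, 0),  # layer 0
    (0, 2, 0.5, 1),                  # layer 1
    (2, 3, 2.0, 2),                  # layer 2
]
layer_resolutions = {0: 1.0, 1: 0.8, 2: 1.2}

# Ignore layer 1 for this run without rebuilding the graph.
run = select_layers(typed_edges, layer_resolutions, active_layers={0, 2})
```

Because the layer number travels with each edge, switching layers on or off only changes the selection, not the stored graph.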

@niklasmueboe
Author

In this approach would it be possible for one layer to have directed and another to have undirected edges?

@ChuckHastings
Collaborator

As it doesn't exist yet, we can certainly pursue this idea.

The biggest complication is that we don't have a directed version of Leiden yet, so we would have to update our Louvain/Leiden implementation to support directed graphs.

Regarding some layers being directed and other layers being undirected: is this inherent in the construction of these layers, or a virtual abstraction you choose to lay on top of them? That is, there are two possibilities. Either the data for layer X is inherently an undirected graph and you will always treat it as undirected, while the data for layer Y is inherently a directed graph and you will always treat it as directed. Or the edges of a layer are directed when you create it, and you sometimes want to treat those edges as undirected and at other times want to treat the edges of the same layer as directed.

The reason I ask is this... in libcugraph we store a directed graph (think of a CSR data structure). If we want to represent an undirected graph then we symmetrize the directed graph when we construct the CSR structure. If a layer you create is inherently directed or undirected and will always be treated that way, then we can easily construct that layer as symmetric or asymmetric when we create the graph and everything will just work. If you need to be able to treat a layer as symmetric (undirected) sometimes and asymmetric (directed) other times, then we need to think about a different solution.
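For readers unfamiliar with the storage model described above, this is a minimal pure-Python sketch of building a CSR structure from a directed edge list, with optional symmetrization for an undirected layer. It is not libcugraph code; it assumes the input lists each undirected edge only once, so that adding the reverse edges does not create duplicates.

```python
def build_csr(num_vertices, edges, symmetrize=False):
    """Build CSR arrays (offsets, indices, weights) from (src, dst, weight).

    With symmetrize=True every edge also gets its reverse, which is how an
    undirected layer can be represented in directed storage.
    """
    edges = list(edges)
    if symmetrize:
        # Self-loops are their own reverse, so they are not duplicated.
        edges += [(dst, src, w) for src, dst, w in edges if src != dst]

    # Count outgoing edges per vertex, then prefix-sum into row offsets.
    offsets = [0] * (num_vertices + 1)
    for src, _, _ in edges:
        offsets[src + 1] += 1
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]

    # Scatter each edge into the next free slot of its source row.
    indices = [0] * len(edges)
    weights = [0.0] * len(edges)
    cursor = offsets[:-1].copy()
    for src, dst, w in edges:
        indices[cursor[src]] = dst
        weights[cursor[src]] = w
        cursor[src] += 1
    return offsets, indices, weights


directed = build_csr(3, [(0, 1, 1.0), (1, 2, 2.0)])
undirected = build_csr(3, [(0, 1, 1.0), (1, 2, 2.0)], symmetrize=True)
```

If a layer is always directed or always undirected, the choice is made once at construction time, as in the `symmetrize` flag here. Treating the same stored layer both ways would need either two copies or a different representation.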

@ChuckHastings removed the "? - Needs Triage" label on Jan 22, 2025
@niklasmueboe
Author

In the use cases I can think of, each layer would be inherently directed or undirected, but other people will probably come up with use cases where this is not the case.

But given that the Leiden implementation is undirected so far, it would probably be easier to add multiplex clustering first, before also adding support for directed graphs. At least that would be the priority for me. Directed graph clustering would be a nice-to-have but is not really necessary: in most cases I have seen, the direction of the underlying graph is simply ignored and the graph is treated as undirected.

3 participants