A Random Latent Clique Lifting from Graphs to Simplicial Complexes #63

Open

wants to merge 38 commits into main
Conversation


@mauriciogtec mauriciogtec commented Jul 13, 2024

A Random Latent Clique Lifting from Graphs to Simplicial Complexes

TL;DR We propose a lifting that ensures both 1) small-world property and 2) edge/cell sparsity. Combining these two properties is very attractive for Topological Deep Learning (TDL) because it ensures computational efficiency due to the reduced number of higher-order connections: only a few message-passing layers connect any two nodes.

Background. A graph is sparse if its number of edges grows proportional to the number of nodes. Many real-world graphs are sparse, but they contain many densely connected subgraphs and exhibit high clustering coefficients. Moreover, such real-world graphs frequently exhibit the small-world property, where any two nodes are connected by a short path of length proportional to the logarithm of the number of nodes. For instance, these are well-known properties of social networks, biological networks, and the Internet.

Contributions. In this notebook, we present a novel random lifting procedure from graphs to simplicial complexes. The procedure is based on a recently proposed Bayesian nonparametric random graph model for random clique covers (Williamson & Tec, 2020). Specifically, the model can learn latent clique complexes that are consistent with the input graph. It captures power-law degree distributions, global sparsity, and a non-vanishing local clustering coefficient. The small-world property is also guaranteed, which is very attractive for Topological Deep Learning (TDL).


In the original work [1], the distribution was used as a prior on an observed input graph. In particular, in the Bayesian setting, the model yields a distribution over latent clique complexes, i.e., a specific class of simplicial complexes whose 1-skeleton structural properties are consistent with those of the input graph used to compute the likelihood. Indeed, one feature of the posterior distribution from which the latent complex is sampled is that the set of latent 1-simplices (edges) is a superset of the set of edges of the input graph.


In the context of Topological Deep Learning [2][3] and the very recently emerged paradigm of Latent Topology Inference (LTI) [4], it is natural to look at the model in [1] as a novel LTI method able to infer a random latent simplicial complex from an input graph. Or, in other words, to use [1] as a novel random lifting procedure from graphs to simplicial complexes.

Next, we provide a quick introduction to the model in [1]. For a more in-depth exposition, please refer to the paper. To the best of our knowledge, this is the first random lifting procedure relying on Bayesian arguments.

To summarize, this lifting is:

  • non-deterministic,
  • not present in the literature as a lifting procedure,
  • based on connectivity,
  • modifying the initial connectivity of the graph by adding edges (thus, it can also be considered a graph rewiring method).

The Random Clique Cover Model

Let $G=(V,E)$ be a graph with $V$ the set of vertices and $E$ the set of edges. Denote the number of nodes as $N=|V|$. A clique cover can be described as a matrix $Z$ of size $K \times N$, where $K$ is the number of cliques, such that $Z_{k, i} = 1$ if node $i$ is in clique $k$ and $Z_{k, i} = 0$ otherwise. The Random Clique Cover (RCC) model, defined in [1], is a probabilistic model for the matrix $Z$. This matrix can have an infinite number of rows and columns, but only a finite number of them will be active. The model is based on the Indian Buffet Process (IBP), a distribution over binary matrices with a possibly infinite number of rows and columns, or more specifically, the Stable Beta IBP described in [5]. While the mathematics behind the IBP are complex, the model admits a highly intuitive representation, described below.

First, recall that a clique is a fully connected subset of vertices. Therefore, a clique cover $Z$ induces an adjacency matrix by the formula $A = \min(Z^T Z - \operatorname{diag}(Z^T Z), 1)$, where $\min$ is applied element-wise. The IBP model can be described recursively as follows:
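For concreteness, the induced adjacency matrix can be computed from a small, hypothetical clique cover (the matrix `Z` below is an illustrative example, not data from the paper):

```python
import numpy as np

# Hypothetical clique cover: K = 3 cliques over N = 5 nodes.
# Z[k, i] = 1 iff node i belongs to clique k.
Z = np.array([
    [1, 1, 1, 0, 0],  # clique 0: nodes {0, 1, 2}
    [0, 0, 1, 1, 0],  # clique 1: nodes {2, 3}
    [0, 0, 0, 1, 1],  # clique 2: nodes {3, 4}
])

# A = min(Z^T Z - diag(Z^T Z), 1): two nodes are adjacent iff they
# share at least one clique; the diagonal (self-loops) is zeroed out.
ZtZ = Z.T @ Z
A = np.minimum(ZtZ - np.diag(np.diag(ZtZ)), 1)
```

Here $(Z^T Z)_{ij}$ counts the cliques shared by nodes $i$ and $j$, so clipping at 1 after removing the diagonal yields a simple, symmetric adjacency matrix.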

Conditional on $Z_1, Z_2, \ldots, Z_{K-1}$, where $Z_j$ denotes the $j$-th row of $Z$, the next row $Z_K$ is drawn as follows:

  1. $Z_K$ includes a number of new, previously unobserved nodes drawn as
    $$Z_K \mid Z_1, Z_2, \ldots, Z_{K-1} \sim \text{Poisson}\left(\alpha \frac{\Gamma(1 + c)\Gamma(N + c + \sigma - 1)}{\Gamma(N + \sigma)\Gamma(c + \sigma)}\right)$$
  2. Each previously observed node $i$ belongs to $Z_K$ with probability proportional to the number of cliques it already appears in. Specifically, letting $m_i=\sum_{k=1}^{K-1} Z_{k, i}$, then
    $$P(Z_{K,i}=1 \mid Z_1, Z_2, \ldots, Z_{K-1}) = \frac{m_i + \sigma}{K + c - 1}.$$

The last expression captures an intuitive rich-get-richer dynamic: the more cliques a node already belongs to, the more likely it is to appear in the next one.
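The two-step generative process above can be sketched in a few lines of Python. This is a minimal illustration with an assumed parameterization (`sample_rcc` is a hypothetical helper written for this notebook, not code from [1]; see [1] and [5] for the exact model):

```python
import numpy as np
from math import exp, lgamma

def sample_rcc(K, alpha, c, sigma, seed=0):
    """Sample K cliques from the RCC generative process (sketch).

    Returns a list of cliques, each a set of node indices.
    """
    rng = np.random.default_rng(seed)
    cliques = []
    m = []  # m[i] = number of cliques node i already belongs to
    for k in range(K):  # drawing the (k + 1)-th clique
        N = len(m)  # nodes observed so far
        clique = set()
        # Step 2: a previously observed node i joins the new clique
        # with probability (m_i + sigma) / (k + c); with 1-indexed
        # cliques this is the (m_i + sigma) / (K + c - 1) in the text.
        for i in range(N):
            if rng.random() < (m[i] + sigma) / (k + c):
                clique.add(i)
                m[i] += 1
        # Step 1: new, previously unseen nodes enter via a Poisson
        # draw; the gamma ratio from the text is evaluated with
        # lgamma for numerical stability.
        if N > 0:
            rate = alpha * exp(
                lgamma(1 + c) + lgamma(N + c + sigma - 1)
                - lgamma(N + sigma) - lgamma(c + sigma)
            )
        else:
            rate = alpha  # first clique: all of its nodes are new
        for _ in range(rng.poisson(rate)):
            clique.add(len(m))
            m.append(1)
        cliques.append(clique)
    return cliques
```

Stacking the indicator rows of the sampled cliques recovers the matrix $Z$, from which the adjacency matrix follows as above.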

The RCC model depends on four parameters $\alpha, c, \sigma, \pi$. The first three are part of the IBP; explaining them in detail is beyond the scope of this notebook, but the reader may refer to [5]. Fortunately, the learned (posterior) values of $\alpha, \sigma, c$ are strongly determined by the data itself. By contrast, $\pi$ is approximately the probability that an edge is missing from the graph. Generally, the lower $\pi$ is, the fewer cliques there will be and the less interconnected their nodes will be.

Importantly, because the lifting can introduce latent inferred edges, it superimposes the small-world property on the input graph.

References


[1] Williamson, Sinead A., and Mauricio Tec. "Random clique covers for graphs with local density and global sparsity." Uncertainty in Artificial Intelligence (UAI). PMLR, 2020.

[2] Papamarkou, Theodore, et al. "Position paper: Challenges and opportunities in topological deep learning." arXiv preprint arXiv:2402.08871 (2024).

[3] Hajij, Mustafa, et al. "Topological deep learning: Going beyond graph data." arXiv preprint arXiv:2206.00606 (2022).

[4] Battiloro, Claudio, et al. "From latent graph to latent topology inference: Differentiable cell complex module." The Twelfth International Conference on Learning Representations (ICLR), 2024.

[5] Teh, Yee Whye, and Dilan Görür. "Indian Buffet Processes with Power-law Behavior." Advances in neural information processing systems. 2009.



@mauriciogtec mauriciogtec changed the title Lifting integrate upstream A Random Latent Clique Lifting from Graphs to Simplicial Complexes Jul 13, 2024
@gbg141 gbg141 added challenge-icml-2024 award-category-1 Lifting to Simplicial or Cell Domain award-category-4 Connectivity-based Lifting labels Jul 13, 2024
@gbg141
Member

gbg141 commented Jul 13, 2024

Hello @mauriciogtec! Thank you for your submission. As we near the end of the challenge, I am collecting participant info for the purpose of selecting and announcing winners. Please email me (or have one member of your team email me) at guillermo_bernardez@ucsb.edu so I can share access to the voting form. In your email, please include:

Before July 12, make sure that your submission respects all Submission Requirements laid out on the challenge page. Any submission that fails to meet these criteria will be automatically disqualified.

@gbg141 gbg141 added Winner Awarded submission and removed challenge-icml-2024 labels Oct 31, 2024