SGtSNEpiPy
is a Python
interface to
SG-t-SNE-П
, a Julia
package. It enables Python users to utilize and take full
advantage of the SG-t-SNE-П
functionalities within the Python
ecosystem. With SGtSNEpiPy, users can swiftly and directly embed and
visualize a sparse graph in a
The input is a sparse graph
SG-t-SNE extends t-SNE
4,5,6,7 from point feature data to graph data,
especially sparse graphs represented by their adjacency matrices in
sparse formats. Here, SNE stands for stochastic near-neighbor
embedding, and SG stands for sparse graph. SG-t-SNE
makes a direct
translation of the node adjacency on the graph to spatial proximity in
the embedding space. In comparison, other graph embedding methods
first embed the vertices by spectral decomposition or learning,
followed by SNE.
By our empirical tests and by user feedback in
SCARF
[3], the SG-t-SNE mapping has higher fidelity and efficiency.
The SG-t-SNE algorithm was first introduced in 2019 in the
paper
[1]. A
software SG-t-SNE-П
in C/C++ was released in 2019
in
[2]
and then made accessible via Julia in 2021
on GitHub
.
This package SGtSNEpiPy
makes SG-t-SNE-Pi deployable to the Python ecosystem.
This Python wrapper seamlessly converts the data formats and
translates all input and output arguments between Julia and Python.
To install SGtSNEpiPy
through Python
from PyPi
, issue
$ pip install SGtSNEpiPy
The installation is straightforward: import SGtSNEpiPy
and issue the command
from SGtSNEpiPy.SGtSNEpiPy import sgtsnepipy
Warning:
The current version of SGtSNEpiPy
requires WSL2 on Windows and Rosetta 2 on Apple ARM hardware.
More details can be found in the full documentation.
SGtSNEpiPy.sgtsnepipy
The calling sequence is simple,
Y = sgtsnepi(A, kwargs**)
where
-
A
: the adjacency matrix of a graph$G(V, E)$ , in the compressive sparse row (CSR) format. MatrixA
has$n=|V|$ rows,$n$ columns and the$m$ nonzero elements represent the edges in$E$ . The graph is directed or undirected, with weighted or unweighted edges. The graph may represent a real-world network or the graph is synthetically generated. In the former case, the vertices may represent data feature vectors/points, and the edges represent the pairwise similarities of the feature vectors.Data type:
scipy.sparse.csr.csr_matrix
-
Y
: then x d
array of vertex coordinates in the$d$ -dimensional embedding space.Data type:
numpy.ndarray
.
-
d
(Integer): positive, the dimension of the embedding space. Default value:2
-
λ
(Integer or Float): positive, SG-t-SNE scaling factor. Default Value:10
-
max_iter
(Integer): the maximum number of iterations for the SNE optimization process. Default Value: 1000 -
early_exag
(Integer): the number of early exaggeration iterations. Default Value:250
-
alpha
(Integer or Float): exaggeration strength for the first early_exag iterations. Default Value:12
-
Y0
: ann x d
numpy
array for the initial embedding configuration in the embedding space. Default setting:None
(the initial configuration is generated randomly). For reproducibility, the user is advised to set and saveY0
. -
eta
(Integer or Float): the learning parameter. Default Value:200.0
-
drop_leaf
(Boolean): ifTrue
, remove leaf nodes. Default Value:False
-
np
(Integer): number of threads (set to 0 to use all available cores). Default Value:threading.active_count()
, which returns the number of active threads in the current process. -
h
(Float): grid step length. Default Value: 1.0 -
list_grid_size
(a list of integers): the list of FFT grid sizes for data interpolation. Default Value:False
. The FFT module tends to be more efficient if the transform size can be factored into small prime numbers. -
profile
(Boolean): ifTrue
, the function returns performance profile in a 3-tuple :(Y, t, g)
, whereY
is the embedding coordinate array,t
is the table of the execution times of each module per iteration (size6 x max_iter
), andg
consists of the grid size, the embedding domain size (maximum(Y) - minimum(Y)
), and the scaling factors_k
for the band-limited version, per dimension (size3 x max_iter
). Default Value:False
-
fftw_single
(Boolean): ifTrue
, use the FFTW (Fast Fourier Transform) in single precision. Default Value:False
This simple example illustrates the use of SGtSNEpiPy
for spatial
embedding and visualization of Zachary's Karate Club network,
which is readily available in NetworkX
. The vertex coordinate array
returned by SGtSNEpiPy
is passed to the plot function
matplotlib.pyplot
for visualization. The vertices represent the club members, they are
colored according to the membership types, either 'Mr. Hi' or
'Officer'. The scatter plot is a spatial portrait of the social
network's structure and patterns.
The 2D embedding code
from SGtSNEpiPy.SGtSNEpiPy import sgtsnepipy
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt
# 'G' is the Zachary's Karate Club graph with 'club' attribute for each node
G = nx.karate_club_graph()
G_sparse_matrix = nx.to_scipy_sparse_matrix(G)
y = sgtsnepipy(G_sparse_matrix,d=2)
# Separate the X and Y coordinates from the embedding 'y'
X = y[:, 0]
Y = y[:, 1]
# Get the color for each node based on the 'club' attribute
node_colors = ['red' if G.nodes[node]['club'] == 'Mr. Hi' else 'blue' for node in G.nodes]
# Create a scatter plot to visualize the embedding and color the nodes
plt.scatter(X, Y, c=node_colors, alpha=0.7)
# Label the nodes with their numbers (node names)
for node, (x, y) in enumerate(zip(X, Y)):
plt.text(x, y, str(node))
plt.title("2D SG-t-SNE-П Embedding of Zachary's Karate Club")
plt.xlabel("Dimension 1")
plt.ylabel("Dimension 2")
plt.show()

3D Embedding of Zachary's Karate Club
For the 3D embedding, SGtSNEpiPy
is used the same way as for the 2D embedding.
For the 3D visualization, the graph can be viewed from various viewpoints via rotation.
We created an animated gif
file with
matplotlib.pyplot,
matplotlib.animation,
and
mpl_toolkits.mplot3d.axes3d.Axes3D
in matplotlib
.
In order to save the animation to a gif file, install
Pillow
through Python from PyPi by
issuing the command
$ pip install pillow
The 3D embedding code
from SGtSNEpiPy.SGtSNEpiPy import sgtsnepipy
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import animation
from mpl_toolkits.mplot3d import axes3d
G = nx.karate_club_graph()
G_sparse_matrix = nx.to_scipy_sparse_matrix(G)
y = sgtsnepipy(G_sparse_matrix,d=3)
# Get the color for each node based on the 'club' attribute
node_colors = ['red' if G.nodes[node]['club'] == 'Mr. Hi' else 'blue' for node in G.nodes]
# Separate the X, Y, and Z coordinates from the 3D embedding 'y'
X = y[:, 0]
Y = y[:, 1]
Z = y[:, 2]
# Create the 3D scatter plot to visualize the embedding
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
scatter = ax.scatter(X, Y, Z, c=node_colors, cmap='coolwarm') # One may choose other colormaps
# Label the nodes with their names/indices
for node, (x, y, z) in zip(G.nodes, zip(X, Y, Z)):
ax.text(x, y, z, node)
ax.set_title("3D SG-t-SNE-П Embedding of Zachary's Karate Club")
ax.set_xlabel('X-axis')
ax.set_ylabel('Y-axis')
ax.set_zlabel('Z-axis')
# Function to initialize the animation
def init():
scatter.set_offsets(np.column_stack([X, Y, Z])) # Update the scatter plot data
return scatter,
# Function to update the plot for each frame of the animation
def animate(i):
ax.view_init(elev=30., azim=3.6*i)
return scatter,
# Create the animation
ani = animation.FuncAnimation(fig, animate, init_func=init,
frames=100, interval=100, blit=True)
# Save the animation to a gif file
ani.save('3d_karate_club_animation.gif', writer='pillow')
To see the spatial embedding of larger networks/graphs, visit the website
SG-t-SNE-Pi
Cody (Chenshuhao) Qin: chenshuhao.qin@duke.edu
Aaron (Yihua) Zhong: yihua.zhong@duke.edu
Dr. N. Pitsianis, Dr. D. Floros, and Dr. X. Sun have offered valuable introductions to SG-t-SNE, sparse data formats, and Foreign Language Interfaces.
[1] Nikos Pitsianis, Alexandros-Stavros Iliopoulos, Dimitris Floros, Xiaobai Sun, Spaceland Embedding of Sparse Stochastic Graphs, In IEEE HPEC Conference, (2019).
[2] Nikos Pitsianis, Dimitris Floros, Alexandros-Stavros Iliopoulos, Xiaobai Sun, SG-t-SNE-Π: Swift Neighbor Embedding of Sparse Stochastic Graphs, JOSS, 4(39), 1577, 2019.
[3] Dhapola, P., Rodhe, J., Olofzon, R. et al. Scarf enables a highly memory-efficient analysis of large-scale single-cell genomics data, Nat Commun 13, 4616 (2022).
[4] G. Hinton, S. Roweis, Stochastic Neighbor Embedding, NIPS, 857–864, (2002).
[5] L. van der Maaten, G. Hinton, Visualizing data using t-SNE, JMLR, 9(11) 2579−2605, (2008).
[6] L. van der Maaten, Accelerating t-SNE using tree-based algorithms, JMLR, 15(1) 3221-3245, (2014).
[7] Linderman, G.C., Rachh, M., Hoskins, J.G. et al. Fast interpolation-based t-SNE for improved visualization of single-cell RNA-seq data, Nat Methods 16, 243–245 (2019)..
If you use this software, please cite the following paper:
author = {Pitsianis, Nikos and Iliopoulos, Alexandros-Stavros and Floros,
Dimitris and Sun, Xiaobai},
doi = {10.1109/HPEC.2019.8916505},
booktitle = {IEEE High Performance Extreme Computing Conference},
month = {11},
title = {Spaceland Embedding of Sparse Stochastic Graphs}},
year = {2019} }