Hyperbolic embedding of complex networks with nodes' features.
- Installation
- Usage
- Publications
Two interfaces of the program are available: a command line executable and a Python module.
Requirements
- A C++17 (or newer) compliant compiler
- The header
unistd.h
.
# Unix (Linux / MAC OS)
chmod +x build.sh
./build.sh -b Release
The structure of the network to be embedded is passed to the program via a file containing its edgelist (one link per line). The edgelist file consists in a simple text file with the following convention
# lines beginning with "#" are ignored (comments).
# note that nodes' name must be separated by at least one white space.
# there may be white space at the beginning of a line.
[name of node1] [name of node2] [remaining information will be ignored]
[name of node2] [name of node3] [remaining information will be ignored]
[name of node4] [name of node5] [remaining information will be ignored]
# comments can be inserted between links
[name of node5] [name of node6] [remaining information will be ignored]
[name of node7] [name of node6] [remaining information will be ignored]
...
Note that the nodes' name will be imported as std::string
and can therefore be virtually anything as long as they do not include white spaces (i.e., there is not need for the nodes to be identified by contiguous integers).
IMPORTANT: this class only considers simple undirected networks without self-loops. Any multiple edges (e.g., if the graph is originally directed) or self-loops will be ignored.
IMPORTANT: in the actual version of the code, the network must have only one component.
The structure of the input nodes' coordinates obtained from UMAP has the form
[name of node1] [X1] [Y1] [Z1]
[name of node2] [X2] [Y2] [Z2]
...
Running mercator
is quite straightforward
# Command line
./mercator <edgelist_filename>
# Python module
mercator.embed(<edgelist_filename>)
The program outputs several files during and after the embedding procedure
*.inf_coord
: Contains information about the embedding procedure, the inferred parameters and the inferred positions.*.inf_log
: Contains detailled information about the embedding procedure.
Several options are provided to adjust the embedding procedure to specific needs and can be combined
- Input nodes' coordiantes file
- Custom output filename
- Custom value for beta
- Custom value for the seed of the random number generator
- Clean output mode
- Fast mode
- Post-processing of the inferred values of the radial positions
- Quiet mode
- Refine mode
- Screen mode
- Validation mode
Specify the path to the file with nodes' coordinates used for initialization
./mercator -u <filename_pos> <edgelist_filename>
All generated files are named <output_rootname>.<extension>
and a custom <output_rootname>
can be provided. If none is provided, the <output_rootname>
is extracted from the <edgelist_filename>
by removing its extension, otherwise the full <edgelist_filename>
is used as <output_rootname>
if <edgelist_filename>
does not have any extension.
# Command line
./mercator -o <custom_output_rootname> <edgelist_filename>
# Python module
mercator.embed(<edgelist_filename>, output_name=<custom_output_rootname>)
A custom value for the parameter beta
can be provided. By default a value for beta is inferred to reproduce the average local clustering coefficient of the original edgelist.
# Command line
./mercator -b <beta_value> <edgelist_filename>
# Python module
mercator.embed(<edgelist_filename>, beta=<beta_value>)
A custom seed for the random number generator can be provided (useful when several embeddings are performed in parallel). By default, EPOCH
is used.
# Command line
./mercator -s <seed_value> <edgelist_filename>
# Python module
mercator.embed(<edgelist_filename>, seed=<seed_value>)
Outputs a file with extension *.inf_coord_raw
containing the columns 2, 3 and 4 of the file with extension *.inf_coord
. Rows follow the same order as in the file with extension *.inf_coord
. The global parameters (i.e., beta, mu, etc.) ate not included in the file. Default is false
.
# Command line
./mercator -c <edgelist_filename>
# Python module
mercator.embed(<edgelist_filename>, clean_mode=True)
Skip the likelihood maximization step (i.e., only infers the positions using the EigenMap methods). Default is false
.
# Command line
./mercator -f <edgelist_filename>
# Python module
mercator.embed(<edgelist_filename>, fast_mode=True)
The inferred radial positions are updated based on the inferred angular positions. When deactivated, nodes with the same degree have the same radial position in the hyperbolic disk. Default is true
.
# Command line
./mercator -k <edgelist_filename>
# Python module
mercator.embed(<edgelist_filename>, post_kappa=False)
The program does not output details about the progress of the embedding procedure (i.e., the file *.inf_log
is not generated). This mode supersedes the verbose mode (i.e., no output on screen). Default is false
.
# Command line
./mercator -q <edgelist_filename>
# Python module
mercator.embed(<edgelist_filename>, quiet_mode=True)
When a file containing the previously inferred coordinates is provided (*.inf_coord
), the program uses the already inferred positions and parameters as a starting point and perform another round of the likelihood maximization step to refine the inferred positions. The use of a different output filename is recommended to keep track of the changes. Default is false
.
# Command line
./mercator -r <inferred_coordinates_filename> <edgelist_filename>
# Python module
mercator.embed(<edgelist_filename>, inf_coord=<inferred_coordinates_filename>)
The program outputs details about the progress of the embedding procedure on screen instead of generating the file *.inf_log
. Default is false
.
# Command line
./mercator -a <edgelist_filename>
# Python module
mercator.embed(<edgelist_filename>, screen_mode=True)
Validates and characterizes the inferred random network ensemble. This is done by generating a large number of networks based on the inferred parameters and positions. The following files are generated
*.inf_inf_pconn
: Compares the inferred probability of connection with the theoretical one based on the inferred parameters.*.inf_theta_density
: Density the angular distribution*.inf_vprop
: Contains the following topological properties for every nodes in the inferred ensemble and in the original network:- degree
- sum of the degree of the neighbors
- average degree of the neighbors
- number of triangles
- local clustering coefficient
*.inf_vstat
: Contains the following statistics of the inferred ensemble.- degree distribution
- spectrum of the average degree of neighbors
- clustering spectrum
*.obs_vstat
: Contains the same statistics as above but for the original network.
Default is false
. A python script plot_validation_of_embedding.py
is provided to visualize the results of the validation mode.
# Command line
./mercator -v <edgelist_filename>
# Python module
mercator.embed(<edgelist_filename>, validation_mode=True)
# To plot the various characterization quantities
python3 scripts/plot_validation_of_embedding.py <rootname of the edgelist (everything before the last ., if any)> <custom rootname (if any, otherwise leave blank)>
Please cite:
-
Feature-aware ultra-low dimensional reduction of real networks
Robert Jankowski Pegah Hozhabrierdi Marián Boguñá and M. Ángeles Serrano
npj Complexity
Full text | arXiv -
The D-Mercator method for the multidimensional hyperbolic embedding of real networks
Robert Jankowski, Antoine Allard, Marián Boguñá and M. Ángeles Serrano
Nature Commmunications 14, 7585 (2023)
Full text | arXiv -
Mercator: uncovering faithful hyperbolic embeddings of complex networks
Guillermo García-Pérez*, Antoine Allard*, M. Ángeles Serrano and Marián Boguñá
New Journal of Physics 21, 123033 (2019)
Full text | arXiv