This repository contains all the code and instructions necessary to reproduce the experimental results from our paper Scaling Up Structural Clustering to Large Probabilistic Graphs Using Lyapunov Central Limit Theorem.
There are four directories, each with its own purpose:
prep_graphs/
contains all the Python code used to format datasets for the clustering algorithms NUSCAN and USCAN.
uscan/
holds the C++ implementation of USCAN as coded by the authors of Qiu et al., with a few additions needed for our analysis.
nuscan/
holds the modified USCAN code that includes the NUSCAN algorithm.
analysis/
has the scripts used to analyze the clusters and probability calculations made by both algorithms.
Each of these directories contains more specific instructions for using the code inside.
In general, reproducing the analysis from our paper involves the following steps:
- Format graph: NUSCAN and USCAN both operate on undirected probabilistic graphs. See prep_graphs/ for more information on the formatting requirements.
- Run the graph through both clustering algorithms with the output option enabled. This generates one text file with the probabilities $P[e, \varepsilon]$ for each edge $e$ and another text file with the cluster sets, hubs, and outliers. See uscan/ and nuscan/ for more details.
- Analyze results: the scripts compute cluster quality, compare $P[e, \varepsilon]$ between both methods, and compare the cluster, hub, and outlier sets. See analysis/ for more direction.
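To make the first step concrete, here is a minimal sketch of what "undirected probabilistic graph" can mean in practice: every edge carries an existence probability, and the two directions of an edge are treated as one. The function name `to_undirected`, the `(u, v, p)` triple layout, and the rule of keeping the larger probability for duplicates are all assumptions made for illustration. They are not the format the scripts in prep_graphs/ read or write, so consult that directory for the actual requirements.

```python
def to_undirected(edges):
    """Collapse (u, v, p) triples into undirected edges keyed by (min, max).

    Hypothetical helper, not part of this repository.
    """
    undirected = {}
    for u, v, p in edges:
        if u == v:
            continue  # assumption: self-loops are dropped
        if not 0.0 < p <= 1.0:
            raise ValueError(f"edge ({u}, {v}) has invalid probability {p}")
        # Store each edge once, with the smaller endpoint first.
        key = (u, v) if u < v else (v, u)
        # Assumption: when both directions appear, keep the larger probability.
        undirected[key] = max(p, undirected.get(key, 0.0))
    return undirected


if __name__ == "__main__":
    sample = [(2, 1, 0.5), (1, 2, 0.7), (3, 3, 0.9), (1, 3, 0.2)]
    print(to_undirected(sample))
```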
Both NUSCAN and USCAN have the option to output two text files, named <graphfile>-eta-eps-mu-thres.cluster_nuscan and <graphfile>-eta-eps-mu-thres.prob_nuscan (for USCAN, the thres component is not present and the suffix is "_uscan").
Both files are required to run the code in analysis/, as the code assumes the formatting produced by NUSCAN and USCAN. See uscan/ and nuscan/ for more information on the files produced, and see analysis/ for more information on the analyses performed on the files.
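The naming pattern above can be sketched as a small helper that builds the two expected file names for a given run, which may be handy when checking that both outputs exist before running the analysis scripts. The function `expected_outputs` is hypothetical and not part of this repository. It also assumes that each parameter appears in the file name exactly as Python's `str()` renders it, which may differ from how the C++ code formats numbers.

```python
def expected_outputs(graphfile, eta, eps, mu, thres=None):
    """Return (cluster_file, prob_file) following the README's naming pattern.

    Hypothetical helper: passing thres selects the NUSCAN names; omitting it
    selects the USCAN names, which carry no thres component.
    """
    params = [eta, eps, mu]
    if thres is not None:
        params.append(thres)
        suffix = "nuscan"
    else:
        suffix = "uscan"
    # Assumption: parameters are joined with "-" using their str() form.
    stem = "-".join([str(graphfile)] + [str(p) for p in params])
    return f"{stem}.cluster_{suffix}", f"{stem}.prob_{suffix}"


if __name__ == "__main__":
    print(expected_outputs("graph.txt", 0.5, 0.5, 2, thres=0.1))
    print(expected_outputs("graph.txt", 0.5, 0.5, 2))
```

Compare the result against the names the binaries actually write before relying on it, since the parameter formatting is only a guess.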