# GNN Visualization

This repository contains the code for the semester project *Understanding and Visualizing Graph Neural Networks*.

## Usage

All commands should be executed from the `src/run` subfolder. The configuration files for the experiments are in `src/configs`.

### 1) Fixed Point

- Training the GCN with the joint loss function:

```shell
python fixpoint.py --dataset cora --fixpoint_loss --exp_times 10
```

Here `cora` is the dataset name and may be changed to `pubmed` or `citeseer`. If `--fixpoint_loss` is set, the GCN is trained with the proposed joint loss function; otherwise it is trained with the usual cross-entropy classification loss. `--exp_times` sets how many times the experiment is repeated; the result shown in the final report is the average of 10 runs.
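The joint loss pairs the classification objective with a term that pushes the embeddings toward a fixed point of the propagation step. The exact formulation is in the report; the sketch below is a hypothetical illustration of that structure, where `lam` is an assumed weighting factor not taken from this repository:

```python
import numpy as np

def cross_entropy(probs, labels):
    # mean negative log-likelihood of the true classes
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def fixpoint_penalty(h, h_next):
    # mean squared distance between embeddings before and after one
    # propagation step; zero exactly when h is a fixed point
    return np.mean((h_next - h) ** 2)

def joint_loss(probs, labels, h, h_next, lam=1.0):
    # hypothetical combination; the report's actual loss may weight
    # or combine the two terms differently
    return cross_entropy(probs, labels) + lam * fixpoint_penalty(h, h_next)
```

With `--fixpoint_loss` unset, only the cross-entropy term would remain.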

- To visualize the accuracy on the 3 citation datasets, run the above command for each dataset and then open `notebooks/fixedpoint_visualization.ipynb`. Test accuracy as reported in the final report:

| Dataset  | GCN  | SSE  | GCN with joint loss function |
| -------- | ---- | ---- | ---------------------------- |
| Cora     | 81.5 | 79.0 | 70.3                         |
| PubMed   | 81.2 | 79.7 | 69.0                         |
| CiteSeer | 79.4 | 75.8 | 72.5                         |

### 2) Identifiability

- Run the experiment to check node embedding identifiability:

```shell
python identifiability.py --dataset cora --knn 1 --repeat_times 5 --max_gcn_layers 10
```

`--dataset` selects the dataset and can be `cora`, `pubmed`, or `citeseer`. `--knn` sets k for the k-nearest-neighbour search performed after recovering the input node features. `--repeat_times` sets how many times the experiment is repeated, and `--max_gcn_layers` sets the maximum depth of the GCN model used in the experiment.
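One plausible reading of the `--knn` check is: after recovering each node's input features from its embedding, a node counts as identifiable if its true feature vector is among the k nearest original features to the recovered one. The helper below is a minimal sketch under that assumption, not the repository's actual implementation:

```python
import numpy as np

def knn_identifiable(recovered, original, k=1):
    # recovered, original: (num_nodes, feat_dim) arrays.
    # A node is "identifiable" if its own original feature vector is
    # among the k nearest (Euclidean) original rows to its recovered row.
    # Returns the fraction of identifiable nodes.
    hits = 0
    for i, r in enumerate(recovered):
        dists = np.linalg.norm(original - r, axis=1)
        nearest = np.argsort(dists)[:k]
        hits += int(i in nearest)
    return hits / len(recovered)
```

A perfect recovery gives a rate of 1.0; randomly shuffled recoveries drive the rate toward k / num_nodes.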

- Results are visualized in `notebooks/identifiability_visualization.ipynb`. Example visualization results for the cora dataset are shown below:

### 3) GNN-N

- To compute the corresponding probability, run the 100-layer GCN experiments:

```shell
python gnn_n_100layerGCN.py --dataset cora --exp_times 10 --num_random_features 10
```

`--dataset` can be any of 7 node classification datasets: `cora`, `pubmed`, `citeseer`, `amazon_photo`, `amazon_computers`, `coauthors_cs`, and `coauthors_physics`. The 100-layer GCN is trained `--exp_times` times, and for each training trial the trained model is tested with `--num_random_features` different random feature inputs.

- To compute the corresponding probability, run the 3-layer MLP experiments:

```shell
python gnn_n_3layerMLP.py --dataset cora --exp_times 10
```

As with the 100-layer GCN experiments, `--dataset` can be chosen from the 7 node classification datasets and `--exp_times` determines how many times the experiment is repeated.

- Computing GNN-N values:

```shell
python gnn_n.py --dataset cora --mlp_exp_times 10 --gcn_exp_times 10 --gcn_num_random_features 10
```

In this step the experimental probabilities are computed and the GNN-N value is derived from them. `--mlp_exp_times` must match the `--exp_times` used in the 3-layer MLP experiment; likewise, `--gcn_exp_times` and `--gcn_num_random_features` must match the `--exp_times` and `--num_random_features` used in the 100-layer GCN experiments.

- After running the experiments and computing GNN-N values for all 7 datasets, you can visualize the results using `notebooks/gnn_n_3layerMLP_visualization.ipynb`. The GNN-N values for the 7 node classification datasets are shown below:
- `notebooks/gnn_n_3layerMLP_visualization.ipynb` also visualizes the results of the 3-layer MLP experiments, for example the test accuracy and repeating rates of the 3-layer MLP:
- Results of the 100-layer GCN experiments can be visualized in `notebooks/gnn_n_100layerGCN_visualization.ipynb`. The following image shows the visualization of accuracy, R-RR, and RO-RR.

This notebook also visualizes TT-RR with heatmaps. The following image shows TT-RR for the cora dataset and is taken from the appendix of the final report.
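The repeating rates (R-RR, RO-RR, TT-RR) measure how consistently repeated runs predict the same label per node. The exact definitions are in the report; the sketch below shows one plausible pairwise reading of such a rate, purely as an illustration:

```python
import numpy as np

def repeating_rate(pred_runs):
    # pred_runs: (num_runs, num_nodes) array of predicted labels.
    # For each node, the fraction of run pairs that agree on the label,
    # averaged over nodes. A hypothetical pairwise reading of a
    # "repeating rate"; the report's definitions may differ.
    runs = np.asarray(pred_runs)
    r, n = runs.shape
    total_pairs = r * (r - 1) / 2
    rates = []
    for j in range(n):
        col = runs[:, j]
        agree = sum(int(col[a] == col[b])
                    for a in range(r) for b in range(a + 1, r))
        rates.append(agree / total_pairs)
    return float(np.mean(rates))
```

Identical runs give a rate of 1.0, while runs that never agree give 0.0; a heatmap such as the TT-RR figure would show these per-node or per-run-pair values.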