PyTorch implementation of Heterogeneous Graph Neural Networks with attention aggregation for node classification on academic networks.
This repository contains implementations of Heterogeneous Graph Neural Networks with two aggregation strategies:
- Mean Aggregation: Simple averaging across different message types (paper-author and paper-subject connections)
- Attention Aggregation: Learned semantic-level attention weights inspired by HAN (Wang et al., 2019)
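The two strategies can be sketched as follows. This is a minimal illustration, not the repository's actual code: the module and tensor names are assumptions, and `SemanticAttention` follows the HAN-style semantic-level attention the list above refers to.

```python
import torch
import torch.nn as nn

def mean_aggregate(messages):
    """Mean aggregation: average the per-relation messages.

    messages: list of [N, D] tensors, one per relation type
    (e.g. paper-author and paper-subject).
    """
    return torch.stack(messages, dim=0).mean(dim=0)

class SemanticAttention(nn.Module):
    """HAN-style semantic-level attention over relation types."""

    def __init__(self, in_dim, attn_dim=32):
        super().__init__()
        self.project = nn.Sequential(
            nn.Linear(in_dim, attn_dim),
            nn.Tanh(),
            nn.Linear(attn_dim, 1, bias=False),
        )

    def forward(self, messages):
        z = torch.stack(messages, dim=1)           # [N, R, D]
        w = self.project(z).mean(dim=0)            # [R, 1] score per relation
        beta = torch.softmax(w, dim=0)             # learned relation weights
        return (beta.unsqueeze(0) * z).sum(dim=1)  # [N, D] weighted sum
```

The key design difference: mean aggregation treats every relation type equally, while the attention module learns a scalar weight per relation (shared across all nodes) and re-weights the messages before summing.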
Both models are tested on the ACM academic paper dataset, where the task is to classify papers into research areas based on their content and heterogeneous relationships.
Performance on the ACM dataset (3 classes, 3,025 nodes):

| Model | Test Micro-F1 | Test Macro-F1 | Configuration | Training Time |
|---|---|---|---|---|
| Mean | 84.99% | 84.41% | 2 layers, 64 hidden | < 2 min |
| Attention | 86.54% | 86.28% | 2 layers, 64 hidden, 32 attn | < 3 min |
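For reference, Micro-F1 and Macro-F1 can be computed with scikit-learn; this is a generic illustration on toy labels, not the repository's evaluation code. Micro-F1 aggregates over all predictions (equal to accuracy for single-label classification), while Macro-F1 averages the per-class F1 scores.

```python
from sklearn.metrics import f1_score

# Toy 3-class example (illustrative labels, not ACM data)
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 0]

micro = f1_score(y_true, y_pred, average="micro")  # global TP/FP/FN counts
macro = f1_score(y_true, y_pred, average="macro")  # unweighted mean of per-class F1
```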
Learned Attention Weights:
- Layer 1: Author connections (48.8%), Subject connections (51.2%)
- Layer 2: Author connections (61.7%), Subject connections (38.3%)
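Weights like these come from a softmax over per-relation attention scores, so within each layer they always sum to 100%. A minimal sketch with hypothetical scores:

```python
import torch

# Hypothetical per-relation scores (author, subject) for one layer
scores = torch.tensor([0.24, 0.29])
beta = torch.softmax(scores, dim=0)
# beta sums to 1; each entry is that relation's share of the aggregation
```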
```bash
# Clone the repo
git clone https://github.com/yourusername/hetero-gnn.git
cd hetero-gnn

# Create a virtualenv (recommended)
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt
```
Below are example commands to train and evaluate the heterogeneous GNN models.
The ACM academic paper dataset is downloaded automatically on first run. If the download fails, manually download `acm.pkl` from https://www.dropbox.com/scl/fi/cbe0pkadicrv128mi5odz/acm.pkl?rlkey=ppxff7rystkuafc7y9ae0qax1&dl=0 and place it in the `data/` directory.
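A small loader along these lines can make the manual-download fallback explicit. The function name and error message are illustrative, not the repository's actual API:

```python
import pickle
from pathlib import Path

def load_acm(path="data/acm.pkl"):
    """Load the ACM dataset pickle, failing with a helpful message if absent."""
    p = Path(path)
    if not p.exists():
        raise FileNotFoundError(
            f"{p} not found; download acm.pkl manually (see the link above) "
            "and place it in the data/ directory."
        )
    with p.open("rb") as f:
        return pickle.load(f)
```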
Dataset details:
- 3,025 academic papers with 1,870-dimensional features
- Two relationship types: paper-author and paper-subject connections
- 3 research area classes for classification
```bash
# Train with mean aggregation
python src/main.py --config configs/mean_aggregation.yaml

# Train with attention aggregation
python src/main.py --config configs/attention_aggregation.yaml
```
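For orientation, an attention-aggregation config matching the hyperparameters in the results table might look like this. The field names are illustrative assumptions; consult `configs/attention_aggregation.yaml` for the actual schema:

```yaml
# Illustrative only; real config keys may differ.
aggregation: attention   # mean | attention
num_layers: 2
hidden_dim: 64
attn_dim: 32
```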
This implementation draws inspiration from several key papers in heterogeneous graph learning:
- GraphSAGE (Hamilton et al., 2017): Foundation for inductive graph learning
- HAN (Wang et al., 2019): Hierarchical attention networks for heterogeneous graphs
Parts of this code were adapted from the Stanford CS224W "Machine Learning with Graphs" course materials:
Leskovec, Jure (Instructor). CS224W: Machine Learning with Graphs. Stanford University.
URL: https://web.stanford.edu/class/cs224w/