Holographic Reduced Representations (HRR) are a form of vector representation that can be decomposed into its constituent factors while keeping the size of the vector representation constant. This repository contains the implementation of the paper [Learning with Holographic Reduced Representations](http://arxiv.org/abs/2109.02157), published at NeurIPS 2021.
The following is an example of how HRRs can be utilized. The example consists of a dog represented as a composition of its various body parts, where each attribute has its own vector representation. HRR allows us to create a single representation of the concept of a dog, making it possible to query that representation to answer questions such as "Does the dog have legs?".
| Example of a Concept Hierarchy | HRR Representation of Concept Hierarchy |
|---|---|
```python
import torch
from torch.fft import fft, ifft

def binding(x, y):
    # Bind two vectors via circular convolution in the Fourier domain.
    return torch.real(ifft(torch.multiply(fft(x), fft(y))))

def approx_transpose(x):
    # Approximate inverse of a vector: reverse it, then roll by one position.
    x = torch.flip(x, dims=[-1])
    return torch.roll(x, 1, dims=-1)

# Unbind y from x by binding x with the approximate inverse of y.
unbind = lambda x, y: binding(x, approx_transpose(y))

def projection(x):
    # Normalize each frequency component to unit magnitude so bound
    # vectors can be recovered cleanly by unbinding.
    f = torch.abs(fft(x)) + 1e-5
    p = torch.real(ifft(fft(x) / f))
    return torch.nan_to_num(p)  # defensive: replace NaNs with zeros
```
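
As a quick illustration of the dog example above, here is a minimal sketch using the helpers just defined (the random vectors and role names such as `has_legs` are made up for demonstration):

```python
torch.manual_seed(0)
dim = 400
# Hypothetical role and filler vectors, projected so that
# binding/unbinding behaves cleanly.
has_legs = projection(torch.randn(dim))
has_tail = projection(torch.randn(dim))
legs = projection(torch.randn(dim))
tail = projection(torch.randn(dim))

# Compose the concept "dog" as a superposition of bound (role, filler) pairs.
dog = binding(has_legs, legs) + binding(has_tail, tail)

# Query: does the dog have legs? Unbind the role and compare to the filler.
answer = unbind(dog, has_legs)
print(torch.cosine_similarity(answer, legs, dim=-1))  # high similarity => yes
```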
This repository contains the implementation of Holographic Reduced Representations (HRR) for eXtreme Multi-Label classification (XML). We implement / use different network architectures to showcase how HRRs can be used to train models and perform inference while reducing the size of the network's output layer (a minimal sketch of this idea follows the list below).
We experiment with three different architectures:
- A standard 2-layer feedforward network (FFN).
- XML-CNN: A convolutional neural network architecture.
- AttentionXML: A BiLSTM-based architecture with multi-label attention.
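
For intuition, here is a minimal, hypothetical sketch of how HRR shrinks the output layer: the network predicts a single `spn-dim` vector instead of one logit per label, and labels are read out by unbinding. All names and the scoring scheme below are illustrative (they reuse the `unbind` helper defined earlier), not the repository's exact code:

```python
import torch
import torch.nn as nn

class HRRFeedForward(nn.Module):
    # A 2-layer FFN whose output is one spn_dim vector, not |labels| logits.
    def __init__(self, in_dim, hidden_dim=512, spn_dim=400):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, spn_dim),
        )

    def forward(self, x):
        return self.net(x)

def predict_labels(s_hat, p, label_vecs, th=0.3):
    # Unbind the positive-role vector p from the prediction, then score
    # every label vector by cosine similarity and threshold (cf. --th).
    q = unbind(s_hat, p)
    scores = torch.cosine_similarity(q.unsqueeze(0), label_vecs, dim=-1)
    return (scores > th).nonzero(as_tuple=True)[0]
```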
Repository layout:

```
├── analysis/              # IPython notebooks with analysis of randomly generated label vectors.
├── attention-xml/         # HRR implementation with AttentionXML.
├── data/<dataset name>/   # Location where a dataset is stored. Example: data/Bibtex
├── lib/                   # Library functions and the implementation of HRR.
├── xml-cnn/               # HRR implementation with XML-CNN.
├── LICENSE
└── README.md
```
Consider the example dataset `Wiki10-31K`.

To train the model with HRR:

```bash
python run_classifier.py --data-file None --tr-split data/Wiki10/train.txt --te-split data/Wiki10/test.txt \
       --save data/model+results/Wiki10/ --name neurips-2021-results/ --batch-size 64 \
       --test-batch-size 64 --th 0.3 --debug
```
To train the baseline model:

```bash
python run_classifier.py --data-file None --tr-split data/Wiki10/train.txt --te-split data/Wiki10/test.txt \
       --save data/model+results/Wiki10/ --name neurips-2021-results/ --batch-size 64 \
       --test-batch-size 64 --th 0.3 --debug --baseline
```
By default, `run_classifier.py` both updates the p & n vectors and includes a loss term over negative labels (i.e., labels not present for a given sample). To train the model without gradient updates to the p & n vectors, add the `--no-grad` option. To train with positive labels only, add the `--without-negative` option to the above command. The default label dimension size is 400; change it with `--spn-dim`. A rough sketch of this loss follows below.

To compare the HRR-FFN (an FFN trained with HRR) against a standard multi-label FFN, use the `--baseline` option.
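
The sketch below illustrates the two loss terms just described, assuming cosine-similarity scoring and reusing the `unbind` helper from earlier. It is an illustration of the idea, not the repository's exact loss; `p`, `n`, and the other names are placeholders:

```python
def hrr_loss(s_hat, p, n, pos_vecs, neg_vecs=None):
    # Positive term: unbinding the prediction with p should recover
    # each label vector present in the sample.
    pos = torch.cosine_similarity(unbind(s_hat, p).unsqueeze(0), pos_vecs, dim=-1)
    loss = (1.0 - pos).mean()
    # Negative term (what --without-negative disables): unbinding with n
    # should account for labels absent from the sample.
    if neg_vecs is not None:
        neg = torch.cosine_similarity(unbind(s_hat, n).unsqueeze(0), neg_vecs, dim=-1)
        loss = loss + (1.0 - neg).mean()
    return loss
```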
The following commands replicate the experiments (shown here for the `Wiki10` dataset) using a feedforward network:
- Training the HRR-based model with label gradients and negation:

```bash
python run_classifier.py --data-file None --tr-split data/Wiki10/train.txt --te-split data/Wiki10/test.txt \
       --save data/model+results/Wiki10/ --name neurips-2021-results/ --batch-size 64 \
       --test-batch-size 64 --th 0.3 --debug --spn-dim 750
```

- Training the HRR-based model with label gradients and WITHOUT negation:

```bash
python run_classifier.py --data-file None --tr-split data/Wiki10/train.txt --te-split data/Wiki10/test.txt \
       --save data/model+results/Wiki10/ --name neurips-2021-results/ --batch-size 64 \
       --test-batch-size 64 --th 0.3 --debug --spn-dim 750 --without-negative
```

- Training the HRR-based model WITHOUT label gradients and WITHOUT negation:

```bash
python run_classifier.py --data-file None --tr-split data/Wiki10/train.txt --te-split data/Wiki10/test.txt \
       --save data/model+results/Wiki10/ --name neurips-2021-results/ --batch-size 64 \
       --test-batch-size 64 --th 0.3 --debug --spn-dim 750 --without-negative --no-grad
```
To train the model using SLURM (i.e., when a cluster is available), use the `experiments.sh` script (the underlying SLURM script is `train.slurm.sh`). For example:

```bash
./experiments.sh Bibtex grad-and-neg all 400 0.3 grad with-negative
./experiments.sh Bibtex grad-and-no-neg all 400 0.3 grad without-negative
```

For all combinations of training:

```bash
./experiments.sh Bibtex no-grad-and-no-neg all 400 0.3 no-grad without-negative
./experiments.sh Bibtex grad-and-no-neg all 400 0.3 grad without-negative
./experiments.sh Bibtex no-grad-and-neg all 400 0.3 no-grad with-negative
./experiments.sh Bibtex grad-and-neg all 400 0.3 grad with-negative
```

To run with a custom label dimension size:

```bash
./experiments.sh Bibtex <dim_size>-grad-and-neg all <dim_size> 0.3 grad with-negative
./experiments.sh Bibtex <dim_size>-no-grad-and-neg all <dim_size> 0.3 no-grad with-negative
```
The full usage of `run_classifier.py`:

```
usage: run_classifier.py [-h] [--name NAME] [--batch-size N] [--test-batch-size N] [--epochs N] [--lr LR] [--th TH] [--gamma M] [--no-cuda] [--seed S] [--topk S] [-a A] [-b B]
                         [--log-interval N] [--save SAVE] [--data-file DATA_FILE] [--tr-split TR_SPLIT] [--te-split TE_SPLIT] [--test] [--reduce-dims] [--debug] [--baseline]
                         [--spn-dim S] [--no-grad] [--without-negative] [--load-vec LOAD_VEC]

Extreme Multi-label Classification.

optional arguments:
  -h, --help           show this help message and exit
  --name NAME          A unique experiment name.
  --batch-size N       input batch size for training (default: 64)
  --test-batch-size N  input batch size for testing (default: 250)
  --epochs N           number of epochs to train (default: 100)
  --lr LR              learning rate (default: 1.0)
  --th TH              Threshold for label inference.
  --gamma M            Learning rate step gamma (default: 0.7)
  --no-cuda            disables CUDA training
  --seed S             random seed (default: 100)
  --topk S             Retrieve top k labels (Default: 5).
  -a A                 Inverse propensity value A (Default: 0.55).
  -b B                 Inverse propensity value B (Default: 1.5).
  --log-interval N     how many batches to wait before logging training status
  --save SAVE          Directory to save model and results.
  --data-file DATA_FILE
                       Location of the data CSV file.
  --tr-split TR_SPLIT  Get the training split for the dataset.
  --te-split TE_SPLIT  Get the test split for the dataset.
  --test               Perform tests on a pretrained model.
  --reduce-dims        Reduce dimensions of the features.
  --debug              Print debug statements for verification.
  --baseline           Use the baseline network.
  --spn-dim S          Label vector dimensions (Default: 400)
  --no-grad            Disable gradient updates to the label vectors.
  --without-negative   Disable the negative loss term.
  --load-vec LOAD_VEC  Location of pretrained vectors.
```
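
The `-a`/`-b` defaults match the inverse-propensity model of Jain et al. (2016) that is standard in XML evaluation; the sketch below shows that formula. That this is exactly what the repository computes is an assumption on our part, since the help text does not spell it out:

```python
import numpy as np

def propensity_scores(label_freqs, num_samples, A=0.55, B=1.5):
    # Inverse propensity p_l = 1 / (1 + C * (N_l + B)^(-A)),
    # with C = (log N - 1) * (B + 1)^A  (Jain et al., 2016).
    # label_freqs: array of per-label counts N_l; num_samples: N.
    C = (np.log(num_samples) - 1.0) * (B + 1.0) ** A
    return 1.0 / (1.0 + C * (label_freqs + B) ** -A)
```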
BibTeX:

```bibtex
@inproceedings{ganesan2021learning,
  author    = {Ganesan, Ashwinkumar and Gao, Hang and Gandhi, Sunil and Raff, Edward and Oates, Tim and Holt, James and McLean, Mark},
  booktitle = {Advances in Neural Information Processing Systems},
  eprint    = {2109.02157},
  title     = {{Learning with Holographic Reduced Representations}},
  url       = {http://arxiv.org/abs/2109.02157},
  year      = {2021}
}
```