Some implementations of Deep Learning algorithms in PyTorch.
Feed-forward NN, trained by minimizing the pairwise cross-entropy loss over document pairs.
python ranking/RankNet.py --lr 0.001 --debug --standardize
--debug
prints the parameter norm and the parameter gradient norm, which helps diagnose vanishing or exploding gradients
--standardize
scales the inputs to zero mean and unit standard deviation
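A minimal sketch of what these two flags roughly correspond to (function names here are illustrative, not taken from ranking/RankNet.py):

```python
import torch

def standardize(features: torch.Tensor) -> torch.Tensor:
    """Scale each feature column to zero mean and unit standard deviation."""
    mean = features.mean(dim=0, keepdim=True)
    std = features.std(dim=0, keepdim=True).clamp_min(1e-8)  # guard against constant columns
    return (features - mean) / std

def print_param_and_grad_norms(model: torch.nn.Module) -> None:
    """Print overall parameter and gradient norms (call after loss.backward())."""
    param_norm = torch.sqrt(sum(p.detach().pow(2).sum() for p in model.parameters()))
    grad_norm = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters() if p.grad is not None))
    print("param norm {:.4f}, grad norm {:.4f}".format(param_norm.item(), grad_norm.item()))
```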
NN structure: 136 -> 64 -> 16 -> 1, with ReLU6 as the activation function
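For reference, a minimal sketch of that architecture together with the pairwise cross-entropy loss described above (illustrative, not the exact code in ranking/RankNet.py):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Scorer(nn.Module):
    """136 -> 64 -> 16 -> 1 feed-forward scorer with ReLU6 activations."""
    def __init__(self, n_features=136):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU6(),
            nn.Linear(64, 16), nn.ReLU6(),
            nn.Linear(16, 1),
        )

    def forward(self, x):
        return self.net(x)  # one relevance score per document

def pairwise_cross_entropy(s_i, s_j, label):
    """RankNet loss for a batch of document pairs.

    label is 1.0 if doc i should rank above doc j, 0.0 if below, 0.5 for ties;
    P(i beats j) is modelled as sigmoid(s_i - s_j).
    """
    return F.binary_cross_entropy_with_logits(s_i - s_j, label)
```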
optimizer | lr | epoch | loss (train) | loss (eval) | ndcg@10 | ndcg@30 | sec/epoch | Factorization | pairs/sec |
---|---|---|---|---|---|---|---|---|---|
adam | 0.001 | 25 | 0.63002 | 0.635508 | 0.41785 | 0.49337 | 312 | loss func | 203739 |
adam | 0.001 | 50 | 0.62595 | 0.633082 | 0.42392 | 0.49771 | 312 | loss func | 203739 |
adam | 0.001 | 100 | 0.62282 | 0.632495 | 0.42438 | 0.49817 | 312 | loss func | 203739 |
adam | 0.01 | 25 | 0.62668 | 0.631554 | 0.42658 | 0.50032 | 312 | loss func | 203739 |
adam | 0.01 | 50 | 0.62118 | 0.629217 | 0.43317 | 0.50533 | 312 | loss func | 203739 |
adam | 0.01 | 25 | 0.62349 | 0.633035 | 0.42979 | 0.50108 | 202 | gradient | 314687 |
adam | 0.01 | 50 | 0.61781 | 0.630417 | 0.43397 | 0.50540 | 202 | gradient | 314687 |
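Roughly, the Factorization column distinguishes building the pairwise cross-entropy loss explicitly and letting autograd differentiate it ("loss func") from computing the per-document gradients, the lambdas, analytically and passing them to backward() ("gradient"), which avoids materializing one loss term per pair. A minimal sketch of the latter, with illustrative names:

```python
import torch

def backward_via_lambdas(scores, rel, sigma=1.0):
    """Accumulate RankNet gradients per document and call backward() once.

    scores: (n, 1) model outputs for one query (requires grad);
    rel:    (n,)   relevance labels for the same documents.
    """
    with torch.no_grad():
        score_diff = scores - scores.t()                         # s_i - s_j for every ordered pair
        s_ij = (rel.reshape(-1, 1) - rel.reshape(1, -1)).sign()  # +1 / 0 / -1 per pair
        # d/ds_i of the pairwise cross entropy (Burges' RankNet derivation)
        lambda_ij = sigma * (0.5 * (1.0 - s_ij) - torch.sigmoid(-sigma * score_diff))
        # one gradient entry per document, each unordered pair counted once
        lambdas = lambda_ij.sum(dim=1, keepdim=True)
    scores.backward(lambdas)  # no per-pair loss tensor needed
```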
Feed-forward NN whose gradient is proportional to the NDCG change from swapping a pair of documents.
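A minimal sketch of that idea, assuming s_i and s_j are the score tensors of a pair where document i is more relevant than document j (illustrative, not the exact code in ranking/LambdaRank.py):

```python
import torch

def lambda_for_pair(s_i, s_j, delta_ndcg, sigma=1.0):
    """LambdaRank gradient for a pair where doc i is more relevant than doc j.

    It is the RankNet gradient of the pairwise cross entropy, scaled by
    |delta NDCG|: the change in NDCG if the two documents swapped positions.
    """
    rank_net_grad = -sigma / (1.0 + torch.exp(sigma * (s_i - s_j)))
    return rank_net_grad * delta_ndcg.abs()
```

The per-pair lambdas are then accumulated per document and applied as the gradient of the scores, in the same spirit as the gradient-factorized sketch after the RankNet table above.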
To choose a good learning rate, use the smaller dataset:
python ranking/LambdaRank.py --lr 0.01 --ndcg_gain_in_train exp2 --small_dataset --debug --standardize
Otherwise, use the full dataset:
OUTPUT_DIR=/tmp/ranking_output/
python ranking/LambdaRank.py --lr 0.01 --ndcg_gain_in_train exp2 --standardize \
--output_dir=$OUTPUT_DIR
To use identity gain for NDCG during training, pass --ndcg_gain_in_train identity
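The two gain choices follow the standard NDCG definitions, roughly as sketched below (exp2 gain is 2^rel - 1, identity gain is rel; both use a 1 / log2(rank + 1) discount):

```python
import numpy as np

def dcg(relevance, gain="exp2", k=None):
    """DCG of relevance labels listed in predicted rank order, truncated at k."""
    rel = np.asarray(relevance, dtype=float)[:k]
    gains = 2.0 ** rel - 1.0 if gain == "exp2" else rel
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    return float(np.sum(gains * discounts))

def ndcg(relevance, gain="exp2", k=None):
    """NDCG: DCG of the predicted order divided by DCG of the ideal order."""
    ideal = np.sort(np.asarray(relevance, dtype=float))[::-1]
    ideal_dcg = dcg(ideal, gain, k)
    return dcg(relevance, gain, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```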
Total pairs per epoch are 63566774; currently each pair is calculated twice. The following NDCG numbers are from the eval phase and use the exp2 gain.
optimizer | lr | epoch | loss (eval) | ndcg@10 | ndcg@30 | sec/epoch | Gain func | pairs/sec |
---|---|---|---|---|---|---|---|---|
adam | 0.001 | 25 | 0.638664 | 0.42470 | 0.49858 | 204 | identity | 311602 |
adam | 0.001 | 50 | 0.637417 | 0.42910 | 0.50267 | 204 | identity | 311602 |
adam | 0.01 | 25 | 0.635290 | 0.43667 | 0.50687 | 204 | identity | 311602 |
adam | 0.01 | 50 | 0.639860 | 0.43874 | 0.50896 | 204 | identity | 311602 |
adam | 0.01 | 5 | 0.645545 | 0.43627 | 0.50459 | 208 | exp2 | 304876 |
adam | 0.01 | 25 | 0.646903 | 0.44155 | 0.51165 | 208 | exp2 | 304876 |
adam | 0.01 | 35 | 0.644680 | 0.44454 | 0.51364 | 208 | exp2 | 304876 |
Compared with RankNet, LambdaRank generally achieves better NDCG but a higher cross-entropy loss. This is mainly because LambdaRank optimizes NDCG directly, while RankNet minimizes the pairwise cross-entropy loss.
tensorboard --logdir $OUTPUT_DIR --port=6006
If on a remote machine, forward the port through:
ssh -fN $REMOTE_MACHINE -L 6006:127.0.0.1:6006
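For reference, metrics show up in TensorBoard when they are written under the same directory with torch.utils.tensorboard; the tag names and values below are illustrative, not the ones the scripts use:

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="/tmp/ranking_output/")  # same path as $OUTPUT_DIR above
for epoch, (train_loss, eval_ndcg10) in enumerate([(0.64, 0.42), (0.63, 0.43)]):  # dummy values
    writer.add_scalar("loss/train", train_loss, epoch)
    writer.add_scalar("ndcg/eval@10", eval_ndcg10, epoch)
writer.close()
```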
- pytorch-1.11
- pandas
- numpy
- scikit-learn
Install from Anaconda:
conda create -n pytorch python=3.7
On Mac, use:
conda install -c pytorch pytorch==1.11
Otherwise, use nvcc --version to check the CUDA version (e.g. 9.0) and install the matching toolkit:
conda install pytorch torchvision cudatoolkit=9.0 -c pytorch
conda install -c anaconda pandas scikit-learn tensorboard ipython
conda install -c conda-forge matplotlib
Use ranking/download_data.sh to download and prepare the data under the following directory layout:
ranking/data
├── expedia
│ ├── basicPythonBenchmark.zip
│ ├── randomBenchmark.zip
│ ├── test.zip
│ ├── testOrderBenchmark.zip
│ └── train.zip
└── mslr-web10k
├── Fold1
│ ├── test.txt
│ ├── train.txt
│ └── vali.txt
└── MSLR-WEB10K.zip
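A minimal sketch of reading one of the MSLR-WEB10K text files above, assuming the standard LETOR line format `<relevance> qid:<id> 1:<v1> 2:<v2> ... 136:<v136>` (the function name is illustrative, not taken from ranking/):

```python
import numpy as np

def load_mslr(path, n_features=136):
    """Parse one MSLR-WEB10K file into labels, query ids, and a feature matrix."""
    labels, qids, features = [], [], []
    with open(path) as fh:
        for line in fh:
            tokens = line.strip().split()
            labels.append(int(tokens[0]))                 # relevance grade 0-4
            qids.append(int(tokens[1].split(":")[1]))     # "qid:NNN"
            row = np.zeros(n_features, dtype=np.float32)
            for tok in tokens[2:]:                        # "idx:value" pairs
                idx, val = tok.split(":")
                row[int(idx) - 1] = float(val)
            features.append(row)
    return np.asarray(labels), np.asarray(qids), np.stack(features)
```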