Some implementations of Deep Learning algorithms in PyTorch.
Feed-forward NN, trained by minimizing the pairwise cross-entropy loss over document pairs.
python ranking/RankNet.py --lr 0.001 --debug --standardize
--debug
prints the parameter norm and the parameter gradient norm, which helps diagnose vanishing or exploding gradients
--standardize
scales the inputs to zero mean and unit standard deviation
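A minimal sketch of what these two flags roughly correspond to (function names here are illustrative, not taken from ranking/RankNet.py):

```python
import torch

def standardize(features: torch.Tensor) -> torch.Tensor:
    """Scale each feature column to zero mean and unit standard deviation."""
    mean = features.mean(dim=0, keepdim=True)
    std = features.std(dim=0, keepdim=True).clamp_min(1e-8)  # guard against constant columns
    return (features - mean) / std

def print_param_and_grad_norms(model: torch.nn.Module) -> None:
    """Print overall parameter and gradient norms (call after loss.backward())."""
    param_norm = torch.sqrt(sum(p.detach().pow(2).sum() for p in model.parameters()))
    grad_norm = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters() if p.grad is not None))
    print("param norm {:.4f}, grad norm {:.4f}".format(param_norm.item(), grad_norm.item()))
```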
NN structure: 136 -> 64 -> 16 -> 1, with ReLU6 as the activation function
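For reference, a minimal sketch of that architecture together with the pairwise cross-entropy loss described above (illustrative, not the exact code in ranking/RankNet.py):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Scorer(nn.Module):
    """136 -> 64 -> 16 -> 1 feed-forward scorer with ReLU6 activations."""
    def __init__(self, n_features=136):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU6(),
            nn.Linear(64, 16), nn.ReLU6(),
            nn.Linear(16, 1),
        )

    def forward(self, x):
        return self.net(x)  # one relevance score per document

def pairwise_cross_entropy(s_i, s_j, label):
    """RankNet loss for a batch of document pairs.

    label is 1.0 if doc i should rank above doc j, 0.0 if below, 0.5 for ties;
    P(i beats j) is modelled as sigmoid(s_i - s_j).
    """
    return F.binary_cross_entropy_with_logits(s_i - s_j, label)
```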
optimizer | lr | epoch | loss (train) | loss (eval) | ndcg@10 | ndcg@30 | sec/epoch | Factorization | pairs/sec |
---|---|---|---|---|---|---|---|---|---|
adam | 0.001 | 25 | 0.63002 | 0.635508 | 0.41785 | 0.49337 | 312 | loss func | 203739 |
adam | 0.001 | 50 | 0.62595 | 0.633082 | 0.42392 | 0.49771 | 312 | loss func | 203739 |
adam | 0.001 | 100 | 0.62282 | 0.632495 | 0.42438 | 0.49817 | 312 | loss func | 203739 |
adam | 0.01 | 25 | 0.62668 | 0.631554 | 0.42658 | 0.50032 | 312 | loss func | 203739 |
adam | 0.01 | 50 | 0.62118 | 0.629217 | 0.43317 | 0.50533 | 312 | loss func | 203739 |
adam | 0.01 | 25 | 0.62349 | 0.633035 | 0.42979 | 0.50108 | 202 | gradient | 314687 |
adam | 0.01 | 50 | 0.61781 | 0.630417 | 0.43397 | 0.50540 | 202 | gradient | 314687 |
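Roughly, the Factorization column distinguishes building the pairwise cross-entropy loss explicitly and letting autograd differentiate it ("loss func") from computing the per-document gradients, the lambdas, analytically and passing them to backward() ("gradient"), which avoids materializing one loss term per pair. A minimal sketch of the latter, with illustrative names:

```python
import torch

def backward_via_lambdas(scores, rel, sigma=1.0):
    """Accumulate RankNet gradients per document and call backward() once.

    scores: (n, 1) model outputs for one query (requires grad);
    rel:    (n,)   relevance labels for the same documents.
    """
    with torch.no_grad():
        score_diff = scores - scores.t()                         # s_i - s_j for every ordered pair
        s_ij = (rel.reshape(-1, 1) - rel.reshape(1, -1)).sign()  # +1 / 0 / -1 per pair
        # d/ds_i of the pairwise cross entropy (Burges' RankNet derivation)
        lambda_ij = sigma * (0.5 * (1.0 - s_ij) - torch.sigmoid(-sigma * score_diff))
        # one gradient entry per document, each unordered pair counted once
        lambdas = lambda_ij.sum(dim=1, keepdim=True)
    scores.backward(lambdas)  # no per-pair loss tensor needed
```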
Feed-forward NN whose gradient is proportional to the NDCG change from swapping a pair of documents.
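A minimal sketch of that idea, assuming s_i and s_j are the score tensors of a pair where document i is more relevant than document j (illustrative, not the exact code in ranking/LambdaRank.py):

```python
import torch

def lambda_for_pair(s_i, s_j, delta_ndcg, sigma=1.0):
    """LambdaRank gradient for a pair where doc i is more relevant than doc j.

    It is the RankNet gradient of the pairwise cross entropy, scaled by
    |delta NDCG|: the change in NDCG if the two documents swapped positions.
    """
    rank_net_grad = -sigma / (1.0 + torch.exp(sigma * (s_i - s_j)))
    return rank_net_grad * delta_ndcg.abs()
```

The per-pair lambdas are then accumulated per document and applied as the gradient of the scores, in the same spirit as the gradient-factorized sketch after the RankNet table above.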
To choose a good learning rate, use the smaller dataset:
python ranking/LambdaRank.py --lr 0.01 --ndcg_gain_in_train exp2 --small_dataset --debug --standardize
Otherwise, use the full dataset:
OUTPUT_DIR=/tmp/ranking_output/
python ranking/LambdaRank.py --lr 0.01 --ndcg_gain_in_train exp2 --standardize \
--output_dir=$OUTPUT_DIR
To use identity gain for NDCG during training, pass --ndcg_gain_in_train identity
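The two gain choices follow the standard NDCG definitions, roughly as sketched below (exp2 gain is 2^rel - 1, identity gain is rel; both use a 1 / log2(rank + 1) discount):

```python
import numpy as np

def dcg(relevance, gain="exp2", k=None):
    """DCG of relevance labels listed in predicted rank order, truncated at k."""
    rel = np.asarray(relevance, dtype=float)[:k]
    gains = 2.0 ** rel - 1.0 if gain == "exp2" else rel
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    return float(np.sum(gains * discounts))

def ndcg(relevance, gain="exp2", k=None):
    """NDCG: DCG of the predicted order divided by DCG of the ideal order."""
    ideal = np.sort(np.asarray(relevance, dtype=float))[::-1]
    ideal_dcg = dcg(ideal, gain, k)
    return dcg(relevance, gain, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```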
Total pairs per epoch are 63566774; currently each pair is calculated twice. The following NDCG numbers are from the eval phase and use the exp2 gain.
optimizer | lr | epoch | loss (eval) | ndcg@10 | ndcg@30 | sec/epoch | Gain func | pairs/sec |
---|---|---|---|---|---|---|---|---|
adam | 0.001 | 25 | 0.638664 | 0.42470 | 0.49858 | 204 | identity | 311602 |
adam | 0.001 | 50 | 0.637417 | 0.42910 | 0.50267 | 204 | identity | 311602 |
adam | 0.01 | 25 | 0.635290 | 0.43667 | 0.50687 | 204 | identity | 311602 |
adam | 0.01 | 50 | 0.639860 | 0.43874 | 0.50896 | 204 | identity | 311602 |
adam | 0.01 | 5 | 0.645545 | 0.43627 | 0.50459 | 208 | exp2 | 304876 |
adam | 0.01 | 25 | 0.646903 | 0.44155 | 0.51165 | 208 | exp2 | 304876 |
adam | 0.01 | 35 | 0.644680 | 0.44454 | 0.51364 | 208 | exp2 | 304876 |
Compared with RankNet, LambdaRank generally achieves better NDCG but a higher cross-entropy loss. This is mainly because LambdaRank optimizes NDCG directly, while RankNet minimizes the pairwise cross-entropy loss.
tensorboard --logdir $OUTPUT_DIR --port=6006
If on a remote machine, forward the port through:
ssh -fN $REMOTE_MACHINE -L 6006:127.0.0.1:6006
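For reference, metrics show up in TensorBoard when they are written under the same directory with torch.utils.tensorboard; the tag names and values below are illustrative, not the ones the scripts use:

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="/tmp/ranking_output/")  # same path as $OUTPUT_DIR above
for epoch, (train_loss, eval_ndcg10) in enumerate([(0.64, 0.42), (0.63, 0.43)]):  # dummy values
    writer.add_scalar("loss/train", train_loss, epoch)
    writer.add_scalar("ndcg/eval@10", eval_ndcg10, epoch)
writer.close()
```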
- pytorch-1.11
- pandas
- numpy
- scikit-learn
Install from Anaconda:
conda create -n pytorch python=3.7
On Mac, use:
conda install -c pytorch pytorch==1.11
Otherwise, use nvcc --version to check the CUDA version (e.g. 9.0) and install the matching toolkit:
conda install pytorch torchvision cudatoolkit=9.0 -c pytorch
conda install -c anaconda pandas scikit-learn tensorboard ipython
conda install -c conda-forge matplotlib
Use ranking/download_data.sh to download and prepare the data under the following directory layout:
ranking/data
├── expedia
│ ├── basicPythonBenchmark.zip
│ ├── randomBenchmark.zip
│ ├── test.zip
│ ├── testOrderBenchmark.zip
│ └── train.zip
└── mslr-web10k
├── Fold1
│ ├── test.txt
│ ├── train.txt
│ └── vali.txt
└── MSLR-WEB10K.zip
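A minimal sketch of reading one of the MSLR-WEB10K text files above, assuming the standard LETOR line format `<relevance> qid:<id> 1:<v1> 2:<v2> ... 136:<v136>` (the function name is illustrative, not taken from ranking/):

```python
import numpy as np

def load_mslr(path, n_features=136):
    """Parse one MSLR-WEB10K file into labels, query ids, and a feature matrix."""
    labels, qids, features = [], [], []
    with open(path) as fh:
        for line in fh:
            tokens = line.strip().split()
            labels.append(int(tokens[0]))                 # relevance grade 0-4
            qids.append(int(tokens[1].split(":")[1]))     # "qid:NNN"
            row = np.zeros(n_features, dtype=np.float32)
            for tok in tokens[2:]:                        # "idx:value" pairs
                idx, val = tok.split(":")
                row[int(idx) - 1] = float(val)
            features.append(row)
    return np.asarray(labels), np.asarray(qids), np.stack(features)
```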