|
1 | 1 | # learning DLsite trend function by rankNet
|
2 | 2 |
|
3 |
| -```log |
4 |
| -$ python src/train.py |
5 |
| -trainTestWithSameRankingPage |
6 |
| -epoch0 trainCurrentRate: 0.65 |
7 |
| -epoch1 trainCurrentRate: 0.69 |
8 |
| -epoch2 trainCurrentRate: 0.70 |
9 |
| -testCurrentRate: 0.71 |
10 |
| -============================= |
11 |
| -trainTestWithOtherRankingPage (original) |
12 |
| -epoch0 trainCurrentRate: 0.64 |
13 |
| -epoch1 trainCurrentRate: 0.69 |
14 |
| -epoch2 trainCurrentRate: 0.71 |
15 |
| -testCurrentRate: 0.59 |
16 |
| -============================= |
17 |
| -trainTestWithOtherRankingPage (modified) |
18 |
| -epoch0 trainCurrentRate: 0.56 |
19 |
| -epoch1 trainCurrentRate: 0.61 |
20 |
| -epoch2 trainCurrentRate: 0.63 |
21 |
| -testCurrentRate: 0.63 |
| 3 | +<a href="https://github.com/avengerandy/rankNet/actions"><img src="https://github.com/avengerandy/rankNet/actions/workflows/tests.yml/badge.svg" alt="tests"></a> |
| 4 | + |
| 5 | + |
| 6 | + |
| 7 | +A study of learning the DLsite trend function with rankNet, and of the distribution shift that appears when doing so. For more details, please see my blog post. |
| 8 | + |
| 9 | +## Table of contents |
| 10 | + |
| 11 | +- [Install](#install) |
| 12 | +- [Testing](#testing) |
| 13 | +- [Dataset](#dataset) |
| 14 | +- [Result](#result) |
| 15 | +- [Distribution Shift](#distribution-shift) |
| 16 | +- [License](#license) |
| 17 | + |
| 18 | +## Install |
| 19 | + |
| 20 | +```bash |
| 21 | +pip install -r requirements.txt |
| 22 | +pip install -r pytorchRequirements.txt --index-url https://download.pytorch.org/whl/cu121 |
| 23 | +``` |
| 24 | + |
| 25 | +`pytorchRequirements.txt` installs only PyTorch. Change `--index-url` to match your hardware and software (CPU, GPU, CUDA version). |
| 26 | + |
| 27 | +## Testing |
| 28 | + |
22 | 29 | ```
|
| 30 | +$ python -m unittest |
| 31 | +....... |
| 32 | +---------------------------------------------------------------------- |
| 33 | +Ran 7 tests in 0.027s |
| 34 | +
|
| 35 | +OK |
| 36 | +``` |
| 37 | + |
| 38 | +The command above runs the unit tests for dataset preprocessing. Some tests depend on the `Asia/Taipei` timezone. |
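
Why a timezone can affect date-handling tests can be sketched as follows. This is an illustration only: the repository's tests are not shown here, `local_date` is a hypothetical helper, and a fixed UTC+8 offset stands in for `Asia/Taipei` (an assumption that ignores any historical offset changes).

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+8 offset is a stand-in for the Asia/Taipei timezone.
TAIPEI = timezone(timedelta(hours=8), "Asia/Taipei")


def local_date(timestamp, tz):
    """Convert a Unix timestamp to an ISO calendar date in the given timezone."""
    return datetime.fromtimestamp(timestamp, tz).date().isoformat()


ts = 1704038400  # 2023-12-31 16:00:00 UTC

print(local_date(ts, timezone.utc))  # 2023-12-31
print(local_date(ts, TAIPEI))        # 2024-01-01
```

The same timestamp falls on different calendar days depending on the timezone, so a test that compares dates may pass or fail depending on the machine's local setting.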
| 39 | + |
| 40 | +## Dataset |
| 41 | + |
| 42 | +```python |
| 43 | +with open('./dataset/asmrAllItemDict.json', 'r') as infile: |
| 44 | + itemDict = json.load(infile) |
| 45 | +with open('./dataset/asmrAllRankItem.json', 'r') as infile: |
| 46 | + rankItem = json.load(infile) |
| 47 | + |
| 48 | +rankItem, testRankItem = orderedTrainTestSplit(rankItem, 0.1) |
| 49 | +postivePairsDataset, minMaxScaler = getNormalizedDataset(rankItem, itemDict) |
| 50 | +dataloader = DataLoader(postivePairsDataset, batch_size = BATCH_SIZE, shuffle = True) |
| 51 | +``` |
| 52 | + |
| 53 | +* `RankItem.json` stores the item rankings (in rank order). |
| 54 | +* `ItemDict.json` stores the item features. |
| 55 | + |
| 56 | +This repository does not provide the real dataset (I do not own the copyright), but `dataset/toyItemDict.json` and `dataset/toyRankItem.json` show the dataset structure. |
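
To illustrate how an ordered ranking can become RankNet training signal, here is a minimal stdlib-only sketch. It is not the repository's implementation: `positive_pairs` and `ranknet_loss` are hypothetical names, the item IDs and scores are made up, and the ranking is assumed to be a plain list ordered from best to worst. The loss is the standard RankNet pairwise cross-entropy, `-log(sigmoid(s_i - s_j))`, for a pair in which item `i` should outrank item `j`.

```python
from itertools import combinations
from math import exp, log1p


def positive_pairs(ranking):
    """Return (higher, lower) pairs for every two items in an ordered ranking."""
    # combinations() preserves input order, so the first element of each pair
    # always appears earlier in the ranking than the second.
    return list(combinations(ranking, 2))


def ranknet_loss(score_higher, score_lower):
    """RankNet pairwise loss: -log(sigmoid(score_higher - score_lower))."""
    diff = score_higher - score_lower
    # Numerically stable form of log(1 + exp(-diff)).
    return max(-diff, 0.0) + log1p(exp(-abs(diff)))


# Hypothetical item IDs (best first) and hypothetical model scores.
toy_ranking = ["item-a", "item-b", "item-c"]
toy_scores = {"item-a": 2.0, "item-b": 0.5, "item-c": 1.0}

pairs = positive_pairs(toy_ranking)
losses = [ranknet_loss(toy_scores[hi], toy_scores[lo]) for hi, lo in pairs]
print(pairs)
print(sum(losses) / len(losses))
```

A pair that the scores order correctly (such as `item-a` over `item-b`) contributes a small loss, while a mis-ordered pair (such as `item-b` over `item-c`) contributes a larger one.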
| 57 | + |
| 58 | +I collected the dataset directly from the DLsite website. |
| 59 | + |
| 60 | + |
| 61 | + |
| 62 | +I wrote `dataset/getRankItem.js` to help collect `RankItem.json`. |
| 63 | + |
| 64 | + |
| 65 | + |
| 66 | +`ItemDict.json` comes from the DLsite API. |
| 67 | + |
| 68 | +## Result |
| 69 | + |
| 70 | +### Train and test on the same ranking page |
| 71 | + |
| 72 | + |
| 73 | + |
| 74 | + |
| 75 | + |
| 76 | +### Train and test on different ranking pages |
| 77 | + |
| 78 | + |
| 79 | + |
| 80 | + |
| 81 | + |
| 82 | +## Distribution Shift |
23 | 83 |
|
24 | 84 | ```log
|
25 | 85 | $ python src/hypothesisTesting.py
|
@@ -52,31 +112,28 @@ KruskalResult(statistic=0.9354796779285227, pvalue=0.3334430705576221)
|
52 | 112 | KruskalResult(statistic=54.25046497908814, pvalue=1.7649537781872029e-13)
|
53 | 113 | ```
|
54 | 114 |
|
55 |
| -```log |
56 |
| -$ python src/experiment.py |
| 115 | +The hypothesis testing results show that training and testing on different ranking pages leads to distribution shift. |
| 116 | + |
| 117 | + |
| 118 | + |
| 119 | +``` |
| 120 | +========================== |
57 | 121 | originalCurrentRateRecords
|
58 |
| -originalMean: 0.59 |
59 |
| -originalStd: 0.01 |
| 122 | +originalMean: 0.56 |
| 123 | +originalStd: 0.02 |
60 | 124 | ==========================
|
61 |
| -modifiedCurrentRateRecords |
62 |
| -modifiedMean: 0.62 |
| 125 | +modifiedCurrentRateRecords (L1、L2 regularization and normalization testing data) |
| 126 | +modifiedMean: 0.61 |
63 | 127 | modifiedStd: 0.01
|
64 | 128 | ==========================
|
65 | 129 | hypothesisTesting
|
66 |
| -Ttest_indResult(statistic=-7.422601959112843, pvalue=6.640167006148289e-09) |
67 |
| -KruskalResult(statistic=22.41274160255207, pvalue=2.199102600927154e-06) |
| 130 | +Ttest_indResult(statistic=-11.262083804733697, pvalue=1.1346763672447975e-13) |
| 131 | +KruskalResult(statistic=28.839030684057438, pvalue=7.865007317601521e-08) |
| 132 | +========================== |
68 | 133 | ```
|
69 | 134 |
|
70 |
| -```log |
71 |
| -trainTestWithSameRankingPage |
72 |
| -$ python src/eval.py |
73 |
| -0.9401113752916035 |
| 135 | +Adding L1 and L2 regularization and normalizing the testing data mitigates the distribution shift. |
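
For readers unfamiliar with the two-sample t statistic reported in the log above, here is a stdlib-only sketch of the pooled-variance (Student's) form, which is what `scipy.stats.ttest_ind` computes by default. The function name `pooled_t_statistic` is hypothetical, and the accuracy records below are made up for illustration; they are not results from this repository.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic with pooled variance."""
    n_a, n_b = len(a), len(b)
    # Pooled sample variance, weighting each group by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))


# Made-up accuracy records, for illustration only.
original = [0.55, 0.57, 0.54, 0.58, 0.56]
modified = [0.60, 0.62, 0.61, 0.60, 0.62]

t = pooled_t_statistic(original, modified)
print(f"t = {t:.2f}")
```

The statistic is negative when the first group has the lower mean, which is consistent with the sign in the log above, where the original mean is below the modified mean.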
74 | 136 |
|
75 |
| -trainTestWithOtherRankingPage (original) |
76 |
| -$ python src/eval.py |
77 |
| -0.9051408339908514 |
| 137 | +## License |
78 | 138 |
|
79 |
| -trainTestWithOtherRankingPage (modified) |
80 |
| -$ python src/eval.py |
81 |
| -0.9353777722502167 |
82 |
| -``` |
| 139 | +MIT License |