Commit 523d323

committed
write README.md
1 parent 1fd9a5b commit 523d323

15 files changed: +95 -38 lines changed

README.md

Lines changed: 95 additions & 38 deletions
@@ -1,25 +1,85 @@
11
# learning DLsite trend function by rankNet
22

3-
```log
4-
$ python src/train.py
5-
trainTestWithSameRankingPage
6-
epoch0 trainCurrentRate: 0.65
7-
epoch1 trainCurrentRate: 0.69
8-
epoch2 trainCurrentRate: 0.70
9-
testCurrentRate: 0.71
10-
=============================
11-
trainTestWithOtherRankingPage (original)
12-
epoch0 trainCurrentRate: 0.64
13-
epoch1 trainCurrentRate: 0.69
14-
epoch2 trainCurrentRate: 0.71
15-
testCurrentRate: 0.59
16-
=============================
17-
trainTestWithOtherRankingPage (modified)
18-
epoch0 trainCurrentRate: 0.56
19-
epoch1 trainCurrentRate: 0.61
20-
epoch2 trainCurrentRate: 0.63
21-
testCurrentRate: 0.63
3+
<a href="https://github.com/avengerandy/rankNet/actions"><img src="https://github.com/avengerandy/rankNet/actions/workflows/tests.yml/badge.svg" alt="tests"></a>
4+
5+
![05_rankNetArch](https://raw.githubusercontent.com/avengerandy/rankNet/master/img/05_rankNetArch.png)
6+
7+
A study of learning the DLsite trend function with rankNet, and of the distribution shift that appears along the way. For more details, please see my blog post.
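As a reminder of what rankNet optimizes: it gives every item a score and models the probability that item i ranks above item j as a sigmoid of the score difference, then minimizes the cross-entropy on pairs whose order is known. The snippet below is a minimal sketch of that pairwise objective in plain Python. The function names are hypothetical and are not taken from this repository's code.

```python
import math


def ranknet_pair_probability(score_i, score_j):
    """Probability that item i ranks above item j: sigmoid(s_i - s_j)."""
    diff = score_i - score_j
    # Use the numerically stable branch for each sign of diff.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    exp_diff = math.exp(diff)
    return exp_diff / (1.0 + exp_diff)


def ranknet_pair_loss(score_i, score_j):
    """Cross-entropy loss for a pair where i is known to rank above j.

    Equals -log(sigmoid(s_i - s_j)), written in a stable form.
    """
    diff = score_i - score_j
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


if __name__ == "__main__":
    # Illustrative scores only; a trained network would produce these.
    higher, lower = 2.0, 0.5
    print(ranknet_pair_probability(higher, lower))
    print(ranknet_pair_loss(higher, lower))
```

The loss shrinks as the higher-ranked item's score pulls ahead of the lower-ranked one, which is what pushes the learned scores toward the observed ranking.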
8+
9+
## Table of contents
10+
11+
- [Install](#install)
12+
- [Testing](#testing)
13+
- [Dataset](#dataset)
14+
- [Result](#result)
15+
- [Distribution Shift](#distribution-shift)
16+
- [License](#license)
17+
18+
## Install
19+
20+
```bash
21+
pip install -r requirements.txt
22+
pip install -r pytorchRequirements.txt --index-url https://download.pytorch.org/whl/cu121
23+
```
24+
25+
`pytorchRequirements.txt` installs only PyTorch. Change `--index-url` to match your hardware and software (CPU, GPU, CUDA version).
26+
27+
## Testing
28+
2229
```
30+
$ python -m unittest
31+
.......
32+
----------------------------------------------------------------------
33+
Ran 7 tests in 0.027s
34+
35+
OK
36+
```
37+
38+
This runs the dataset preprocessing unit tests (some tests depend on the `Asia/Taipei` timezone).
39+
40+
## Dataset
41+
42+
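```python
import json

# Assumed to be PyTorch's DataLoader.
from torch.utils.data import DataLoader

# orderedTrainTestSplit, getNormalizedDataset and BATCH_SIZE come from
# this repository's code (not shown here).

# Load the item features and the ordered ranking list.
with open('./dataset/asmrAllItemDict.json', 'r') as infile:
    itemDict = json.load(infile)
with open('./dataset/asmrAllRankItem.json', 'r') as infile:
    rankItem = json.load(infile)

# Hold out part of the ranking for testing, then build the normalized pair dataset.
rankItem, testRankItem = orderedTrainTestSplit(rankItem, 0.1)
postivePairsDataset, minMaxScaler = getNormalizedDataset(rankItem, itemDict)
dataloader = DataLoader(postivePairsDataset, batch_size = BATCH_SIZE, shuffle = True)
```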
52+
53+
* The `RankItem.json` dataset stores the item ranking (in order).
54+
* The `ItemDict.json` dataset stores the item features.
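To make the roles of the ordered ranking and the pair dataset more concrete, here is a small sketch of one way an order-preserving split and a set of positive pairs could be built. Everything in it is an assumption for illustration: the function names are hypothetical, the test items are taken from the tail of the ranking, and every (higher, lower) combination is treated as a positive pair. The repository's own `orderedTrainTestSplit` and `getNormalizedDataset` may work differently.

```python
from itertools import combinations


def ordered_train_test_split_sketch(ranked_items, test_ratio):
    """Split a ranked list without shuffling (hypothetical helper).

    Assumption: the last `test_ratio` fraction of the list becomes the test part.
    """
    if not 0.0 <= test_ratio <= 1.0:
        raise ValueError("test_ratio must be between 0 and 1")
    test_size = int(len(ranked_items) * test_ratio)
    split_at = len(ranked_items) - test_size
    return ranked_items[:split_at], ranked_items[split_at:]


def positive_pairs_sketch(ranked_items):
    """Return (higher, lower) pairs implied by the ranking order (hypothetical helper)."""
    # combinations() keeps input order, so the first element of each pair ranks higher.
    return list(combinations(ranked_items, 2))


if __name__ == "__main__":
    # Made-up item ids, ordered from rank 1 downwards.
    ranking = ["item%02d" % i for i in range(10)]
    train_part, test_part = ordered_train_test_split_sketch(ranking, 0.1)
    print(len(train_part), len(test_part))
    print(positive_pairs_sketch(train_part)[:3])
```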
55+
56+
This repository does not include the real dataset because I do not own the copyright, but you can see the dataset structure in `dataset/toyItemDict.json` and `dataset/toyRankItem.json`.
57+
58+
I collected the dataset directly from the DLsite website.
59+
60+
![10_datasetRank](https://raw.githubusercontent.com/avengerandy/rankNet/master/img/10_datasetRank.png)
61+
62+
I wrote `dataset/getRankItem.js` to help collect `RankItem.json`.
63+
64+
![11_datasetFeature](https://raw.githubusercontent.com/avengerandy/rankNet/master/img/11_datasetFeature.png)
65+
66+
`ItemDict.json` comes from the DLsite API.
67+
68+
## Result
69+
70+
### Train and test on the same ranking page
71+
72+
![12_orderedTrainTestSplit](https://raw.githubusercontent.com/avengerandy/rankNet/master/img/12_orderedTrainTestSplit.png)
73+
74+
![13_samePageResult](https://raw.githubusercontent.com/avengerandy/rankNet/master/img/13_samePageResult.png)
75+
76+
### Train and test on different ranking pages
77+
78+
![14_otherPageTrainTest](https://raw.githubusercontent.com/avengerandy/rankNet/master/img/14_otherPageTrainTest.png)
79+
80+
![15_otherPageResult](https://raw.githubusercontent.com/avengerandy/rankNet/master/img/15_otherPageResult.png)
81+
82+
## Distribution Shift
2383

2484
```log
2585
$ python src/hypothesisTesting.py
@@ -52,31 +112,28 @@ KruskalResult(statistic=0.9354796779285227, pvalue=0.3334430705576221)
52112
KruskalResult(statistic=54.25046497908814, pvalue=1.7649537781872029e-13)
53113
```
54114

55-
```log
56-
$ python src/experiment.py
115+
The hypothesis testing results show that training and testing on different ranking pages leads to distribution shift.
116+
117+
![23_otherPageResultL1L2Nor](https://raw.githubusercontent.com/avengerandy/rankNet/master/img/23_otherPageResultL1L2Nor.png)
118+
119+
```
120+
==========================
57121
originalCurrentRateRecords
58-
originalMean: 0.59
59-
originalStd: 0.01
122+
originalMean: 0.56
123+
originalStd: 0.02
60124
==========================
61-
modifiedCurrentRateRecords
62-
modifiedMean: 0.62
125+
modifiedCurrentRateRecords (L1、L2 regularization and normalization testing data)
126+
modifiedMean: 0.61
63127
modifiedStd: 0.01
64128
==========================
65129
hypothesisTesting
66-
Ttest_indResult(statistic=-7.422601959112843, pvalue=6.640167006148289e-09)
67-
KruskalResult(statistic=22.41274160255207, pvalue=2.199102600927154e-06)
130+
Ttest_indResult(statistic=-11.262083804733697, pvalue=1.1346763672447975e-13)
131+
KruskalResult(statistic=28.839030684057438, pvalue=7.865007317601521e-08)
132+
==========================
68133
```
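The `Ttest_indResult` and `KruskalResult` lines in this log have the form printed by `scipy.stats.ttest_ind` and `scipy.stats.kruskal`. To show what the t statistic measures, the sketch below computes the pooled-variance two-sample t statistic by hand with the standard library, assuming the default equal-variance form of the test. The function name and the two short record lists are made up for illustration. They are not the records behind the numbers above.

```python
import math
import statistics


def pooled_t_statistic(sample_a, sample_b):
    """Two-sample t statistic with pooled variance (equal-variance assumption)."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance is the sample variance (divides by n - 1).
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / standard_error


if __name__ == "__main__":
    # Illustrative rates only, not measured results.
    original_records = [0.55, 0.57, 0.56]
    modified_records = [0.60, 0.62, 0.61]
    print(pooled_t_statistic(original_records, modified_records))
```

A large negative statistic means the first group's mean sits well below the second group's mean relative to the spread within each group, which is the direction reported in the log (original below modified).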
69134

70-
```log
71-
trainTestWithSameRankingPage
72-
$ python src/eval.py
73-
0.9401113752916035
135+
Adding L1 and L2 regularization and normalizing the testing data reduces the effect of the distribution shift.
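One reading of "normalizing the testing data" is that the test features are min-max scaled with statistics taken from the test page itself, instead of reusing the training scaler. That is an assumption here, not something this README confirms. The sketch below uses hypothetical helper names and made-up feature rows to contrast the two choices, showing how reused training statistics can push shifted test features outside the `[0, 1]` range.

```python
def fit_min_max(rows):
    """Return per-column minimums and maximums for a list of feature rows."""
    columns = list(zip(*rows))
    return [min(column) for column in columns], [max(column) for column in columns]


def apply_min_max(rows, mins, maxs):
    """Scale each column with the given statistics; constant columns map to 0.0."""
    scaled = []
    for row in rows:
        scaled.append([
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, mins, maxs)
        ])
    return scaled


if __name__ == "__main__":
    # Made-up features: the test page has systematically larger values.
    train_rows = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
    test_rows = [[20.0, 50.0], [30.0, 70.0]]

    train_mins, train_maxs = fit_min_max(train_rows)
    # Reusing training statistics leaves the shifted test data outside [0, 1].
    print(apply_min_max(test_rows, train_mins, train_maxs))

    test_mins, test_maxs = fit_min_max(test_rows)
    # Refitting on the test page brings it back into [0, 1].
    print(apply_min_max(test_rows, test_mins, test_maxs))
```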
74136

75-
trainTestWithOtherRankingPage (original)
76-
$ python src/eval.py
77-
0.9051408339908514
137+
## License
78138

79-
trainTestWithOtherRankingPage (modified)
80-
$ python src/eval.py
81-
0.9353777722502167
82-
```
139+
MIT License

img/0.90.png: -2.8 MB, binary file not shown
img/0.93.png: -2.86 MB, binary file not shown
img/0.94.png: -1.15 MB, binary file not shown
img/05_rankNetArch.png: 567 KB
img/10_datasetRank.png: 753 KB
img/11_datasetFeature.png: 1.04 MB
img/12_orderedTrainTestSplit.png: 2.18 MB
img/13_samePageResult.png: 1.45 MB
img/14_otherPageTrainTest.png: 2.2 MB
img/15_otherPageResult.png: 2.38 MB
img/23_otherPageResultL1L2Nor.png: 1.99 MB
img/itemDict.png: -402 KB, binary file not shown
img/rankItem.png: -732 KB, binary file not shown
img/trainTestList.png: -2.66 MB, binary file not shown
