learning DLsite trend function by rankNet

A study of learning the DLsite trend function with RankNet, and of the distribution shift between its training and testing data. For more detailed instructions, please see my blog post.

Table of contents

Install
Testing
Dataset
Result
Distribution Shift
License

Install

pip install -r requirements.txt
pip install -r pytorchRequirements.txt --index-url https://download.pytorch.org/whl/cu121

pytorchRequirements.txt only installs PyTorch. Change --index-url to suit your hardware and software (CPU, GPU, CUDA version).
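After installing, it can help to confirm which PyTorch build you actually got. The following is a minimal sketch, not part of this repository: describe_torch_install is a hypothetical helper name, and it only reports what the installed package says about itself.

```python
import importlib.util


def describe_torch_install():
    """Return a one-line summary of the installed PyTorch build (hypothetical helper)."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch

    # torch.version.cuda is None for CPU-only builds.
    return (
        f"torch {torch.__version__}, "
        f"built for CUDA: {torch.version.cuda}, "
        f"CUDA available: {torch.cuda.is_available()}"
    )


print(describe_torch_install())
```

If the summary reports a CPU-only build on a machine with a GPU, the --index-url used during installation is the first thing to check.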

Testing

$ python -m unittest
.......
----------------------------------------------------------------------
Ran 7 tests in 0.027s

OK

This runs the dataset preprocessing unit tests (some tests depend on the Asia/Taipei timezone).
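One common reason for tests to depend on the timezone is that the same timestamp falls on different calendar dates in different zones. Whether that is the exact cause here is an assumption. The sketch below uses a fixed UTC+8 offset as a stand-in for Asia/Taipei, and local_date is a hypothetical helper, not a function from this repository.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-in for Asia/Taipei (UTC+8), used only for illustration.
TAIPEI = timezone(timedelta(hours=8))


def local_date(timestamp, tz):
    """Return the calendar date of a Unix timestamp as seen in timezone tz."""
    return datetime.fromtimestamp(timestamp, tz).date().isoformat()


ts = 1700000000  # 2023-11-14 22:13:20 UTC
print(local_date(ts, timezone.utc))  # 2023-11-14
print(local_date(ts, TAIPEI))        # 2023-11-15
```

If the tests fail on a machine in another timezone, running them with the TZ environment variable set to Asia/Taipei is worth trying.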

Dataset

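import json
from torch.utils.data import DataLoader

# orderedTrainTestSplit and getNormalizedDataset come from this repository's source code; BATCH_SIZE is defined by you.
with open('./dataset/asmrAllItemDict.json', 'r') as infile:
    itemDict = json.load(infile)
with open('./dataset/asmrAllRankItem.json', 'r') as infile:
    rankItem = json.load(infile)

# Split the ranking into training and testing parts while keeping its order (0.1 is the split ratio)
rankItem, testRankItem = orderedTrainTestSplit(rankItem, 0.1)
# Build the normalized dataset of positive pairs and keep the fitted scaler
postivePairsDataset, minMaxScaler = getNormalizedDataset(rankItem, itemDict)
dataloader = DataLoader(postivePairsDataset, batch_size = BATCH_SIZE, shuffle = True)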

RankItem.json stores the ranking of the items (in order).
ItemDict.json stores the features of the items.
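To show what an ordered split and a positive-pairs dataset can look like, here is a minimal sketch in plain Python. It is not the code in this repository: ordered_split, positive_pairs and ranknet_pair_loss are hypothetical names, taking the tail of the ranking as the test part is an assumption, and the loss is the standard RankNet pairwise cross-entropy for a pair whose higher-ranked item should win.

```python
import math


def ordered_split(ranked_ids, test_ratio):
    """Split a ranked list without shuffling; the tail becomes the test part (assumption)."""
    n_test = int(len(ranked_ids) * test_ratio)
    cut = len(ranked_ids) - n_test
    return ranked_ids[:cut], ranked_ids[cut:]


def positive_pairs(ranked_ids):
    """Return every (higher, lower) pair implied by the ranking order."""
    return [
        (ranked_ids[i], ranked_ids[j])
        for i in range(len(ranked_ids))
        for j in range(i + 1, len(ranked_ids))
    ]


def ranknet_pair_loss(score_high, score_low):
    """RankNet loss for one positive pair: log(1 + exp(-(s_high - s_low)))."""
    diff = score_high - score_low
    # Numerically stable form of softplus(-diff).
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


ranking = list("ABCDEFGHIJ")  # toy item ids, best first
train, test = ordered_split(ranking, 0.1)
print(train, test)
print(positive_pairs(train[:3]))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(ranknet_pair_loss(2.0, 0.0), ranknet_pair_loss(0.0, 2.0))
```

The loss is small when the model scores the higher-ranked item above the lower-ranked one and grows when the order is reversed, which is what pushes the learned function toward the observed ranking.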

This repository does not provide the real dataset because I do not own the copyright. You can see the dataset structure in dataset/toyItemDict.json and dataset/toyRankItem.json.

I collected the dataset directly from the DLsite website.

I wrote dataset/getRankItem.js to help me build RankItem.json.

ItemDict.json comes from the DLsite API.

Result

Train and test on the same ranking page

Train and test on different ranking pages

Distribution Shift

$ python src/hypothesisTesting.py
sameRankingPage:
KruskalResult(statistic=1.0812276101266591, pvalue=0.2984230935024443)
KruskalResult(statistic=1.2293546877250272, pvalue=0.2675325726507133)
KruskalResult(statistic=4.739247376673478, pvalue=0.02948196620438234)
KruskalResult(statistic=0.2498109742924989, pvalue=0.6172082077070977)
KruskalResult(statistic=0.1342936115544768, pvalue=0.7140211629799282)
KruskalResult(statistic=3.7993428755934544, pvalue=0.051272701398055905)
KruskalResult(statistic=2.3208088635951003, pvalue=0.12765363132515045)
KruskalResult(statistic=0.20121346233637416, pvalue=0.6537431519333335)
KruskalResult(statistic=0.16173364059725473, pvalue=0.6875653860322029)
KruskalResult(statistic=0.00029763515339625575, pvalue=0.9862354939706918)
KruskalResult(statistic=2.6357363673068495, pvalue=0.10448360817354285)
KruskalResult(statistic=5.334489670983912, pvalue=0.020907460507742472)
================================================================
otherRankingPage:
KruskalResult(statistic=5.707629175795012, pvalue=0.016891336683534482)
KruskalResult(statistic=5.662375604642771, pvalue=0.017332631063601354)
KruskalResult(statistic=112.85008127691606, pvalue=2.3272265345661586e-26)
KruskalResult(statistic=0.1183531380745446, pvalue=0.7308275534196667)
KruskalResult(statistic=25.143537615271654, pvalue=5.321769076621548e-07)
KruskalResult(statistic=17.96143852177434, pvalue=2.2542565418223413e-05)
KruskalResult(statistic=10.723489544548972, pvalue=0.0010578398572342951)
KruskalResult(statistic=7.095480025151965, pvalue=0.007727859157872331)
KruskalResult(statistic=1.2862035283540991, pvalue=0.25674877258145284)
KruskalResult(statistic=5.224849009453471, pvalue=0.022266379457297883)
KruskalResult(statistic=0.9354796779285227, pvalue=0.3334430705576221)
KruskalResult(statistic=54.25046497908814, pvalue=1.7649537781872029e-13)

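The hypothesis tests show that training and testing on different ranking pages causes distribution shift. At the 0.05 significance level, only 2 of the 12 tests reject the null hypothesis when both sets come from the same ranking page, compared with 9 of 12 when the testing set comes from another ranking page.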
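To make the output above easier to interpret, here is a from-scratch sketch of the Kruskal-Wallis H test for two groups. The printed statistics and p-values are consistent with one degree of freedom, that is, two groups per test; I assume these are the training and testing values being compared. The KruskalResult lines are in the format printed by scipy.stats.kruskal; average_ranks and kruskal_two_groups are hypothetical names used only for illustration.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j share this 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_two_groups(a, b):
    """Return (H, p) of the tie-corrected Kruskal-Wallis test for two samples."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ties = sum(t**3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n**3 - n)
    if correction == 0:
        raise ValueError("all values are identical")
    ranks = average_ranks(pooled)
    sum_a = sum(ranks[: len(a)])
    sum_b = sum(ranks[len(a):])
    h = 12 / (n * (n + 1)) * (sum_a**2 / len(a) + sum_b**2 / len(b)) - 3 * (n + 1)
    h = max(h / correction, 0.0)  # guard against tiny negative rounding error
    # Chi-square survival function with 1 degree of freedom.
    p = math.erfc(math.sqrt(h / 2))
    return h, p


h, p = kruskal_two_groups([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(h, p)  # H is about 6.82; p is below 0.05, so the two samples differ
```

A small p-value means the two samples are unlikely to come from the same distribution, which is how the otherRankingPage results point to distribution shift.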

==========================
originalCurrentRateRecords
originalMean: 0.56
originalStd: 0.02
==========================
modifiedCurrentRateRecords (L1、L2 regularization and normalization testing data)
modifiedMean: 0.61
modifiedStd: 0.01
==========================
hypothesisTesting
Ttest_indResult(statistic=-11.262083804733697, pvalue=1.1346763672447975e-13)
KruskalResult(statistic=28.839030684057438, pvalue=7.865007317601521e-08)
==========================

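Adding L1 and L2 regularization and normalizing the testing data mitigates the distribution shift. In the records above the mean rises from 0.56 to 0.61, and both the t-test (p ≈ 1.1e-13) and the Kruskal-Wallis test (p ≈ 7.9e-08) indicate that the difference is significant.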
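The two ingredients named above can be sketched in plain Python. This is an illustration under assumptions, not the code in this repository: fit_min_max, apply_min_max and l1_l2_penalty are hypothetical names, and whether the testing data is scaled with the training statistics or with its own statistics is not stated here, so the sketch shows both.

```python
def fit_min_max(rows):
    """Return per-column minimums and maximums of a list of feature rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows, mins, maxs):
    """Scale each column to [0, 1] relative to the given minimums and maximums."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


def l1_l2_penalty(weights, l1_coef, l2_coef):
    """Regularization term added to the training loss: l1*sum|w| + l2*sum(w^2)."""
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return l1_coef * l1 + l2_coef * l2


train = [[0.0, 10.0], [10.0, 30.0]]
test = [[5.0, 40.0], [15.0, 20.0]]

# Option A: reuse the training statistics; shifted test values leave [0, 1].
print(apply_min_max(test, *fit_min_max(train)))  # [[0.5, 1.5], [1.5, 0.5]]
# Option B: normalize the testing data with its own statistics.
print(apply_min_max(test, *fit_min_max(test)))   # [[0.0, 1.0], [1.0, 0.0]]

print(l1_l2_penalty([1.0, -2.0], l1_coef=0.1, l2_coef=0.01))
```

Option B keeps the testing features in the range the model saw during training even when the raw values have shifted, while the penalty discourages large weights that would amplify such shifts.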

License

MIT License
