forked from castorini/anserini
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Fixed "neural hype" tuning experiments after Lucene 8.0 upgrade (castorini#736)

+ retuned retrieval models for Lucene 8.0

Refactored tuning script:
+ made command-line parameters more consistent
+ broke fold settings into external config files for greater generality
+ removed unintuitive distinction between "model" and "basemodel": there's just "model" now.
Showing 34 changed files with 298 additions and 94,923 deletions.
@@ -1,74 +1,119 @@
# Anserini: SIGIR Forum 2018 Experiments
# Anserini: "Neural Hype" Baseline Experiments

This page documents code for replicating results from the following article:
This page provides documentation for replicating results from two "neural hype" papers, which questioned whether neural ranking models actually represent improvements in _ad hoc_ retrieval effectiveness over well-tuned "competitive baselines" in limited data scenarios:

+ Jimmy Lin. [The Neural Hype and Comparisons Against Weak Baselines.](http://sigir.org/wp-content/uploads/2019/01/p040.pdf) SIGIR Forum, 52(2):40-51, 2018.
+ Wei Yang, Kuang Lu, Peilin Yang, and Jimmy Lin. [Critically Examining the "Neural Hype": Weak Baselines and the Additivity of Effectiveness Gains from Neural Ranking Models.](https://cs.uwaterloo.ca/~jimmylin/publications/Yang_etal_SIGIR2019.pdf) _SIGIR 2019_.

Note that the commit [2c8cd7a](https://github.com/castorini/Anserini/commit/2c8cd7a550faca0fc450e4159a4a874d4795ac25) referenced in the article is out of date with respect to the latest experimental results.
See "History" section below.
The "competitive baseline" referenced in the two above papers is BM25+RM3, with proper parameter tuning, on the test collection from the TREC 2004 Robust Track (Robust04).
Scripts referenced on this page encode automated regressions that allow users to recreate and verify the results reported below.

**Requirements**: Python>=2.6 or Python>=3.5 `pip install -r src/main/python/requirements.txt`
The SIGIR Forum article references commit [`2c8cd7a`](https://github.com/castorini/Anserini/commit/2c8cd7a550faca0fc450e4159a4a874d4795ac25) (11/16/2018), the results of which changed slightly with an upgrade to Lucene 7.6 at commit [`e71df7a`](https://github.com/castorini/Anserini/commit/e71df7aee42c7776a63b9845600a4075632fa11c) (12/18/2018).
The SIGIR 2019 paper contains experiments performed post upgrade.

Folds:
The Anserini upgrade to Lucene 8.0 at commit [`75e36f9`](https://github.com/castorini/anserini/commit/75e36f97f7037d1ceb20fa9c91582eac5e974131) (6/12/2019) broke the regression tests, which was later fixed at commit [`xxxxxxx`](https://github.com/castorini/anserini/commit/xxxxxxx) (x/x/xxx).
This commit represents the latest state of the code and the results that can be currently replicated.
See summary in "History" section below.

## Expected Results

Retrieval models are tuned with respect to following fold definitions:

+ [Folds for 2-fold cross-validation used in "paper 1"](../src/main/resources/fine_tuning/robust04-paper1-folds.json)
+ [Folds for 5-fold cross-validation used in "paper 2"](../src/main/resources/fine_tuning/robust04-paper2-folds.json)

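The fold files linked above are not reproduced on this page, so their exact schema is an assumption here. A minimal Python sketch, assuming each file is a JSON array of folds where every fold is a list of topic IDs, that checks the folds are disjoint and together cover the expected topic set (`load_folds` and `check_partition` are hypothetical helpers, not part of Anserini):

```python
import json
from typing import List, Set


def load_folds(path: str) -> List[List[str]]:
    """Read fold definitions, assuming a JSON array of topic-ID lists."""
    with open(path) as f:
        folds = json.load(f)
    # Normalize IDs to strings so 301 and "301" are treated as the same topic.
    return [[str(topic) for topic in fold] for fold in folds]


def check_partition(folds: List[List[str]], topics: Set[str]) -> None:
    """Raise ValueError unless the folds are disjoint and together cover `topics`."""
    seen: Set[str] = set()
    for i, fold in enumerate(folds):
        if len(set(fold)) != len(fold):
            raise ValueError(f"fold {i} lists a topic more than once")
        overlap = seen.intersection(fold)
        if overlap:
            raise ValueError(f"fold {i} repeats topics from earlier folds: {sorted(overlap)}")
        seen.update(fold)
    if seen != topics:
        raise ValueError(f"missing={sorted(topics - seen)} extra={sorted(seen - topics)}")
```

If the assumed schema holds, passing the loaded paper 2 folds together with the 250 Robust04 topic IDs (301-450 and 601-700, as strings) would raise on any topic that is missing, duplicated, or placed in two folds.
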
Here are expected results for various retrieval models:

AP | Paper 1 | Paper 2 |
:------------------|---------|---------|
BM25 (default) | 0.2531 | 0.2531 |
BM25 (tuned) | 0.2539 | 0.2531 |
QL (default) | 0.2467 | 0.2467 |
QL (tuned) | 0.2520 | 0.2499 |
BM25+RM3 (default) | 0.2903 | 0.2903 |
BM25+RM3 (tuned) | 0.3043 | 0.3021 |
BM25+Ax (default) | 0.2896 | 0.2896 |
BM25+Ax (tuned) | 0.2940 | 0.2950 |

## Parameter Tuning

First, change the index path at `src/main/resources/fine_tuning/collections.yaml`.
The script will go through the `index_roots` and concatenate with the collection's `index_path` and take the first match as the index path.
Before starting, modify the index path at `src/main/resources/fine_tuning/collections.yaml`.
The tuning script will go through the `index_roots`, concatenate with the collection's `index_path`, and take the first match as the location of the index.

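The index lookup rule described above can be sketched as follows. This is a minimal Python sketch, not the tuning script's own code: `resolve_index` is a hypothetical helper, it assumes `index_roots` is an ordered list of directory prefixes and `index_path` a path relative to them, and it skips parsing `collections.yaml` itself.

```python
import os
from typing import Iterable, Optional


def resolve_index(index_roots: Iterable[str], index_path: str) -> Optional[str]:
    """Return the first root/index_path combination that exists on disk.

    Hypothetical helper mirroring the documented behavior: roots are tried
    in order and the first existing match wins.
    """
    for root in index_roots:
        candidate = os.path.join(root, index_path)
        if os.path.exists(candidate):
            return candidate
    return None  # no root contains the index; the caller decides how to report it
```

For example, with two hypothetical roots `/data` and `/scratch/indexes` and `index_path` set to `lucene-index.robust04.pos+docvectors+rawdocs`, the copy under `/data` wins if both exist. Because the first match wins, the order of `index_roots` matters when the same index is present in several places.
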
BM25 Robust04 (runs + eval + print results):
Tuning BM25:

```
python src/main/python/fine_tuning/run_batch.py --collection robust04 --model bm25 --n 44 --run --use_drr_fold
python src/main/python/fine_tuning/run_batch.py --collection robust04 --model bm25 --threads 18 --run
python src/main/python/fine_tuning/run_batch.py --collection robust04 --model bm25 --threads 18 --run --fold_settings src/main/resources/fine_tuning/robust04-paper1-folds.json --verbose
python src/main/python/fine_tuning/run_batch.py --collection robust04 --model bm25 --threads 18 --run --fold_settings src/main/resources/fine_tuning/robust04-paper2-folds.json --verbose
```

QL Robust04 (runs + eval + print results):
The first command runs the parameter sweeps and prints general statistics.
The second and third commands use a specific fold setting to perform cross-validation and print out model parameters.

Tuning QL (commands similarly organized):

```
python src/main/python/fine_tuning/run_batch.py --collection robust04 --basemodel ql --model ql --n 44 --run --use_drr_fold
python src/main/python/fine_tuning/run_batch.py --collection robust04 --model ql --threads 18 --run
python src/main/python/fine_tuning/run_batch.py --collection robust04 --model ql --threads 18 --run --fold_settings src/main/resources/fine_tuning/robust04-paper1-folds.json --verbose
python src/main/python/fine_tuning/run_batch.py --collection robust04 --model ql --threads 18 --run --fold_settings src/main/resources/fine_tuning/robust04-paper2-folds.json --verbose
```

BM25+RM3 Robust04 (runs + eval + print results):
Tuning BM25+RM3 (commands similarly organized):

```
python src/main/python/fine_tuning/run_batch.py --collection robust04 --model bm25+rm3 --n 44 --run --use_drr_fold
python src/main/python/fine_tuning/run_batch.py --collection robust04 --model bm25+rm3 --threads 18 --run
python src/main/python/fine_tuning/run_batch.py --collection robust04 --model bm25+rm3 --threads 18 --run --fold_settings src/main/resources/fine_tuning/robust04-paper1-folds.json --verbose
python src/main/python/fine_tuning/run_batch.py --collection robust04 --model bm25+rm3 --threads 18 --run --fold_settings src/main/resources/fine_tuning/robust04-paper2-folds.json --verbose
```

BM25+AxiomaticReranking Robust04 (runs + eval + print results):
Tuning BM25+Ax (commands similarly organized):

```
python src/main/python/fine_tuning/run_batch.py --collection robust04 --model bm25+axiom --n 44 --run --use_drr_fold
python src/main/python/fine_tuning/run_batch.py --collection robust04 --model bm25+axiom --threads 18 --run
python src/main/python/fine_tuning/run_batch.py --collection robust04 --model bm25+axiom --threads 18 --run --fold_settings src/main/resources/fine_tuning/robust04-paper1-folds.json --verbose
python src/main/python/fine_tuning/run_batch.py --collection robust04 --model bm25+axiom --threads 18 --run --fold_settings src/main/resources/fine_tuning/robust04-paper2-folds.json --verbose
```

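The "parameter sweeps" mentioned above amount to evaluating every combination in a parameter grid. A minimal sketch of building such a grid, using hypothetical ranges for the BM25 parameters `k1` and `b` (the ranges `run_batch.py` actually sweeps are not listed on this page); `frange` and `sweep_grid` are illustrative helpers:

```python
from itertools import product
from typing import Dict, List, Sequence


def frange(start: float, stop: float, step: float) -> List[float]:
    """Inclusive float range; values are rounded to 4 places to avoid float drift."""
    n = int(round((stop - start) / step))
    return [round(start + i * step, 4) for i in range(n + 1)]


def sweep_grid(space: Dict[str, Sequence[float]]) -> List[Dict[str, float]]:
    """Expand a parameter space into every combination of values (a full grid)."""
    names = sorted(space)  # fixed key order keeps the output deterministic
    return [dict(zip(names, values)) for values in product(*(space[n] for n in names))]


# Hypothetical BM25 ranges; not necessarily what run_batch.py sweeps.
bm25_space = {"k1": frange(0.1, 4.0, 0.1), "b": frange(0.1, 1.0, 0.1)}
grid = sweep_grid(bm25_space)
print(len(grid), grid[0])
```

Each grid entry corresponds to one retrieval run plus one evaluation, so the grid grows multiplicatively with every parameter added; BM25+RM3, for instance, adds RM3's feedback parameters on top of `k1` and `b`.
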
## Tuned Run

Tuned parameter values:
## Tuned Runs

Tuned parameter values for BM25+RM3:

+ [For the 2-fold cross-validation used in "paper 1", in terms of MAP](../src/main/resources/fine_tuning/robust04-paper1-folds-map-params.json)
+ [For tor 5-fold cross-validation used in "paper 2", in terms of MAP](../src/main/resources/fine_tuning/robust04-paper2-folds-map-params.json)
+ [For the 2-fold cross-validation used in "paper 1", in terms of MAP](../src/main/resources/fine_tuning/params/params.map.robust04-paper1-folds.bm25+rm3.json)
+ [For the 5-fold cross-validation used in "paper 2", in terms of MAP](../src/main/resources/fine_tuning/params/params.map.robust04-paper2-folds.bm25+rm3.json)

To be clear, these are the tuned parameters on _that_ fold, trained on the remaining folds.

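The per-fold parameters above follow the usual k-fold cross-validation recipe. A minimal Python sketch of that recipe, assuming per-topic AP has already been computed for every parameter setting (for example from the sweep runs); `cross_validate` is a hypothetical helper and may not aggregate scores exactly as the tuning script does:

```python
from statistics import mean
from typing import Dict, List, Tuple

# per_topic_ap[setting][topic] = AP of one parameter setting on one topic
PerTopicAP = Dict[str, Dict[str, float]]


def cross_validate(per_topic_ap: PerTopicAP, folds: List[List[str]]) -> Tuple[List[str], float]:
    """Generic k-fold parameter selection.

    For each held-out fold, pick the setting with the highest MAP on the other
    folds, then score that setting on the held-out topics only. Returns the
    setting chosen for each fold and MAP over all held-out topics.
    """
    chosen: List[str] = []
    held_out: List[float] = []
    for i, test_fold in enumerate(folds):
        train = [t for j, fold in enumerate(folds) if j != i for t in fold]
        best = max(
            sorted(per_topic_ap),  # sorted so ties resolve the same way every time
            key=lambda s: mean(per_topic_ap[s][t] for t in train),
        )
        chosen.append(best)
        held_out.extend(per_topic_ap[best][t] for t in test_fold)
    return chosen, mean(held_out)
```

The point of the design is that a fold's own topics never influence which setting is chosen for it, which is what "trained on the remaining folds" means.
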
The follow script will reconstruct the tuned runs for BM25 + RM3:
The following script will reconstruct the tuned runs for BM25+RM3:

```
python src/main/python/fine_tuning/reconstruct_robus04_tuned_run.py \
--index lucene-index.robust04.pos+docvectors+rawdocs \
--folds src/main/resources/fine_tuning/robust04-paper2-folds.json \
--params src/main/resources/fine_tuning/robust04-paper2-folds-map-params.json
--folds src/main/resources/fine_tuning/robust04-paper1-folds.json \
--params src/main/resources/fine_tuning/params/params.map.robust04-paper1-folds.bm25+rm3.json \
--output run.robust04.bm25+rm3.paper1.txt
```

Change `paper2` to `paper1` to reconstruct using the folds in paper 1.
Change `paper1` to `paper2` to reconstruct using the folds in paper 2.

To reconstruct runs from other retrieval models, use the parameter definitions in [`src/main/resources/fine_tuning/params/`](../src/main/resources/fine_tuning/params/), plugging them into the above command as appropriate.

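Conceptually, a reconstructed run stitches together, for each fold, the results obtained with that fold's tuned parameters on that fold's topics. A minimal sketch of the stitching step only, assuming one TREC-format run per fold has already been produced; `merge_fold_runs` is hypothetical and neither invokes Anserini nor reproduces `reconstruct_robus04_tuned_run.py`:

```python
from typing import Iterable, List


def merge_fold_runs(fold_runs: List[Iterable[str]], folds: List[List[str]]) -> List[str]:
    """Stitch per-fold TREC runs into one run.

    fold_runs[i] holds the lines of a run produced with fold i's tuned
    parameters; only lines whose query ID belongs to fold i are kept.
    TREC run lines look like: qid Q0 docid rank score tag
    """
    if len(fold_runs) != len(folds):
        raise ValueError("need exactly one run per fold")
    merged: List[str] = []
    for run_lines, fold in zip(fold_runs, folds):
        topics = set(fold)
        for line in run_lines:
            fields = line.split()
            if fields and fields[0] in topics:
                merged.append(line.rstrip("\n"))
    return merged
```

Writing the merged lines, one per line, to a file such as `run.robust04.bm25+rm3.paper1.txt` yields a single run covering every topic that `trec_eval` can score in one pass.
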
Note that applying `trec_eval` to these reconstructed runs might yield AP that is a tiny bit different from the values reported above (difference of 0.0001 at the most).
This difference arises from rounding when averaging across the folds.

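One way such a small discrepancy can arise is sketched below: averaging per-fold MAP values that were each rounded to four decimals, versus taking MAP directly over all topics as `trec_eval` does for the merged run. The AP values are made up, and the sketch assumes equal-size folds, where the unrounded average of fold MAPs equals the pooled MAP:

```python
from statistics import mean
from typing import List


def map_from_rounded_folds(fold_aps: List[List[float]], digits: int = 4) -> float:
    """Average of per-fold MAP values, each rounded first (as in a results table)."""
    return mean(round(mean(fold), digits) for fold in fold_aps)


def map_over_all_topics(fold_aps: List[List[float]]) -> float:
    """MAP over all topics pooled together, as trec_eval reports for a merged run."""
    return mean(ap for fold in fold_aps for ap in fold)


# Made-up per-topic AP values for two folds of two topics each.
folds = [[0.31234, 0.29872], [0.30456, 0.28720]]
pooled = map_over_all_topics(folds)
rounded = map_from_rounded_folds(folds)
print(f"pooled={pooled:.6f} rounded={rounded:.6f} diff={abs(pooled - rounded):.6f}")
```

Each rounded fold value is off by at most 0.00005, so their average is as well; once both figures are themselves printed to four decimals, the displayed values can differ by 0.0001.
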

## History

+ commit [407f308](https://github.com/castorini/Anserini/commit/407f308cc543286e39701caf0acd1afab39dde2c) (2019/1/2) - Added results for axiomatic semantic term matching.
+ commit [e71df7a](https://github.com/castorini/Anserini/commit/e71df7aee42c7776a63b9845600a4075632fa11c) (2018/12/18) - Upgrade to Lucene 7.6.
+ commit [18c3211](https://github.com/castorini/Anserini/commit/18c3211117f35f72cbc1019c125ff885f51056ea) (2018/12/9) - minor fixes.
+ commit [2c8cd7a](https://github.com/castorini/Anserini/commit/2c8cd7a550faca0fc450e4159a4a874d4795ac25) (2018/11/16) - commit id referenced in SIGIR Forum article.
The following documents commits that have altered effectiveness figures:

+ commit [`xxxxxxx`](https://github.com/castorini/anserini/commit/xxxxxxx) (x/xx/xxxx) - Regression experiments here fixed.
+ commit [`75e36f9`](https://github.com/castorini/anserini/commit/75e36f97f7037d1ceb20fa9c91582eac5e974131) (6/12/2019) - Upgrade to Lucene 8.0 breaks regression experiments here.
+ commit [`407f308`](https://github.com/castorini/Anserini/commit/407f308cc543286e39701caf0acd1afab39dde2c) (1/2/2019) - Added results for axiomatic semantic term matching.
+ commit [`e71df7a`](https://github.com/castorini/Anserini/commit/e71df7aee42c7776a63b9845600a4075632fa11c) (12/18/2018) - Upgrade to Lucene 7.6.
+ commit [`2c8cd7a`](https://github.com/castorini/Anserini/commit/2c8cd7a550faca0fc450e4159a4a874d4795ac25) (11/16/2018) - commit id referenced in SIGIR Forum article.