This Repo Is for the MS MARCO Document/Passage Reranking Task Using GPT-2/T5 Models
- MS MARCO Home Page
- Passage Ranking Git Repo
- Document Ranking Git Repo
- Conversational Search Git Repo
- Jimmy Lin Anserini Passage Retrieval
- Jimmy Lin Anserini Document Retrieval
- TREC 2020 Deep Learning Task
- castorini docTTTTTquery
- Google T5
- OpenAI GPT-2
- Hugging Face Framework
- Google Pegasus
- Facebook BART
- RoBERTa
```shell
git clone --recurse-submodules git@github.com:ielab/GPT_Ranker.git
```
| Model Name | Top K | Recall | MRR@100 |
|---|---|---|---|
| T5-Base | Top 100 (Tuned BM25) | | |
| | Top 200 (Tuned BM25) | | |
| | Top 500 (Tuned BM25) | | |
| | Top 1000 (Tuned BM25) | | |
| GPT-2 | Top 100 (Tuned BM25) | | |
| | Top 200 (Tuned BM25) | | |
| | Top 500 (Tuned BM25) | | |
| | Top 1000 (Tuned BM25) | | |
| BM25 Initial Retrieval (Tuned k1=3.44, b=0.87) | N/A | R@5 0.4024 R@10 0.4946 R@15 0.5640 R@20 0.6095 R@30 0.6649 R@100 0.7874 R@200 0.8373 R@500 0.8850 R@1000 0.9187 | 0.27880910 |
| MS MARCO Top 1000 (Provided Run) | N/A | | |
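The Recall and MRR columns are computed from a run file and a qrels file. The numbers in these tables come from the repo's own `scripts/passage_msmarco_eval.py` and `scripts/doc_msmarco_eval.py`; what follows is only a minimal sketch of the usual definitions, not those scripts. It assumes TREC-style qrels lines (`qid iter docid relevance`) and run lines that are either `qid docid rank` (the MS MARCO passage style) or six-column TREC format (`qid Q0 docid rank score tag`). The function names are illustrative, and the repo's scripts may define recall slightly differently.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def load_qrels(path: str) -> Dict[str, Set[str]]:
    """Read TREC-style qrels ("qid iter docid relevance"); keep docids with relevance > 0."""
    qrels: Dict[str, Set[str]] = defaultdict(set)
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) == 4 and int(parts[3]) > 0:
                qrels[parts[0]].add(parts[2])
    return dict(qrels)


def load_run(path: str) -> Dict[str, List[str]]:
    """Read a run file and return, per query, the docids ordered by rank."""
    pairs: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 6:  # TREC format: qid Q0 docid rank score tag
                qid, docid, rank = parts[0], parts[2], parts[3]
            elif len(parts) >= 3:  # MS MARCO format: qid docid rank
                qid, docid, rank = parts[0], parts[1], parts[2]
            else:
                continue  # skip blank or malformed lines
            pairs[qid].append((int(rank), docid))
    return {qid: [docid for _, docid in sorted(ranked)] for qid, ranked in pairs.items()}


def mrr_at_k(run: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int) -> float:
    """Reciprocal rank of the first relevant docid in the top k, averaged over qrels queries."""
    if not qrels:
        return 0.0
    total = 0.0
    for qid, relevant in qrels.items():
        for position, docid in enumerate(run.get(qid, [])[:k], start=1):
            if docid in relevant:
                total += 1.0 / position
                break
    return total / len(qrels)


def recall_at_k(run: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int) -> float:
    """Fraction of each query's relevant docids found in its top k, averaged over qrels queries."""
    if not qrels:
        return 0.0
    total = 0.0
    for qid, relevant in qrels.items():
        retrieved = set(run.get(qid, [])[:k])
        total += len(retrieved & relevant) / len(relevant)
    return total / len(qrels)


# Toy example: q1's relevant doc is at rank 2; q2's relevant doc was never retrieved.
run = {"q1": ["d3", "d1", "d2"], "q2": ["d5", "d4"]}
qrels = {"q1": {"d1"}, "q2": {"d9"}}
print(mrr_at_k(run, qrels, 10))    # prints 0.25
print(recall_at_k(run, qrels, 2))  # prints 0.5
```

Because reranking only reorders the candidate set, recall at any depth is capped by the first-stage depth: for example, T5-Base reranking the BM25 top 100 stays at R@1000 0.6701, the BM25 R@100 in the passage table.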
| Model Name | Top K | Recall | MRR@10 |
|---|---|---|---|
| T5-Base | Top 100 (Tuned BM25) | R@5 0.4294 R@10 0.5321 R@15 0.5778 R@20 0.6076 R@30 0.6354 R@100 0.6701 R@200 0.6701 R@500 0.6701 R@1000 0.6701 | 0.28134977 |
| | Top 200 (Tuned BM25) | R@5 0.4413 R@10 0.5496 R@15 0.6037 R@20 0.6371 R@30 0.6719 R@100 0.7333 R@200 0.7383 R@500 0.7383 R@1000 0.7383 | 0.286065970 |
| | Top 500 (Tuned BM25) | R@5 0.4502 R@10 0.5630 R@15 0.6195 R@20 0.6561 R@30 0.6987 R@100 0.7812 R@200 0.8040 R@500 0.8116 R@1000 0.8116 | 0.2904781 |
| | Top 1000 (Tuned BM25) | R@5 0.4553 R@10 0.5708 R@15 0.6318 R@20 0.6700 R@30 0.7148 R@100 0.8093 R@200 0.8390 R@500 0.8553 R@1000 0.8573 | 0.2920570 |
| | Top 1000 (Tuned BM25 + Lowercase P) | R@5 0.4546 R@10 0.5733 R@15 0.6346 R@20 0.6679 R@30 0.7133 R@100 0.8095 R@200 0.8384 R@500 0.8552 R@1000 0.8573 | 0.2945217 |
| | Top 1000 (docTTTTTquery init run) | R@5 0.4665 R@10 0.5857 R@15 0.6537 R@20 0.6968 R@30 0.7485 R@100 0.8643 R@200 0.9089 R@500 0.9404 R@1000 0.9471 | 0.29763735 |
| GPT-2 | Top 100 (Tuned BM25) | | |
| | Top 200 (Tuned BM25) | | |
| | Top 500 (Tuned BM25) | | |
| | Top 1000 (Tuned BM25) | | |
| BM25 Initial Retrieval (Tuned k1=0.82, b=0.68) | N/A | R@5 0.2944 R@10 0.3916 R@15 0.4459 R@20 0.4842 R@30 0.5307 R@100 0.6701 R@200 0.7383 R@500 0.8116 R@1000 0.8573 | 0.187412 |
| MS MARCO Top 1000 (Provided Run) | N/A | R@5 0.0093 R@10 0.0150 R@15 0.0196 R@20 0.0224 R@30 0.0270 R@100 0.1026 R@200 0.1641 R@500 0.3893 R@1000 0.8140 | 0.00456946 |
| docTTTTTquery Top 1000 (40 Samples) | N/A | R@5 0.4244 R@10 0.5411 R@15 0.6033 R@20 0.6484 R@30 0.6987 R@100 0.8190 R@200 0.8688 R@500 0.9164 R@1000 0.9471 | 0.2767497 |
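The BM25 rows list tuned `k1` and `b` values for the initial retrieval. As a rough guide to what those two knobs do, here is the textbook BM25 formula as a small Python sketch. It is not Anserini's code: Anserini scores with Lucene's BM25 implementation, which differs in details such as document-length encoding, so absolute scores will not match. The defaults below are the passage values from the table; the document run used `k1=3.44, b=0.87`.

```python
import math
from collections import Counter
from typing import List, Sequence


def bm25_score(query: Sequence[str], doc: Sequence[str], corpus: List[Sequence[str]],
               k1: float = 0.82, b: float = 0.68) -> float:
    """Textbook BM25 score of one tokenized document for a tokenized query."""
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc)
    # b controls how strongly the score is normalized by document length (b=0: not at all).
    length_norm = k1 * (1 - b + b * len(doc) / avgdl)
    score = 0.0
    for term in query:
        if tf[term] == 0:
            continue  # a term absent from the document contributes nothing
        df = sum(1 for d in corpus if term in d)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # k1 controls how quickly repeated occurrences of a term saturate.
        score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
    return score


corpus = [
    ["sql", "join", "tables"],
    ["sql", "server", "tables", "per", "database", "limit", "objects"],
    ["dream", "about", "babies"],
]
for doc in corpus:
    print(round(bm25_score(["sql", "join"], doc, corpus, k1=3.44, b=0.87), 4))
```

With `b > 0`, a short document that matches a term outscores a longer one with the same term frequency; setting `b=0` removes that length penalty.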
For query:
how many tables can sql server join
Our model ranks this document first:
id: 7485889
contents: How many tables can I have in 1 Sql Azure Database. I know in Sql Server, Tables per database Limited by number of objects in a database, Database objects include objects such as tables, views, stored procedures, user-defined functions, triggers, rules, defaults, and constraints. The sum of the number of objects in a database cannot exceed 2,147,483,647..
The actual relevant document (our model ranks it 949th; BM25 ranks it 300th):
id: 7485894
contents: SQL JOIN. A JOIN clause is used to combine rows from two or more tables, based on a related column between them. Let's look at a selection from the Orders table: Then, look at a selection from the Customers table: Notice that the CustomerID column in the Orders table refers to the CustomerID in the Customers table. The relationship between the two tables above is the CustomerID column. Then, we can create the following SQL statement (that contains an INNER JOIN), that selects records that have matching values in both tables: Example SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
For query:
what does it mean when you dream about babies
Our model ranks this document first:
id: 6680536
contents: A baby in general. Dreams that include babies are positive signs. Dreaming about interacting with a baby or simply seeing a baby in a dream can mean that pleasant surprises and fortuitous occurrences are about to occur in your life. This dream doesn't specify what, but something unexpectedly good is on your horizon.
The actual relevant document (our model ranks it 2nd; BM25 ranks it 993rd):
id: 7551052
contents: To see a baby in your dream signifies innocence, warmth and new beginnings. Babies symbolize something in your own inner nature that is pure, vulnerable, helpless and/or uncorrupted. If you dream that the baby is smiling at you, then it suggests that you are experiencing pure joy.
BM25 ranks this document first (our model ranks it 47th):
id: 3347686
contents: Health related question in topics Psychology .We found some answers as below for this question What does it mean when you dream about somebody having a baby,you can compare them. When you dream of someone having a baby, it means new beginnings. It also means hidden potential related to you is being released. [ Source: http://www.chacha.com/question/what-does-it-mean-when-you-dream-about-somebody-having-a-baby ] More Answers to What does it mean when you dream about somebody having a baby.
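In these examples a "rank" is the 1-based position of a document in a ranked list: BM25 supplies the top-K candidates and the model reorders them by its relevance score. The sketch below shows only that bookkeeping. `score_fn` is a hypothetical stand-in for the GPT-2/T5 scorer (here a toy word-overlap function), and none of these names come from the repo's `ranker.py`.

```python
from typing import Callable, Dict, List


def rerank(query: str, candidates: List[str], texts: Dict[str, str],
           score_fn: Callable[[str, str], float]) -> List[str]:
    """Reorder first-stage (e.g. BM25) candidate ids by model score, highest first."""
    # sorted() is stable even with reverse=True, so ties keep their first-stage order.
    return sorted(candidates, key=lambda docid: score_fn(query, texts[docid]), reverse=True)


def rank_of(docid: str, ranking: List[str]) -> int:
    """1-based position of docid in a ranking; raises ValueError if it was not retrieved."""
    return ranking.index(docid) + 1


def word_overlap(query: str, text: str) -> float:
    """Toy, hypothetical relevance score: number of shared lowercase words."""
    return float(len(set(query.lower().split()) & set(text.lower().split())))


texts = {
    "d1": "sql server tables per database limit",
    "d2": "a sql join combines rows from two or more tables",
    "d3": "dream about babies",
}
bm25_ranking = ["d1", "d3", "d2"]
query = "how many tables can sql server join"
model_ranking = rerank(query, bm25_ranking, texts, word_overlap)
print(rank_of("d2", bm25_ranking), rank_of("d2", model_ranking))  # prints: 3 2
```

Here `d1` and `d2` tie on the toy score, so `d1` stays ahead because it was ahead in the BM25 list.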
Comparison with BM25 model:
The file structure should be the same as this:
```text
GPT_Ranker/
+--- anserini/
+--- data/
| +--- doc_rerank/
| | +--- collection_jsonl/
| | | +--- docs00.json
| | | +--- docs01.json
| | | +--- docs02.json
| | | +--- docs03.json
| | | +--- docs04.json
| | | +--- docs05.json
| | | +--- docs06.json
| | +--- index/
| | | +--- lucene-index.msmarco-doc.pos+docvectors+rawdocs/
| | | | +--- HERE CONTAINS THE ANSERINI MSMARCO DOCUMENT INDEX
| | +--- qrels/
| | | +--- qrels.msmarco-doc.dev.txt
| | +--- query/
| | | +--- doc-msmarco-dev-queries.json
| | | +--- doc-msmarco-test2020-queries.json
| | +--- formatted_run.msmarco-doc.dev.bm25.tuned.txt
| | +--- formatted_run.msmarco-doc.test.bm25.tuned.txt
| +--- pass_rerank/
| | +--- collection_jsonl/
| | | +--- docs00.json
| | | +--- docs01.json
| | | +--- docs02.json
| | | +--- docs03.json
| | | +--- docs04.json
| | | +--- docs05.json
| | | +--- docs06.json
| | | +--- docs07.json
| | | +--- docs08.json
| | +--- index/
| | | +--- lucene-index-msmarco/
| | | | +--- HERE CONTAINS THE ANSERINI MSMARCO PASSAGE INDEX
| | +--- qrels/
| | | +--- qrels.dev.small.tsv
| | | +--- pass-qrels.msmarco-dev.full.txt
| | +--- query/
| | | +--- pass-queries.dev.json
| | | +--- pass-queries.eval.json
| | | +--- pass-query-dev.small.json
| | +--- run.msmarco-passage.dev.small.tsv
| | +--- run.msmarco-passage.dev.full.tsv
| | +--- run.msmarco-passage.dev.eval.tsv
| +--- pass_train/
| | +--- doc_query_pairs.train.tsv
| +--- doc_train/
| | +--- doc_query_pairs.train.tsv
+--- logs/
| +--- gpt2/
| | +--- CONTAIN GPT2 TRAINING LOGS
| +--- t5/
| | +--- CONTAIN T5 TRAINING LOGS
+--- model/
| +--- gpt-2/
| | +--- CONTAIN PRETRAINED GPT-2 (UNTUNED)
| +--- t5-base/
| | +--- CONTAIN T5 FINE TUNED ON MSMARCO PASSAGE
| +--- t5-base-tuned/
| | +--- tuned_on_doc/
| | | +--- CONTAIN T5 FINE TUNED ON MSMARCO DOCUMENT
| +--- gpt-2-tuned/
| | +--- tuned_on_pass/
| | | +--- CONTAIN GPT-2 FINE TUNED ON MSMARCO PASSAGE
| | +--- tuned_on_doc/
| | | +--- CONTAIN GPT-2 FINE TUNED ON MSMARCO DOCUMENT
+--- result/
| +--- doc_rerank/
| | +--- gpt2/
| | +--- t5/
| +--- pass_rerank/
| | +--- gpt2/
| | +--- t5/
+--- notes/
+--- scripts/
| +--- fine_tuning.py
| +--- passage_msmarco_eval.py
| +--- doc_msmarco_eval.py
| +--- SOME OTHER SCRIPTS
+--- config.json
+--- helper.py
+--- main.py
+--- middleware.py
+--- ranker.py
+--- anserini_retriever.py
+--- README.md
+--- SOME OTHER FILES
```
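Before starting a long run, it can help to confirm the data and model directories are in place. A minimal sketch, not part of the repo: `REQUIRED_PATHS` is a hypothetical subset of the tree above, and the file name `check_layout.py` used below is likewise only an illustration. Extend the list to whatever your configuration actually reads.

```python
from pathlib import Path
from typing import List, Sequence

# Hypothetical subset of the layout above; extend it to match your runs.
REQUIRED_PATHS = (
    "anserini",
    "data/doc_rerank/qrels/qrels.msmarco-doc.dev.txt",
    "data/doc_rerank/query/doc-msmarco-dev-queries.json",
    "data/pass_rerank/qrels/qrels.dev.small.tsv",
    "data/pass_rerank/query/pass-query-dev.small.json",
    "data/pass_rerank/run.msmarco-passage.dev.small.tsv",
    "model",
    "result/doc_rerank",
    "result/pass_rerank",
    "config.json",
)


def missing_paths(root: str, required: Sequence[str] = REQUIRED_PATHS) -> List[str]:
    """Return the entries of `required` that do not exist under `root`."""
    base = Path(root)
    return [rel for rel in required if not (base / rel).exists()]


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_paths(root)
    if missing:
        print("Missing:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Layout looks complete.")
```

Saved as `check_layout.py`, it can be run as `python check_layout.py /path/to/GPT_Ranker`; it only reports missing paths and does not create anything.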