Awesome Robustness in Information Retrieval

A curated list of awesome papers related to robustness, adversarial attacks & defenses, and out-of-distribution data for information retrieval (IR). If I missed any papers, feel free to open a PR to include them. Feedback and contributions are welcome!

We thank all the great contributors very much.

Adversarial Attack

Classical Black-hat SEO (Spamming)

Web Spam Taxonomy. Zoltan Gyongyi et al. AIRWeb 2005. (Web Spamming)
International Workshop on Adversarial Information Retrieval on the Web. AIRWeb 2005-2009.
Adversarial web search. Castillo, Carlos, and Brian D. Davison. FnTIR 2011.
MAWSEO: Adversarial Wiki Search Poisoning for Illicit Online Promotion. Zilong Lin et al. S&P 2024. (Adversarial Revisions)

White-hat SEO

Ranking-Incentivized Quality Preserving Content Modification. Goren Gregory et al. SIGIR 2020.

Neural Methods for IR Attack

One word at a time: adversarial attacks on retrieval models. Raval, Nisarg, and Manisha Verma. Arxiv 2020. (White-box)
Adversarial Semantic Collisions. Congzheng Song et al. EMNLP 2020. (White-box)
Bert rankers are brittle: A study using adversarial document perturbations. Yumeng Wang et al. ICTIR 2022. (White-box)
PRADA: Practical Black-Box Adversarial Attacks against Neural Ranking Models. Chen Wu et al. TOIS 2022. (Black-box, Word substitution)
Order-Disorder: Imitation Adversarial Attacks for Black-box Neural Ranking Models. Jiawei Liu et al. CCS 2022. (Black-box, Trigger)
TRAttack: Text Rewriting Attack Against Text Retrieval. Junshuai Song et al. RepL4NLP 2022. (Rewriting Attack, Matching Model)
Topic-oriented Adversarial Attacks against Black-box Neural Ranking Models. Yu-An Liu et al. SIGIR 2023. (Black-box, TARA task)
Towards Imperceptible Document Manipulations against Neural Ranking Models. Xuanang Chen et al. ACL 2023 findings. (Black-box, Prompt)
Black-box Adversarial Attacks against Dense Retrieval Models: A Multi-view Contrastive Learning Method. Yu-An Liu et al. CIKM 2023. (Black-box, Dense Retrieval Attack)
Boosting Big Brother: Attacking Search Engines with Encodings. Nicholas Boucher et al. RAID 2023. (Encoding attack)
Backdoor Attacks on Dense Passage Retrievers for Disseminating Misinformation. Quanyu Long et al. Arxiv 2024. (Backdoor attack)
Poisoning Retrieval Corpora by Injecting Adversarial Passages. Zexuan Zhong et al. EMNLP 2023. (Dense Retrieval attack)
Multi-granular Adversarial Attacks against Black-box Neural Ranking Models. Yu-An Liu et al. SIGIR 2024. (Multi-granular attack)
Analyzing Adversarial Attacks on Sequence-to-Sequence Relevance Models. Andrew Parry et al. ECIR 2024. (Attacking T5)

Defense

Out-of-distribution

Data Augmentation

Data augmentation for sample efficient and robust document ranking Abhijit Anand et al. TOIS 2023.
Data augmentation and transfer learning for brain tumor detection in magnetic resonance imaging A. Anaya-Isaza et al. IEEE Access 2022.
InPars: Data Augmentation for Information Retrieval using Large Language Models Bonifacio et al. Arxiv 2022.
HypeR: Multitask Hyper-Prompted Training Enables Large-Scale Retrieval Generalization Cai et al. ICLR 2023.
DUQGen: Effective Unsupervised Domain Adaptation of Neural Rankers by Diversifying Synthetic Query Generation Ramraj Chandradevan et al. Arxiv 2024.
Cross-domain augmentation networks for click-through rate prediction Xu Chen et al. Arxiv 2023.(CDAnet)
Promptagator: Few-shot dense retrieval from 8 examples Zhuyun Dai et al. Arxiv 2022.(PROMPTAGATOR)
Augmenting zero-shot dense retrievers with plug-in mixture-of-memories Suyu Ge et al. EMNLP 2023.(MoMA)
Unsupervised dense information retrieval with contrastive learning Gautier Izacard et al. Arxiv 2021.
InRanker: Distilled Rankers for Zero-shot Information Retrieval Thiago Laitz et al. Arxiv 2024. (InRanker)
Domain Adaptation for Dense Retrieval and Conversational Dense Retrieval through Self-Supervision by Meticulous Pseudo-Relevance Labeling Minghan Li and Eric Gaussier. LREC-COLING 2024.(DoDress)
Embedding-based zero-shot retrieval through query generation Davis Liang et al. Arxiv 2020.
Challenges in generalization in open domain question answering Linqing Liu et al. NAACL 2022.
Zero-shot neural passage retrieval via domain-targeted synthetic question generation Ji Ma et al. EACL 2021.
Text and code embeddings by contrastive pre-training Arvind Neelakantan et al. Arxiv 2022.
Data augmentation for neural machine translation using generative language model Seokjin Oh et al. Arxiv 2023.
Learning to retrieve passages without supervision Ori Ram et al. NAACL 2022.(Spider)
Towards robust neural retrieval models with synthetic pre-training Revanth Gangi Reddy et al. Arxiv 2021.
Questions are all you need to train a dense passage retriever Devendra Singh Sachan et al. TACL. (ART)
Beir: A heterogenous benchmark for zero-shot evaluation of information retrieval models Nandan Thakur et al. NeurIPS 2021. (BEIR)
GPL: Generative pseudo labeling for unsupervised domain adaptation of dense retrieval Kexin Wang et al. NAACL 2022. (GPL)
Coco-dr: Combating distribution shifts in zero-shot dense retrieval with contrastive and distributionally robust learning Yu et al. Arxiv 2022. (Coco-dr)

Domain Modeling

Coco-dr: Combating distribution shifts in zero-shot dense retrieval with contrastive and distributionally robust learning Yu et al. Arxiv 2022. (Coco-dr)
Improving Retrieval in Theme-specific Applications using a Corpus Topical Taxonomy SeongKu Kang et al. Arxiv 2024.(ToTER)
Learning list-level domain-invariant representations for ranking Ruicheng Xian et al. NeurIPS 2023.
Zero-shot dense retrieval with momentum adversarial domain invariant representations Ji Xin et al. ACL 2022. (MoDIR)
BERM: Training the balanced and extractable representation for matching to improve generalization ability of dense retrieval Shicheng Xu et al. ACL 2023. (BERM)
Disentangled modeling of domain and relevance for adaptable dense retrieval Jingtao Zhan et al. (DDR)

Architectural Modifications

Out-of-domain semantics to the rescue! zero-shot hybrid retrieval models Tao Chen et al. ECIR 2022.
From distillation to hard negative sampling: Making sparse neural ir models more effective Thibault Formal et al. SIGIR 2022.
Zero-shot retrieval with search agents and hybrid environments Michelle Chen Huebscher et al. Arxiv 2022.
DESIRE-ME: Domain-Enhanced Supervised Information Retrieval Using Mixture-of-Experts Pranav Kasela et al. ECIR 2024. (DESIRE-ME)
Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in Dense Encoders Hyunji Lee et al. Arxiv 2023.

Scaling up the Model Capacity

Ernie-search: Bridging cross-encoder with dual-encoder via self on-the-fly distillation for dense passage retrieval Yuxiang Lu et al. Arxiv 2022.
Large dual encoders are generalizable retrievers Jianmo Ni et al. EMNLP 2022.

Continual Learning for Generative Retrieval

Continual learning for generative retrieval over dynamic corpora Jiangui Chen et al. CIKM 2023. (CLEVER)
Corpusbrain++: A continual generative pre-training framework for knowledge-intensive language tasks Jiafeng Guo et al. Arxiv 2024. (CorpusBrain++)
Incdsi: incrementally updatable document retrieval Varsha Kishore et al. PMLR 2023. (IncDSI)
DSI++: Updating transformer memory with new documents Sanket Mehta et al. EMNLP 2023. (DSI++)
Continually Updating Generative Retrieval on Dynamic Corpora Soyoung Yoon et al. Arxiv 2023.

Continual Learning for Dense Retrieval

L2R: Lifelong Learning for First-stage Retrieval with Backward-Compatible Representations Yinqiong Cai et al. CIKM 2023. (L$^2$R)

Query Variations

Dealing with Typos for BERT-based Passage Retrieval and Ranking Shengyao Zhuang et al. EMNLP 2021.(DRTA)
Towards Robust Dense Retrieval via Local Ranking Alignment Xuanang Chen et al. IJCAI 2022.(RoDR)
Evaluating the Robustness of Retrieval Pipelines with Query Variation Generators Penha Gustavo et al. ECIR 2022.
CharacterBERT and Self-Teaching for Improving the Robustness of Dense Retrievers on Queries with Typos Shengyao Zhuang et al. SIGIR 2022.(CBST)
Analysing the Robustness of Dual Encoders for Dense Retrieval Against Misspellings Sidiropoulos Georgios et al. SIGIR 2022.(DACL)
MIRS: [MASK] Insertion Based Retrieval Stabilizer for Query Variations Junping Liu et al. DEXA 2023.(MIRS)
Typos-aware bottlenecked pre-training for robust dense retrieval Shengyao Zhuang et al. SIGIR-AP 2023. (ToRoDer)
Contrastive fine-tuning improves robustness for neural rankers Xiaofei Ma et al. ACL-IJCNLP 2021.
Improving the Robustness of Dense Retrievers Against Typos via Multi-Positive Contrastive Learning Georgios Sidiropoulos et al. ECIR 2024.
Noise-robust dense retrieval via contrastive alignment post training Daniel Campos et al. Arxiv 2023.(CAPOT)
Towards Robust Neural Rankers with Large Language Model: A Contrastive Training Approach Ziyang Pan et al. Applied Sciences 2023.
Typo-robust representation learning for dense retrieval Panuthep Tasawong et al. ACL 2023.(DST)

Unseen Query Type

Learning to Jointly Transform and Rank Difficult Queries Amin Bigdeli et al. ECIR 2024.
Cross domain regularization for neural ranking models using adversarial learning Daniel Cohen et al. SIGIR 2018.
Ms-shift: An analysis of ms marco distribution shifts on neural retrieval Simon Lupart et al. ECIR 2023.(MS-Shift)
Contrastive fine-tuning improves robustness for neural rankers Xiaofei Ma et al. ACL-IJCNLP 2021.
Simple entity-centric questions challenge dense retrievers Christopher Sciavolino et al. EMNLP 2021.

Benchmark and Evaluation

Are Neural Ranking Models Robust? Chen Wu et al. TOIS 2022.
Evaluating Interpolation and Extrapolation Performance of Neural Retrieval Models. Jingtao Zhan et al. CIKM 2022.

Perspective Papers

Competitive Search. Oren Kurland et al. SIGIR 2022. (Competitive Search)
A Game Theoretic Analysis of the Adversarial Retrieval Setting. Basat, Ran Ben et al. JAIR 2017. (PRP is sub-optimal)

Adversarial Attack and Defense for Image Retrieval

Targeted Mismatch Adversarial Attack: Query with a Flower to Retrieve the Tower Giorgos Tolias et al. ICCV 2019.(TMA)
Universal Perturbation Attack Against Image Retrieval Jie Li et al. ICCV 2019.(UAP)
Adversarial Ranking Attack and Defense Mo Zhou et al. ECCV 2020.(Candidate Attack and Query Attack)
You See What I Want You to See: Exploring Targeted Black-Box Transferability Attack for Hash-based Image Retrieval Systems Yanru Xiao et al. CVPR 2021.(Hash-based: Noise-induced Adversarial Generation)
Practical Relative Order Attack in Deep Ranking Mo Zhou et al. ICCV 2021.
QAIR: Practical Query-efficient Black-Box Attacks for Image Retrieval Xiaodan Li et al. CVPR 2021.(Query-based Attack against Image Retrieval)
ARRA: Absolute-Relative Ranking Attack against Image Retrieval Siyuan Li et al. MM 2022.(ARRA)
RetrievalGuard: Provably Robust 1-Nearest Neighbor Image Retrieval Yihan Wu et al. ICML 2022.(RetrievalGuard)

Other Resources

CREDENCE: Counterfactual Explanations for Document Ranking Rorseth Joel et al. Arxiv 2023. Webpage of interactive tool
