A curated list of awesome papers related to robustness, adversarial attacks & defenses, and out-of-distribution data for information retrieval (IR). If I missed any papers, feel free to open a PR to include them! Any feedback and contributions are welcome!
We thank all the great contributors very much.
- Adversarial Attack
- Defense
- Out-of-distribution
- Benchmark and Evaluation
- Perspective Papers
- Adversarial Attack and Defense for Image Retrieval
- Other Resources
- Web Spam Taxonomy. Zoltan Gyongyi et al. AIRWeb 2005. (Web Spamming)
- International Workshop on Adversarial Information Retrieval on the Web. AIRWeb 2005-2009.
- Adversarial web search. Castillo, Carlos, and Brian D. Davison. FnTIR 2011.
- MAWSEO: Adversarial Wiki Search Poisoning for Illicit Online Promotion. Zilong Lin et al. S&P 2024. (Adversarial Revisions)
- Ranking-Incentivized Quality Preserving Content Modification. Gregory Goren et al. SIGIR 2020.
- One word at a time: adversarial attacks on retrieval models. Raval, Nisarg, and Manisha Verma. Arxiv 2020. (White-box)
- Adversarial Semantic Collisions. Congzheng Song et al. EMNLP 2020. (White-box)
- Bert rankers are brittle: A study using adversarial document perturbations. Yumeng Wang et al. ICTIR 2022. (White-box)
- PRADA: Practical Black-Box Adversarial Attacks against Neural Ranking Models. Chen Wu et al. TOIS 2022. (Black-box, Word substitution)
- Order-Disorder: Imitation Adversarial Attacks for Black-box Neural Ranking Models. Jiawei Liu et al. CCS 2022. (Black-box, Trigger)
- TRAttack: Text Rewriting Attack Against Text Retrieval. Junshuai Song et al. RepL4NLP 2022. (Rewriting Attack, Matching Model)
- Topic-oriented Adversarial Attacks against Black-box Neural Ranking Models. Yu-An Liu et al. SIGIR 2023. (Black-box, TARA task)
- Towards Imperceptible Document Manipulations against Neural Ranking Models. Xuanang Chen et al. Findings of ACL 2023. (Black-box, Prompt)
- Black-box Adversarial Attacks against Dense Retrieval Models: A Multi-view Contrastive Learning Method. Yu-An Liu et al. CIKM 2023. (Black-box, Dense Retrieval Attack)
- Boosting Big Brother: Attacking Search Engines with Encodings. Nicholas Boucher et al. RAID 2023. (Encoding attack)
- Backdoor Attacks on Dense Passage Retrievers for Disseminating Misinformation. Quanyu Long et al. Arxiv 2024. (Backdoor attack)
- Poisoning Retrieval Corpora by Injecting Adversarial Passages. Zexuan Zhong et al. EMNLP 2023. (Dense Retrieval attack)
- Multi-granular Adversarial Attacks against Black-box Neural Ranking Models. Yu-An Liu et al. SIGIR 2024. (Multi-granular attack)
- Analyzing Adversarial Attacks on Sequence-to-Sequence Relevance Models. Andrew Parry et al. ECIR 2024. (Attacking T5)
- IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models Wang, Jun, et al. SIGIR 2017. (IRGAN)
- Adversarial Sampling and Training for Semi-Supervised Information Retrieval Park, Dae Hoon, and Yi Chang. WWW 2019. (AdvIR)
- Adversarial Retriever-Ranker for dense text retrieval Zhang, Hang, et al. ICLR 2022. (AR2)
- Towards Robust Ranker for Text Retrieval Zhou, Yucheng, et al. Arxiv 2022. (R2ANKER)
- Dealing with textual noise for robust and effective BERT re-ranking Chen, Xuanang, et al. IPM 2023.
- A Study on FGSM Adversarial Training for Neural Retrieval Lupart, Simon, and Stéphane Clinchant. ECIR 2023.
- Perturbation-Invariant Adversarial Training for Neural Ranking Models: Improving the Effectiveness-Robustness Trade-Off Yu-An Liu et al. AAAI 2024. (Perturbation-invariance theory)
- Certified Robustness to Word Substitution Ranking Attack for Neural Ranking Models Chen Wu et al. CIKM 2022.
- Defense of Adversarial Ranking Attack in Text Retrieval: Benchmark and Baseline via Detection Xuanang Chen et al. Arxiv 2023.
- Data augmentation for sample efficient and robust document ranking Abhijit Anand et al. TOIS 2023.
- Data augmentation and transfer learning for brain tumor detection in magnetic resonance imaging A. Anaya-Isaza et al. IEEE Access 2022.
- InPars: Data Augmentation for Information Retrieval using Large Language Models Bonifacio et al. Arxiv 2022.
- HypeR: Multitask Hyper-Prompted Training Enables Large-Scale Retrieval Generalization Cai et al. ICLR 2023.
- DUQGen: Effective Unsupervised Domain Adaptation of Neural Rankers by Diversifying Synthetic Query Generation Ramraj Chandradevan et al. Arxiv 2024.
- Cross-domain augmentation networks for click-through rate prediction Xu Chen et al. Arxiv 2023. (CDAnet)
- Promptagator: Few-shot dense retrieval from 8 examples Zhuyun Dai et al. Arxiv 2022. (PROMPTAGATOR)
- Augmenting zero-shot dense retrievers with plug-in mixture-of-memories Suyu Ge et al. EMNLP 2023. (MoMA)
- Unsupervised dense information retrieval with contrastive learning Gautier Izacard et al. Arxiv 2021.
- InRanker: Distilled Rankers for Zero-shot Information Retrieval Thiago Laitz et al. Arxiv 2024. (InRanker)
- Domain Adaptation for Dense Retrieval and Conversational Dense Retrieval through Self-Supervision by Meticulous Pseudo-Relevance Labeling Minghan Li and Eric Gaussier. LREC-COLING 2024. (DoDress)
- Embedding-based zero-shot retrieval through query generation Davis Liang et al. Arxiv 2020.
- Challenges in generalization in open domain question answering Linqing Liu et al. NAACL 2022.
- Zero-shot neural passage retrieval via domain-targeted synthetic question generation Ji Ma et al. EACL 2021.
- Text and code embeddings by contrastive pre-training Arvind Neelakantan et al. Arxiv 2022.
- Data augmentation for neural machine translation using generative language model Seokjin Oh et al. Arxiv 2023.
- Learning to retrieve passages without supervision Ori Ram et al. NAACL 2022. (Spider)
- Towards robust neural retrieval models with synthetic pre-training Revanth Gangi Reddy et al. Arxiv 2021.
- Questions are all you need to train a dense passage retriever Devendra Singh Sachan et al. TACL. (ART)
- Beir: A heterogenous benchmark for zero-shot evaluation of information retrieval models Nandan Thakur et al. NeurIPS 2021. (BEIR)
- GPL: Generative pseudo labeling for unsupervised domain adaptation of dense retrieval Kexin Wang et al. NAACL 2022. (GPL)
- Coco-dr: Combating distribution shifts in zero-shot dense retrieval with contrastive and distributionally robust learning Yu et al. Arxiv 2022. (Coco-dr)
- Improving Retrieval in Theme-specific Applications using a Corpus Topical Taxonomy SeongKu Kang et al. Arxiv 2024. (ToTER)
- Learning list-level domain-invariant representations for ranking Ruicheng Xian et al. NeurIPS 2023.
- Zero-shot dense retrieval with momentum adversarial domain invariant representations Ji Xin et al. ACL 2022. (MoDIR)
- BERM: Training the balanced and extractable representation for matching to improve generalization ability of dense retrieval Shicheng Xu et al. ACL 2023. (BERM)
- Disentangled modeling of domain and relevance for adaptable dense retrieval Jingtao Zhan et al. (DDR)
- Out-of-domain semantics to the rescue! zero-shot hybrid retrieval models Tao Chen et al. ECIR 2022.
- From distillation to hard negative sampling: Making sparse neural ir models more effective Thibault Formal et al. SIGIR 2022.
- Zero-shot retrieval with search agents and hybrid environments Michelle Chen Huebscher et al. Arxiv 2022.
- DESIRE-ME: Domain-Enhanced Supervised Information Retrieval Using Mixture-of-Experts Pranav Kasela et al. ECIR 2024. (DESIRE-ME)
- Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in Dense Encoders Hyunji Lee et al. Arxiv 2023.
- Ernie-search: Bridging cross-encoder with dual-encoder via self on-the-fly distillation for dense passage retrieval Yuxiang Lu et al. Arxiv 2022.
- Large dual encoders are generalizable retrievers Jianmo Ni et al. EMNLP 2022.
- Continual learning for generative retrieval over dynamic corpora Jiangui Chen et al. CIKM 2023. (CLEVER)
- Corpusbrain++: A continual generative pre-training framework for knowledge-intensive language tasks Jiafeng Guo et al. Arxiv 2024. (CorpusBrain++)
- Incdsi: incrementally updatable document retrieval Varsha Kishore et al. PMLR 2023. (IncDSI)
- DSI++: Updating transformer memory with new documents Sanket Mehta et al. EMNLP 2023. (DSI++)
- Continually Updating Generative Retrieval on Dynamic Corpora Soyoung Yoon et al. Arxiv 2023.
- L2R: Lifelong Learning for First-stage Retrieval with Backward-Compatible Representations Yinqiong Cai et al. CIKM 2023. (L$^2$R)
- Dealing with Typos for BERT-based Passage Retrieval and Ranking Shengyao Zhuang et al. EMNLP 2021. (DRTA)
- Towards Robust Dense Retrieval via Local Ranking Alignment Xuanang Chen et al. IJCAI 2022. (RoDR)
- Evaluating the Robustness of Retrieval Pipelines with Query Variation Generators Gustavo Penha et al. ECIR 2022.
- CharacterBERT and Self-Teaching for Improving the Robustness of Dense Retrievers on Queries with Typos Shengyao Zhuang et al. SIGIR 2022. (CBST)
- Analysing the Robustness of Dual Encoders for Dense Retrieval Against Misspellings Georgios Sidiropoulos et al. SIGIR 2022. (DACL)
- MIRS: [MASK] Insertion Based Retrieval Stabilizer for Query Variations Junping Liu et al. DEXA 2023. (MIRS)
- Typos-aware bottlenecked pre-training for robust dense retrieval Shengyao Zhuang et al. SIGIR-AP 2023. (ToRoDer)
- Contrastive fine-tuning improves robustness for neural rankers Xiaofei Ma et al. ACL-IJCNLP 2021.
- Improving the Robustness of Dense Retrievers Against Typos via Multi-Positive Contrastive Learning Georgios Sidiropoulos et al. ECIR 2024.
- Noise-robust dense retrieval via contrastive alignment post training Daniel Campos et al. Arxiv 2023. (CAPOT)
- Towards Robust Neural Rankers with Large Language Model: A Contrastive Training Approach Ziyang Pan et al. Applied Sciences 2023.
- Typo-robust representation learning for dense retrieval Panuthep Tasawong et al. ACL 2023. (DST)
- Learning to Jointly Transform and Rank Difficult Queries Amin Bigdeli et al. ECIR 2024.
- Cross domain regularization for neural ranking models using adversarial learning Daniel Cohen et al. SIGIR 2018.
- Ms-shift: An analysis of ms marco distribution shifts on neural retrieval Simon Lupart et al. ECIR 2023. (MS-Shift)
- Contrastive fine-tuning improves robustness for neural rankers Xiaofei Ma et al. ACL-IJCNLP 2021.
- Simple entity-centric questions challenge dense retrievers Christopher Sciavolino et al. EMNLP 2021.
- Are Neural Ranking Models Robust? Chen Wu et al. TOIS 2022.
- Evaluating Interpolation and Extrapolation Performance of Neural Retrieval Models. Jingtao Zhan et al. CIKM 2022.
- Competitive Search. Oren Kurland et al. SIGIR 2022. (Competitive Search)
- A Game Theoretic Analysis of the Adversarial Retrieval Setting. Ran Ben Basat et al. JAIR 2017. (PRP is sub-optimal)
- Targeted Mismatch Adversarial Attack: Query with a Flower to Retrieve the Tower Giorgos Tolias et al. ICCV 2019. (TMA)
- Universal Perturbation Attack Against Image Retrieval Jie Li et al. ICCV 2019. (UAP)
- Adversarial Ranking Attack and Defense Mo Zhou et al. ECCV 2020. (Candidate Attack and Query Attack)
- You See What I Want You to See: Exploring Targeted Black-Box Transferability Attack for Hash-based Image Retrieval Systems Yanru Xiao et al. CVPR 2021. (Hash-based: Noise-induced Adversarial Generation)
- Practical Relative Order Attack in Deep Ranking Mo Zhou et al. ICCV 2021.
- QAIR: Practical Query-efficient Black-Box Attacks for Image Retrieval Xiaodan Li et al. CVPR 2021. (Query-based Attack against Image Retrieval)
- ARRA: Absolute-Relative Ranking Attack against Image Retrieval Siyuan Li et al. MM 2022. (ARRA)
- RetrievalGuard: Provably Robust 1-Nearest Neighbor Image Retrieval Yihan Wu et al. ICML 2022. (RetrievalGuard)
- CREDENCE: Counterfactual Explanations for Document Ranking Joel Rorseth et al. Arxiv 2023. Webpage of interactive tool