
Improving Retrieval in Theme-specific Applications using a Corpus Topical Taxonomy (WWW'24)


SeongKu-Kang/ToTER_WWW24



This repository provides the source code for "Improving Retrieval in Theme-specific Applications using a Corpus Topical Taxonomy", accepted as a research paper at TheWebConf (WWW2024).

1. Overview

We introduce ToTER, a new plug-and-play framework that improves retrieval based on pre-trained language models (PLMs) by using a corpus topical taxonomy.

(Training phase) Taxonomy-guided topic class relevance learning

The taxonomy reveals the latent structure of the whole corpus. To exploit it for retrieval, we first connect this corpus-level knowledge to individual documents. We formulate this step as unsupervised multi-label classification: we assess the relevance of each document to each topic class without any document-topic labels.
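To illustrate what "multi-label classification without document-topic labels" can look like, here is a minimal sketch. It is not ToTER's training procedure. We assume each topic class is described by a few core phrases, derive weak multi-label pseudo-labels by phrase matching, and then fit one independent logistic-regression head per class with binary cross-entropy, so a document can be relevant to several classes at once. The taxonomy, documents, `pseudo_labels`, and `MultiLabelClassifier` are all hypothetical; see the paper and the notebook for the actual model.

```python
import math
import re

# Hypothetical toy taxonomy: topic class -> a few core phrases describing it.
# These classes and phrases are invented for illustration only.
TAXONOMY = {
    "machine learning": ["neural network", "gradient descent", "classifier"],
    "databases": ["sql", "query optimizer", "index"],
    "networking": ["tcp", "router", "packet"],
}


def tokens(text):
    """Lowercased alphabetic tokens as a set (binary bag-of-words features)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def pseudo_labels(docs, taxonomy, threshold=0.3):
    """Weak supervision (an assumption of this sketch): a document receives every
    class for which at least `threshold` of the class's core phrases occur in it."""
    labels = []
    for doc in docs:
        text = doc.lower()
        labels.append({
            cls for cls, phrases in taxonomy.items()
            if sum(p in text for p in phrases) / len(phrases) >= threshold
        })
    return labels


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class MultiLabelClassifier:
    """One independent logistic-regression head per topic class,
    trained with binary cross-entropy via plain SGD."""

    def __init__(self, classes):
        self.classes = list(classes)
        self.weights = {c: {} for c in self.classes}  # class -> token -> weight
        self.bias = {c: 0.0 for c in self.classes}

    def relevance(self, doc):
        """Per-class relevance in (0, 1); scores need not sum to 1."""
        toks = tokens(doc)
        return {
            c: sigmoid(self.bias[c] + sum(self.weights[c].get(t, 0.0) for t in toks))
            for c in self.classes
        }

    def fit(self, docs, labels, lr=0.1, epochs=500):
        for _ in range(epochs):
            for doc, label in zip(docs, labels):
                toks = tokens(doc)
                scores = self.relevance(doc)
                for c in self.classes:
                    # Gradient of BCE w.r.t. the logit of a sigmoid output: p - y.
                    grad = scores[c] - (1.0 if c in label else 0.0)
                    self.bias[c] -= lr * grad
                    w = self.weights[c]
                    for t in toks:
                        w[t] = w.get(t, 0.0) - lr * grad
        return self


DOCS = [
    "A neural network classifier trained with gradient descent",
    "The query optimizer picks an index for each SQL statement",
    "A router forwards each TCP packet to the next hop",
    "Training a neural network classifier on SQL query logs",
]
LABELS = pseudo_labels(DOCS, TAXONOMY)
MODEL = MultiLabelClassifier(TAXONOMY).fit(DOCS, LABELS)
for doc in DOCS:
    print({c: round(s, 2) for c, s in MODEL.relevance(doc).items()})
```

The last document matches phrases from two classes, so it gets two pseudo-labels, and the independent sigmoid heads let the classifier assign it high relevance to both, which a single softmax over classes would not.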

(Inference phase) Topical taxonomy-enhanced retrieval

ToTER complements the existing retrieve-then-rerank pipeline with three strategies: (1) search space adjustment, (2) class relevance matching, and (3) query enrichment by core phrases. The strategies are designed to focus progressively on finer-grained ranking.
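The toy sketch below shows one way the three strategies could plug into a retrieve-then-rerank pipeline. Every concrete choice here is our assumption, not necessarily the paper's: class relevance vectors for queries and documents are given (e.g. from a classifier like the training-phase one), similarity is cosine, class relevance matching is a linear interpolation with reranker scores already scaled to [0, 1], and query enrichment appends the core phrases of the query's top class. All names (`adjust_search_space`, `class_relevance_matching`, `enrich_query`), documents, and scores are hypothetical.

```python
import math

# Hypothetical toy data, invented for illustration.
# Class relevance vectors hold one score per topic class (same order as CLASSES).
CLASSES = ["machine learning", "databases", "networking"]
CORE_PHRASES = {
    "machine learning": ["neural network", "gradient descent"],
    "databases": ["query optimizer", "index"],
    "networking": ["tcp", "router"],
}
DOC_REL = {
    "d1": [0.9, 0.1, 0.0],
    "d2": [0.1, 0.9, 0.1],
    "d3": [0.0, 0.1, 0.9],
    "d4": [0.8, 0.6, 0.0],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def adjust_search_space(first_stage, query_rel, doc_rel, extra=1):
    """(1) Keep the first-stage candidates and add the `extra` other documents
    whose class relevance is most similar to the query's."""
    others = [d for d in doc_rel if d not in first_stage]
    others.sort(key=lambda d: cosine(query_rel, doc_rel[d]), reverse=True)
    return list(first_stage) + others[:extra]


def class_relevance_matching(rerank_scores, query_rel, doc_rel, alpha=0.5):
    """(2) Rank candidates by interpolating the reranker score (assumed in [0, 1])
    with query-document class relevance similarity."""
    fused = {
        d: (1 - alpha) * s + alpha * cosine(query_rel, doc_rel[d])
        for d, s in rerank_scores.items()
    }
    return sorted(fused, key=fused.get, reverse=True)


def enrich_query(query, query_rel, top_c=1):
    """(3) Append the core phrases of the query's `top_c` most relevant classes."""
    top = sorted(range(len(CLASSES)), key=lambda i: query_rel[i], reverse=True)[:top_c]
    phrases = [p for i in top for p in CORE_PHRASES[CLASSES[i]]]
    return " ".join([query, *phrases])


query = "how do deep models learn"
query_rel = [0.85, 0.2, 0.05]   # hypothetical classifier output for the query
first_stage = ["d1", "d2"]      # suppose the PLM retriever returned only d1 and d2
candidates = adjust_search_space(first_stage, query_rel, DOC_REL)
# Hypothetical PLM reranker scores for the adjusted candidate set.
rerank_scores = {"d1": 0.7, "d2": 0.6, "d4": 0.5}
ranking = class_relevance_matching(rerank_scores, query_rel, DOC_REL)
enriched = enrich_query(query, query_rel)
print(candidates, ranking, enriched)
```

In this toy run, d4 is missed by first-stage retrieval but shares the query's topic, so search space adjustment adds it to the candidates and class relevance matching ranks it above d2. In a real pipeline, the enriched query would be passed to the PLM reranker instead of only being printed.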

2. How to use

Please refer to the 'Guide to using ToTER.ipynb' notebook.

3. Resources
