Official repository for "KTRL+F: Knowledge-Augmented In-Document Search"
We introduce KTRL+F, a knowledge-augmented in-document search task that requires real-time identification of all semantic targets within a document, with awareness of external sources, through a single natural query. The task addresses the following challenges unique to in-document search: 1) using knowledge outside the document to draw on additional information about the targets and bridge the semantic gap between the query and the targets, and 2) balancing real-time applicability against performance.
In the absence of suitable benchmarks for evaluating KTRL+F, we introduce a data construction pipeline that collects the key components of a KTRL+F instance: input document, query, corresponding targets, and external knowledge. The full pipeline code is under ktrlf_dataset_pipeline.
The dataset is available under ktrlf_dataset_pipeline/ktrlf_dataset.
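As a rough illustration of the four components listed above (input document, query, targets, external knowledge), one instance could be modeled as in the sketch below. The class name, field names, and toy data are hypothetical and are not taken from the released files; check the files under ktrlf_dataset_pipeline/ktrlf_dataset for the actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KtrlfExample:
    """Hypothetical container for one KTRL+F instance (field names are assumptions)."""

    document: str                 # the input document to search in
    query: str                    # a single natural-language query
    targets: list[str]            # every mention in the document that answers the query
    external_knowledge: dict[str, str] = field(default_factory=dict)  # target -> outside text

    def target_spans(self) -> list[tuple[int, int]]:
        """Return sorted (start, end) character offsets of each target occurrence."""
        spans = []
        for target in self.targets:
            start = self.document.find(target)
            while start != -1:
                spans.append((start, start + len(target)))
                start = self.document.find(target, start + 1)
        return sorted(spans)


# Toy instance, made up for illustration only.
example = KtrlfExample(
    document="Alice met Bob in Paris. Later, Alice flew to Seoul.",
    query="cities in Asia",
    targets=["Seoul"],
    external_knowledge={"Seoul": "Seoul is the capital of South Korea."},
)
spans = example.target_spans()
```

In this toy case the query never mentions "Seoul"; the link between the query and the target comes from the external knowledge entry, which is the semantic gap the task description refers to.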
We release a Chrome extension plugin built on our model under the ktrlf_plugin folder.
Please cite our paper if you use the data or code in this repo.
@inproceedings{oh-etal-2024-ktrl,
title = "{KTRL}+{F}: Knowledge-Augmented In-Document Search",
author = "Oh, Hanseok and
Shin, Haebin and
Ko, Miyoung and
Lee, Hyunji and
Seo, Minjoon",
editor = "Duh, Kevin and
Gomez, Helena and
Bethard, Steven",
booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
month = jun,
year = "2024",
address = "Mexico City, Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.naacl-long.134",
doi = "10.18653/v1/2024.naacl-long.134",
pages = "2416--2436"
}