Joint work between Adobe Research and Auburn University
Thang Pham, Seunghyun Yoon, Trung Bui, and Anh Nguyen.
Phrase in Context (PiC) is a curated benchmark for phrase understanding and semantic search, consisting of three tasks of increasing difficulty: Phrase Similarity (PS), Phrase Retrieval (PR), and Phrase Sense Disambiguation (PSD). The datasets are annotated by 13 linguistic experts on Upwork and verified by two groups: ~1,000 AMT crowdworkers and another set of 5 linguistic experts. The PiC benchmark is distributed under the CC BY-NC 4.0 license.
🌟 Official implementation to reproduce most of the main results in our paper PiC: A Phrase-in-Context Dataset for Phrase Understanding and Semantic Search.
🌞 Project Link: https://phrase-in-context.github.io/
🔥 Online Web Demo: https://aub.ie/phrase-search
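The datasets can also be loaded directly with the Hugging Face `datasets` library. Below is a minimal sketch, assuming the datasets are hosted under a `PiC` organization on the Hub; the exact hub IDs and config names are assumptions, so verify them on the project page before use:

```python
# Minimal sketch: load the three PiC tasks with Hugging Face `datasets`.
# The "PiC/..." hub IDs and config names are assumptions; check the
# project page for the exact identifiers.
from datasets import load_dataset

ps = load_dataset("PiC/phrase_similarity")              # PS: phrase pairs + binary label
pr = load_dataset("PiC/phrase_retrieval", "PR-pass")    # PR: two configs, PR-pass and PR-page
psd = load_dataset("PiC/phrase_sense_disambiguation")   # PSD: a single version

print(ps["train"][0])
```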
If you use our PiC dataset or software, please consider citing:
```bibtex
@article{pham2022PiC,
  title={PiC: A Phrase-in-Context Dataset for Phrase Understanding and Semantic Search},
  author={Pham, Thang M and Yoon, Seunghyun and Bui, Trung and Nguyen, Anh},
  journal={arXiv preprint arXiv:2207.09068},
  year={2022}
}
```
- Anaconda 4.10 or higher
- Python 3.9 or higher
- pip version 21 or higher
- Create a new folder and clone the repo:

  ```bash
  mkdir phrase-in-context && cd "$_"
  git clone https://github.com/Phrase-in-Context/eval.git && cd eval
  ```
- Create and activate a Conda environment:

  ```bash
  conda create -n pic_eval python=3.9
  conda activate pic_eval
  ```
- Install required libraries:

  ```bash
  pip install -r requirements.txt
  bash extra_requirements.sh
  ```
Change directory to the `similarity` folder:

```bash
cd similarity/
```

Then run either or both evaluation approaches:

```bash
bash run_eval_ranking.sh   # Approach 1: ranking
bash run_eval_cls.sh       # Approach 2: classification
```
- Please note that the default setting for both approaches is non-contextualized phrase embeddings. For the contextualized setting, uncomment the `--contextual` argument in the script.
- The results are stored under `../results/phrase_similarity/ranking` for the first approach and `../results/phrase_similarity/classification` for the second.
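For intuition, the sketch below contrasts the two embedding settings that `--contextual` toggles: in the non-contextualized setting a phrase is encoded on its own, while in the contextualized setting the full sentence is encoded and only the phrase's token embeddings are pooled. The model choice and mean pooling here are illustrative assumptions, not the exact implementation in `similarity/`:

```python
# Minimal sketch: non-contextualized vs. contextualized phrase embeddings.
# bert-base-uncased and mean pooling are illustrative choices, not
# necessarily what run_eval_ranking.sh uses internally.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed_non_contextual(phrase: str) -> torch.Tensor:
    # Encode the phrase in isolation and mean-pool its token embeddings.
    enc = tok(phrase, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc).last_hidden_state   # (1, seq_len, hidden)
    return out.mean(dim=1).squeeze(0)

def embed_contextual(phrase: str, sentence: str) -> torch.Tensor:
    # Encode the whole sentence, then mean-pool only the phrase's tokens,
    # located via character offsets.
    start = sentence.index(phrase)
    end = start + len(phrase)
    enc = tok(sentence, return_tensors="pt", return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0].tolist()
    with torch.no_grad():
        out = model(**enc).last_hidden_state
    mask = torch.tensor([s >= start and e <= end and s < e for s, e in offsets])
    return out[0][mask].mean(dim=0)

# Same phrase, two contexts with different senses:
a = embed_contextual("great deal", "We negotiated a great deal on the car.")
b = embed_contextual("great deal", "Finishing took a great deal of effort.")
print(torch.cosine_similarity(a, b, dim=0).item())
```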
Change directory to the `retrieval_ranking` folder:

```bash
cd retrieval_ranking/
```
For this approach, we do not train the models. Instead, we evaluate them directly by comparing the phrase representation of a query with those of the phrase candidates, all produced by the models' encoders.
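Concretely, the candidates in a document are ranked by the similarity of their embeddings to the query embedding. A minimal sketch, where `embed` stands for any phrase encoder (e.g., the mean-pooled BERT embedding sketched in the Phrase Similarity section above):

```python
# Minimal sketch of ranking-based retrieval: score each candidate phrase
# by cosine similarity to the query embedding and return them best-first.
# `embed` is any callable mapping a phrase to a 1-D torch.Tensor.
import torch

def rank_candidates(query, candidates, embed):
    q = embed(query)
    scored = [(torch.cosine_similarity(q, embed(c), dim=0).item(), c)
              for c in candidates]
    return [c for _, c in sorted(scored, reverse=True)]
```

To reproduce the paper's numbers, run the provided script instead: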
export DATASET="phrase_retrieval"
export DATASET_CONFIG="PR-pass"
export MODEL="BERT-base"
export CONTEXTUAL=False # Set it to True for contextualized setting
bash run_eval.sh evaluate_model "${DATASET}" "${DATASET_CONFIG}" "${MODEL}" "${CONTEXTUAL}"
- Note that the default setting is non-contextualized phrase embeddings. For the contextualized setting, change the exported value of `CONTEXTUAL` to `True`.
- `DATASET_CONFIG` can be set to either `PR-pass` or `PR-page`.
- `MODEL` can be set to one of `BERT-base`, `BERT-large`, `PhraseBERT`, `SpanBERT`, `SentenceBERT`, `SimCSE`, or `USE`.
- The results and log file are stored under `../results/phrase_retrieval/${DATASET_CONFIG}/ranking`.
Change directory to the `retrieval_qa` folder:

```bash
cd retrieval_qa/
```
First, we train a Q/A model (e.g., BERT-base) on one of the two versions of the PR dataset, PR-pass or PR-page:
export DATASET="phrase_retrieval"
export DATASET_CONFIG="PR-pass"
export MODEL="BERT-base"
bash train_qa.sh finetune_model "${DATASET}" "${DATASET_CONFIG}" "${MODEL}"
Then, we evaluate the newly trained Q/A model as follows:

```bash
bash eval_qa.sh evaluate_model "${DATASET}" "${DATASET_CONFIG}" "${MODEL}"
```
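For intuition, this formulation treats the query as the question and the target phrase as the answer span to be extracted from the passage. A rough illustration with an off-the-shelf SQuAD checkpoint (a stand-in, not one of the models trained by `train_qa.sh`):

```python
# Rough illustration of the Q/A formulation: the query acts as the
# question and the predicted answer span is the retrieved phrase.
# The pipeline and checkpoint are stand-ins for the fine-tuned models.
from transformers import pipeline

qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")
result = qa(question="Which phrase describes an unexpected outcome?",
            context="The election ended with a surprising result that "
                    "no poll had predicted.")
print(result["answer"], result["score"])
```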
- `DATASET_CONFIG` can be set to either `PR-pass` or `PR-page`.
- `MODEL` can be set to one of `BERT-base`, `BERT-large`, `PhraseBERT`, `SpanBERT`, `SentenceBERT`, `SimCSE`, `Longformer-base`, or `Longformer-large`.
- The results and log files of training and evaluation are stored under `../results/phrase_retrieval/${DATASET_CONFIG}/qa`.
Training and evaluating a Q/A model on the PSD dataset is quite similar to PR's approach 2. All we need to do is update the dataset and its config as follows, keeping everything else unchanged:
export DATASET="phrase_sense_disambiguation"
export DATASET_CONFIG=""
export MODEL="BERT-base"
# For training
bash train_qa.sh finetune_model "${DATASET}" "${DATASET_CONFIG}" "${MODEL}"
# For evaluation
bash eval_qa.sh evaluate_model "${DATASET}" "${DATASET_CONFIG}" "${MODEL}"
- `DATASET_CONFIG` can only be set to the empty string `""` since PSD has a single version. For `MODEL`, you can follow the list in the PR section to train and evaluate other models.
- The results and log files of training and evaluation are stored under `../results/phrase_sense_disambiguation/qa`.
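What makes PSD harder than PR is that the context is constructed so the target phrase occurs more than once with different senses, and the model must return the occurrence whose sense matches the query. A toy illustration of the two competing spans (invented example, not real PSD data):

```python
# Toy illustration of PSD: the same phrase occurs twice with different
# senses, so span position matters; only one occurrence answers the query.
context = ("He blasted heavy metal from his car stereo all night. "
           "The lab tested the soil for heavy metal contamination.")
phrase = "heavy metal"
first = context.find(phrase)               # music-genre sense
second = context.find(phrase, first + 1)   # toxic-element sense
print((first, second))  # two candidate spans for the same surface phrase
```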
See the open issues for a full list of proposed features (and known issues).
Distributed under the MIT License.
The code was developed and is maintained by Thang Pham (@pmthangxai, tmp0038@auburn.edu). Contact us via email or create a GitHub issue if you have any questions or requests. Thanks!
- Hugging Face. 2022. transformers/examples/pytorch/question-answering at main · huggingface/transformers. https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering
- Shufan Wang, Laure Thompson, and Mohit Iyyer. 2021. Phrase-BERT: Improved Phrase Embeddings from BERT with an Application to Corpus Exploration. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 10837–10851, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Jinhyuk Lee, Mujeen Sung, Jaewoo Kang, and Danqi Chen. 2021. Learning Dense Representations of Phrases at Scale (DensePhrases). In Association for Computational Linguistics (ACL).
- Tianyu Gao, Xingcheng Yao, and Danqi Chen. 2021. SimCSE: Simple Contrastive Learning of Sentence Embeddings. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 6894–6910, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Iz Beltagy, Matthew E. Peters, and Arman Cohan. 2020. Longformer: The Long-Document Transformer. arXiv preprint arXiv:2004.05150.