Dataset and code for the EMNLP 2022 paper "COPEN: Probing Conceptual Knowledge in Pre-trained Language Models". COPEN is a COnceptual knowledge Probing bENchmark that aims to analyze the conceptual understanding capabilities of Pre-trained Language Models (PLMs). Specifically, COPEN consists of three tasks:
- Conceptual Similarity Judgment (CSJ). Given a query entity and several candidate entities, the CSJ task requires PLMs to select the candidate entity that is most conceptually similar to the query entity.
- Conceptual Property Judgment (CPJ). Given a statement describing a property of a concept, PLMs need to judge whether the statement is true.
- Conceptualization in Contexts (CiC). Given a sentence, an entity mentioned in the sentence, and several concept chains of the entity, PLMs need to select the concept of the entity that is most appropriate according to the context.
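For intuition, here is a purely illustrative sketch of what instances of the three tasks might look like. The field names and values are hypothetical and need not match the released data files.

```python
# Hypothetical instances; the actual COPEN data format may differ.
csj_example = {                      # Conceptual Similarity Judgment
    "query": "violin",
    "candidates": ["cello", "passport", "volcano"],
    "answer": "cello",               # most conceptually similar candidate
}

cpj_example = {                      # Conceptual Property Judgment
    "statement": "A hammer is used for driving nails.",
    "label": True,                   # is the property statement true?
}

cic_example = {                      # Conceptualization in Contexts
    "sentence": "Apple unveiled a new phone yesterday.",
    "entity": "Apple",
    "concept_chains": [["organization", "company"], ["plant", "fruit"]],
    "answer": 0,                     # index of the context-appropriate chain
}
```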
Extensive experiments on PLMs of different sizes and types show that existing PLMs systematically lack conceptual knowledge and suffer from various spurious correlations. We believe this is a critical bottleneck for realizing human-like cognition in PLMs. More concept-aware objectives or architectures are needed to develop conceptually knowledgeable PLMs.
To get the test results, you need to submit your predictions to CodaLab.
The code repository is based on PyTorch and Transformers. Please use the following command to install all the necessary dependencies.
pip install -r requirements.txt
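After installation, a quick sanity check such as the following can confirm that the core libraries are importable (a minimal sketch; it makes no assumption about the exact versions pinned in requirements.txt):

```python
import torch
import transformers

# Print library versions and GPU availability before running experiments.
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
```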
The COPEN benchmark is hosted on Tsinghua Cloud. Please use the following commands to download the datasets and place them in the proper paths.
cd data/
wget --content-disposition https://cloud.tsinghua.edu.cn/f/f0b33fb429fa4575aa7f/?dl=1
unzip copen_data.zip
mkdir task1/data
mkdir task2/data
mkdir task3/data
mv copen_data/task1/* task1/data
mv copen_data/task2/* task2/data
mv copen_data/task3/* task3/data
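To verify that the files landed where the processing scripts expect them, a small check like this can be run from the data/ directory (a sketch that only assumes the directory layout created by the commands above):

```python
import os

# Check that each task's data directory exists and is non-empty.
for task in ("task1", "task2", "task3"):
    target = os.path.join(task, "data")
    files = os.listdir(target) if os.path.isdir(target) else []
    print(f"{target}: {len(files)} file(s)")
```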
cd task1
python probing_data_processor.py
cd ../
cd task2
python probing_data_processor.py
cd ../
cd task3
python probing_data_processor.py
cd ../
python processor_utils.py task1 mc
python processor_utils.py task2 sc
python processor_utils.py task3 mc
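The per-task processing and the final conversion can also be scripted in one pass. The sketch below simply replays the commands above via subprocess and assumes it is run from the data/ directory; the mc/sc arguments presumably select a multiple-choice vs. sequence-classification format, which is an assumption rather than something stated by the commands.

```python
import subprocess

# Replays the preprocessing commands above; run from the data/ directory.
TASKS = {"task1": "mc", "task2": "sc", "task3": "mc"}

# Step 1: per-task probing data preparation.
for task in TASKS:
    subprocess.run(["python", "probing_data_processor.py"], cwd=task, check=True)

# Step 2: convert each task into its target format.
for task, fmt in TASKS.items():
    subprocess.run(["python", "processor_utils.py", task, fmt], check=True)
```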
cd code/probing
bash task1/run.sh 0 bert bert-base-uncased
bash task2/run.sh 0 bert bert-base-uncased
bash task3/run.sh 0 bert bert-base-uncased
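To probe several checkpoints without repeating the three calls by hand, they can be wrapped in a small driver script (a sketch, run from code/probing/; it only assumes the run.sh interface shown above, i.e. a GPU id, a model type tag, and a checkpoint name):

```python
import subprocess

# Probes every COPEN task with each listed PLM; run from code/probing/.
MODELS = [("bert", "bert-base-uncased")]  # extend with other (type, checkpoint) pairs

for model_type, checkpoint in MODELS:
    for task in ("task1", "task2", "task3"):
        subprocess.run(["bash", f"{task}/run.sh", "0", model_type, checkpoint], check=True)
```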
cd code/finetuning
cd task1/
bash ../run.sh 0 bert bert-base-uncased task1 mc 42
cd ../task2/
bash ../run.sh 0 bert bert-base-uncased task2 sc 42
cd ../task3/
bash ../run.sh 0 bert bert-base-uncased task3 mc 42
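To run the fine-tuning experiments with several random seeds (the trailing 42 above is presumably a seed, which is an assumption), the commands can likewise be scripted (a sketch, run from code/finetuning/):

```python
import subprocess

# Mirrors the fine-tuning commands above; run from code/finetuning/.
# The last run.sh argument is assumed to be a random seed.
TASKS = {"task1": "mc", "task2": "sc", "task3": "mc"}
SEEDS = [42, 43, 44]

for task, fmt in TASKS.items():
    for seed in SEEDS:
        subprocess.run(
            ["bash", "../run.sh", "0", "bert", "bert-base-uncased", task, fmt, str(seed)],
            cwd=task,
            check=True,
        )
```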
If our code or benchmark helps you, please cite us:
@inproceedings{peng2022copen,
title={COPEN: Probing Conceptual Knowledge in Pre-trained Language Models},
author={Peng, Hao and Wang, Xiaozhi and Hu, Shengding and Jin, Hailong and Hou, Lei and Li, Juanzi and Liu, Zhiyuan and Liu, Qun},
booktitle={Proceedings of EMNLP},
year={2022}
}