Predicting concreteness in context for English and Italian by using distributional models and behavioural norms

We provide an implementation of the models with which we participated (as team Andi) in the CONcreTEXT task of EVALITA 2020. The task involved predicting subjective ratings of concreteness for words presented in context. Our approach, which ranked first in both the English and Italian tracks, relies on a combination of context-dependent and context-independent distributional models, together with behavioural norms.

If you want to test our models, you can run the English and Italian demos (see the two Jupyter notebooks). Feel free to experiment with your own combinations of stimuli, norms, and models, once you have made sure that they are in the proper format (see the information provided below). If you get interesting results, please let us know! 🙂

Before you start

To run the demos, first complete the following steps:

  1. Create a dedicated Python environment (highly recommended) and install the necessary libraries. Start by installing PyTorch. Next, run the following command:
pip install notebook pandas scipy scikit-learn transformers
  2. Place the necessary files in their corresponding directories, as follows:
  • Put the files 'CONcreTEXT_trial_EN.tsv' and 'CONcreTEXT_trial_IT.tsv' in the 'stimuli' folder. The two files can be obtained from the dedicated OSF project.

  • (Optional) Put the behavioural norms in the 'behavioural-norms' folder. Each file must be in CSV format and have a header with the variable names (e.g., Word,Frequency,SemanticDiversity,...). The first column ('Word') must contain the normed words, while the other columns must contain the behavioural data. For copyright reasons, we cannot upload the norms we used for our submission, but you can download them yourself by using the following links (just remember to convert them to the right format, keeping only the columns of interest):

  • (Optional) Put the context-independent embeddings (i.e., models) in the 'context-independent-models' folder. Each file must be in CSV format, with no header. The first column must contain the words, while the other columns must contain the word vectors. The models we used for our submission can be downloaded from the dedicated OSF project.

  3. (Optional) Make sure you have enough disk space for the context-dependent models (i.e., Hugging Face transformers). If you wish to change the location where the models are stored, uncomment the first two lines in the demo code and replace <new_cache_folder_path> with your chosen location. If you decide to use such models, keep in mind that the download may take some time, since most models are around 500 MB.
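If you are preparing your own norms or embeddings, a quick check of the two CSV layouts described above can save some debugging. The following is only a minimal sketch using pandas: the sample data and the helper names `load_norms` and `load_embeddings` are made up for illustration and are not part of the demo code, which may read the files differently.

```python
import io

import pandas as pd

# Hypothetical miniature files, held in memory for illustration only.
norms_csv = io.StringIO(
    "Word,Frequency,SemanticDiversity\n"
    "dog,4.5,1.2\n"
    "idea,4.1,2.0\n"
)
embeddings_csv = io.StringIO(
    "dog,0.1,0.2,0.3\n"
    "idea,0.4,0.5,0.6\n"
)


def load_norms(source):
    """Read a norms file: header row, 'Word' as the first column."""
    norms = pd.read_csv(source)
    if norms.columns[0] != "Word":
        raise ValueError("the first column must be named 'Word'")
    return norms.set_index("Word")


def load_embeddings(source):
    """Read an embeddings file: no header, words in the first column."""
    vectors = pd.read_csv(source, header=None, index_col=0)
    # Every remaining column should be numeric (one vector dimension each).
    if not all(pd.api.types.is_numeric_dtype(t) for t in vectors.dtypes):
        raise ValueError("vector columns must be numeric")
    return vectors


norms = load_norms(norms_csv)
vectors = load_embeddings(embeddings_csv)
print(norms.shape, vectors.shape)
```

To check real files, pass a path (for instance, a file inside 'behavioural-norms' or 'context-independent-models') instead of the in-memory samples.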
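For the disk-space step, a rough check along the following lines may help before downloading anything. This is a sketch under stated assumptions, not the demo's own code: the helper names and the cache path are hypothetical, the 500 MB figure is the approximate size quoted above, and the two commented-out lines in the demos may set the cache location in a different way. Here we assume the `HF_HOME` environment variable, which the Hugging Face libraries read on import.

```python
import os
import shutil
import tempfile


def free_space_mb(path):
    """Return the free space, in MB, on the volume that holds `path`."""
    return shutil.disk_usage(path).free / 1024**2


def enough_space(path, n_models, model_size_mb=500):
    """Check whether `n_models` downloads of about `model_size_mb` each fit."""
    return free_space_mb(path) >= n_models * model_size_mb


# Hypothetical cache location; replace it with your own folder.
cache_dir = os.path.join(tempfile.gettempdir(), "hf-cache-example")
os.makedirs(cache_dir, exist_ok=True)

# Assumption: HF_HOME is picked up when the Hugging Face libraries are
# imported, so it has to be set before `import transformers`.
os.environ["HF_HOME"] = cache_dir

print(f"{free_space_mb(cache_dir):.0f} MB free")
print("Room for 3 models:", enough_space(cache_dir, 3))
```

Alternatively, the `from_pretrained` methods in transformers accept a `cache_dir` argument, which sets the location for a single model without touching the environment.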