Repository for our short paper accepted at EMNLP 2021 Findings. You may find our paper here.
You may download our trained models from here, and uncompress them into the directory trained_models/.
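If you prefer to script this step, here is a minimal sketch, assuming the download is a `.tar.gz` archive (the archive name below is hypothetical; use the file you actually downloaded):

```python
import tarfile
from pathlib import Path

# Hypothetical archive name; replace with the downloaded file.
archive = Path("trained_models.tar.gz")

# Extract into the repository root so the models land in trained_models/.
with tarfile.open(archive, "r:gz") as tar:
    tar.extractall(path=".")
```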
Get the Wikidata5m pre-trained embeddings (TransE, DistMult, ComplEx, RotatE) from here, and put them inside the directory data/wiki5m. Since we only work with human-related triples, we filtered and saved the needed entities and relations as human_ent_rel_sorted_list.pkl in the directory data/wiki5m.
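For reference, the filtered list can be inspected with a standard pickle load; a minimal sketch (the exact structure of the pickled object is an assumption here):

```python
import pickle

# Load the filtered human-related entities and relations.
with open("data/wiki5m/human_ent_rel_sorted_list.pkl", "rb") as f:
    human_ent_rel = pickle.load(f)

# Assumed to be a sorted list of entity/relation identifiers;
# adjust if the actual structure differs.
print(type(human_ent_rel), len(human_ent_rel))
```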
Run the following commands to first save the human-related embeddings, and then wrap them into the corresponding pykeen trained models, which will be saved in the directory trained_models/wiki5m:
```
python process_wiki5m.py
mkdir -p trained_models/wiki5m
python wrap_wiki5m.py
```
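Conceptually, the wrapping step copies a filtered embedding matrix into a freshly initialized pykeen model so that the usual pykeen API can be used downstream. Below is a minimal sketch of that idea, not the actual wrap_wiki5m.py: the input file names are hypothetical, and the exact attribute holding the embedding weights depends on your pykeen version.

```python
import numpy as np
import torch
from pykeen.models import TransE
from pykeen.triples import TriplesFactory

# Hypothetical inputs: the filtered entity embedding matrix and the
# human-only triples. Rows of the matrix are assumed to align with the
# entity IDs assigned by the triples factory.
entity_emb = np.load("data/wiki5m/human_transe_entity_embeddings.npy")
tf = TriplesFactory.from_path("data/wiki5m/human_triples.tsv")

# Initialize a model with matching dimensionality, then overwrite its weights.
model = TransE(triples_factory=tf, embedding_dim=entity_emb.shape[1])
with torch.no_grad():
    # In recent pykeen versions the entity embeddings live in
    # entity_representations[0]._embeddings; older versions differ.
    model.entity_representations[0]._embeddings.weight.copy_(
        torch.as_tensor(entity_emb)
    )

torch.save(model, "trained_models/wiki5m/transe.pkl")
```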
To classify the entities according to the target relation, please refer to the code in experiments/run_tail_prediction.py. In the paper, as well as in the code files, the target relation is profession, meaning that we train a classifier to predict the profession of each entity.
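As an illustration of this classification setup (not the actual experiments/run_tail_prediction.py), one can train a simple classifier on the entity embeddings; the file names and data loading below are assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical data: one embedding vector per human entity, labeled with
# its profession (the tail of the target relation).
X = np.load("human_entity_embeddings.npy")  # assumed file, shape (n_entities, dim)
y = np.load("profession_labels.npy")        # assumed file, shape (n_entities,)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print("profession prediction accuracy:", clf.score(X_test, y_test))
```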
Pre-computed dataframes with the tail predictions (i.e., classifications) for profession for each of the embedding methods can be found under the folder preds_dfs. These can be used to directly calculate the bias measurements.
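For example, a predictions dataframe can be loaded directly with pandas, assuming the dataframes are stored as pickle files (the file name below is hypothetical; list the preds_dfs folder for the actual names):

```python
import pandas as pd

# Hypothetical file name inside preds_dfs/.
preds = pd.read_pickle("preds_dfs/transe_profession_preds.pkl")
print(preds.head())
```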