Source Code: The base code for this repository has been taken from https://github.com/allenai/propara
The TechTrack dataset tracks properties of a diverse set of entities in technical procedural documents. To understand the format of the dataset, refer to the ProPara dataset:
Reasoning about Actions and State Changes by Injecting Commonsense Knowledge, Niket Tandon, Bhavana Dalvi Mishra, Joel Grus, Wen-tau Yih, Antoine Bosselut, Peter Clark, EMNLP 2018
These models are built using the PyTorch-based deep-learning NLP library, AllenNLP.
- ProLocal: A simple local model that takes a sentence and entity as input and predicts state changes happening to the entity.
- Bert: A Bert-based classifier that takes a natural query and step text as input and builds a linear classifier on top of the CLS embedding.
The ProLocal and Bert models are described in our paper. The setups are also described briefly in dataset/README.md.
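As a rough illustration of what "a linear classifier on top of the CLS embedding" means, the sketch below applies a single linear layer followed by a softmax to a toy vector. It is not the repository's implementation: the label names, the 4-dimensional embedding, and all weight values are hypothetical and chosen only so the arithmetic is easy to follow. In the real model the embedding would come from Bert and the weights would be learned.

```python
import math
from typing import List, Sequence


def linear_head(cls_embedding: Sequence[float],
                weights: Sequence[Sequence[float]],
                bias: Sequence[float]) -> List[float]:
    """Compute one logit per class: logits = W @ h + b."""
    return [
        sum(w * h for w, h in zip(row, cls_embedding)) + b
        for row, b in zip(weights, bias)
    ]


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn logits into probabilities (shifted by the max for stability)."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical label set and toy values, for illustration only.
LABELS = ["no_change", "becomes_true", "becomes_false"]
cls_embedding = [0.5, -1.0, 2.0, 0.0]   # stand-in for the Bert CLS vector
weights = [
    [0.1, 0.0, 0.2, 0.0],
    [0.0, 0.3, -0.1, 0.0],
    [-0.2, 0.1, 0.0, 0.4],
]
bias = [0.0, 0.1, -0.1]

probs = softmax(linear_head(cls_embedding, weights, bias))
predicted = LABELS[probs.index(max(probs))]
```

The only trainable parts of such a head are one weight row and one bias term per class. Everything else that shapes the prediction is in the encoder that produces the CLS vector.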
- Create the techtrack environment using Anaconda:
conda create -n techtrack python=3.7
- Activate the environment
source activate techtrack
- Install the requirements in the environment:
pip install -r requirements.txt
Detailed instructions are given in the following READMEs:
Make sure to place the data files for the desired setup from dataset/ into data/Inputs. Continue to the "Scripts to use" section for more instructions.
Use various scripts in the root folder to run training and testing of various models and datasets:
- run_all.sh: train Bert model on all properties (including combined model)
- run_all_comp.sh: train all models with all-False rows included
- run_all_test.sh: train isOpened model and test on all folders
- run_bert_test.sh: train combined Bert model and test on all properties
- run_prolocal_all.sh: train ProLocal model for all properties
For each setup i and model M, the data files are in dataset/setup_{i}/{M}/ and have to be placed in data/Inputs/. This can also be done using the helper script transport_data.sh, whose synopsis is:
./transport_data.sh setup-num model
where
- setup-num is the setup number from {1,2,3}.
- model is the model name. Choose from {bert,prolocal} for setup-num = 1 and {bert} for setup-num = 2, 3.
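The copy step described above can be sketched in Python as follows. This is a minimal illustration, not the contents of transport_data.sh. The function name transport_data and the repo_root parameter are hypothetical, and the sketch assumes that the model folder under dataset/setup_{i}/ is named exactly as the model argument and that only top-level files need copying. The actual script may behave differently, for example by clearing data/Inputs/ first.

```python
import shutil
from pathlib import Path
from typing import List

# Valid (setup-num, model) combinations, as stated in the synopsis.
VALID_MODELS = {1: {"bert", "prolocal"}, 2: {"bert"}, 3: {"bert"}}


def transport_data(setup_num: int, model: str, repo_root: str = ".") -> List[Path]:
    """Copy files from dataset/setup_{i}/{M}/ into data/Inputs/ (hypothetical helper)."""
    if setup_num not in VALID_MODELS:
        raise ValueError(f"setup-num must be one of {sorted(VALID_MODELS)}")
    if model not in VALID_MODELS[setup_num]:
        allowed = sorted(VALID_MODELS[setup_num])
        raise ValueError(f"model for setup {setup_num} must be one of {allowed}")

    # Assumed layout: the folder name matches the model argument exactly.
    src = Path(repo_root) / "dataset" / f"setup_{setup_num}" / model
    dst = Path(repo_root) / "data" / "Inputs"
    if not src.is_dir():
        raise FileNotFoundError(f"missing source folder: {src}")

    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(src.iterdir()):
        if item.is_file():  # top-level files only; subfolders are skipped
            target = dst / item.name
            shutil.copy2(item, target)
            copied.append(target)
    return copied
```

Under these assumptions, calling transport_data(1, "prolocal") from the repository root would mirror ./transport_data.sh 1 prolocal, while transport_data(2, "prolocal") raises a ValueError because setups 2 and 3 only provide Bert data.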
Use the scripts from the "data processing scripts" folder to parse data from wikiHow pages and to convert Brat output files into usable dataset formats:
- State change type dataset for ProLocal model
- Natural Query, step and change type dataset for Bert model
The raw subfolder also contains some undocumented, intermediate and raw scripts that are not required but are kept in case they are needed.