Commit 5ea826a
Author: Yh Tian
Message: update readme and requirements
Parent: 399054c

4 files changed (+27, −2 lines)

README.md (+12)

@@ -26,6 +26,18 @@ Our code works with the following environment.
 * `python=3.6`
 * `pytorch=1.1`
 
+To run [Stanford CoreNLP Toolkit](https://stanfordnlp.github.io/CoreNLP/cmdline.html), you need
+* `Java 8`
+
+To run [Berkeley Neural Parser](https://github.com/nikitakit/self-attentive-parser), you need
+* `tensorflow==1.13.1`
+* `benepar[cpu]`
+* `cython`
+
+Note that Berkeley Neural Parser does not support `TensorFlow 2.0`.
+
+You can refer to their websites for more information.
+
 ## Downloading BERT and ZEN
 
 In our paper, we use BERT ([paper](https://www.aclweb.org/anthology/N19-1423/)) and ZEN ([paper](https://arxiv.org/abs/1911.00720)) as the encoder.
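Since the README notes that Berkeley Neural Parser does not support TensorFlow 2.0, a quick version guard before running the pipeline can save a failed run. A minimal sketch — the helper name `tf1_compatible` is illustrative and not part of this repository:

```python
def tf1_compatible(version: str) -> bool:
    """Return True if a TensorFlow version string is 1.x.

    Berkeley Neural Parser does not support TensorFlow 2.0, so any
    version with major number >= 2 is rejected.
    """
    major = int(version.split(".")[0])
    return major < 2
```

For example, `tf1_compatible("1.13.1")` accepts the pinned version, while `tf1_compatible("2.0.0")` rejects it; one could assert this against `tensorflow.__version__` before launching preprocessing.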

data_preprocessing/README.md (+2, −2)

@@ -9,8 +9,8 @@ Run `getdata.sh` under that directory to obtain and pre-process the data. This s
 
 This script will also download the [Stanford CoreNLP Toolkit v3.9.2](https://stanfordnlp.github.io/CoreNLP/history.html) (SCT) and [Berkeley Neural Parser](https://github.com/nikitakit/self-attentive-parser) (BNP) from their official websites, which are used to obtain the auto-analyzed syntactic knowledge. If you only want to use the knowledge from SCT, you can comment out the script to download BNP in `getdata.sh`. If you want to use the auto-analyzed knowledge from BNP, you need to download both SCT and BNP, because BNP relies on the segmentation results from SCT.
 
-To run SCT, you need `java 1.8`; to run BNP, you need `tensorflow`.
+To run SCT, you need `Java 8`; to run BNP, you need `tensorflow==1.13.1`.
 
-You can refer to their website for more information.
+You can refer to their websites for more information.
 
 All processed data will appear in the `data` directory organized by the datasets, where each of them contains the files with the same file names in the `sample_data` folder.

data_preprocessing/getdata.sh (+1)

@@ -1,6 +1,7 @@
 ############## process data ##############
 
 # download Universal Dependencies 2.4
+# If this step fails, you can manually download the file and put it under this directory
 wget https://lindat.mff.cuni.cz/repository/xmlui/bitstream/handle/11234/1-2988/ud-treebanks-v2.4.tgz
 
 tar zxvf ud-treebanks-v2.4.tgz
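The new comment in `getdata.sh` suggests a manual fallback when the `wget` step fails. The same logic can be sketched in Python — a hedged illustration, not part of the repository; the `local/` directory for a hand-placed archive is an assumption:

```python
import os
import urllib.request


def fetch_with_fallback(url: str, local_dir: str = "local") -> str:
    """Try to download `url` into the current directory; on failure,
    look for a manually placed copy under `local_dir`.

    Returns "downloaded", "local", or "missing".
    """
    name = url.rsplit("/", 1)[-1]
    try:
        urllib.request.urlretrieve(url, name)
        return "downloaded"
    except OSError:  # URLError/HTTPError are OSError subclasses
        if os.path.exists(os.path.join(local_dir, name)):
            return "local"
        return "missing"
```

If the download fails and no local copy exists, the caller can stop and print the manual-download instruction instead of letting `tar` fail on a missing archive.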

requirements.txt (+12)

@@ -0,0 +1,12 @@
+torch==1.1.0
+tensorflow-gpu==1.13.1
+tqdm
+nltk
+pandas
+boto3
+requests
+regex
+seqeval
+psutil
+cython
+benepar[cpu]
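The new `requirements.txt` mixes exact pins (`torch==1.1.0`, `tensorflow-gpu==1.13.1`) with unpinned names and an extras marker (`benepar[cpu]`). A tiny parser makes that distinction explicit — a sketch for illustration only; `parse_requirement` is not part of this repository:

```python
def parse_requirement(line):
    """Split a requirements.txt line into (name, version).

    Handles `name==version` pins and bare names (version is None);
    extras such as `benepar[cpu]` stay attached to the name.
    Returns None for blank lines and comments.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    if "==" in line:
        name, version = line.split("==", 1)
        return name.strip(), version.strip()
    return line, None
```

For example, `parse_requirement("tensorflow-gpu==1.13.1")` yields `("tensorflow-gpu", "1.13.1")`, while `parse_requirement("benepar[cpu]")` yields `("benepar[cpu]", None)`, signalling that pip may resolve any available version for the unpinned entries.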
