README.md (+5 -5)
@@ -9,14 +9,14 @@ We will keep updating this repository these days.
 If you use or extend our work, please cite our paper at ACL2020.
 
 ```
-@inproceedings{tian-etal-2020-improving,
-    title = "Improving {C}hinese Word Segmentation with Wordhood Memory Networks",
-    author = "Tian, Yuanhe and Song, Yan and Xia, Fei and Zhang, Tong and Wang, Yonggang",
+@inproceedings{tian-etal-2020-joint,
+    title = "Joint Chinese Word Segmentation and Part-of-speech Tagging via Two-way Attentions of Auto-analyzed Knowledge",
+    author = "Tian, Yuanhe and Song, Yan and Ao, Xiang and Xia, Fei and Quan, Xiaojun and Zhang, Tong and Wang, Yonggang",
     booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
     month = jul,
     year = "2020",
     address = "Online",
-    pages = "8274--8285",
+    pages = "8286--8296",
 }
 ```
 
@@ -42,7 +42,7 @@ Run `run_sample.sh` to train a model on the small sample data under the `sample_
 
 We use [CTB5](https://catalog.ldc.upenn.edu/LDC2005T01), [CTB6](https://catalog.ldc.upenn.edu/LDC2007T36), [CTB7](https://catalog.ldc.upenn.edu/LDC2010T07), [CTB9](https://catalog.ldc.upenn.edu/LDC2016T13), and [Universal Dependencies 2.4](https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-2988) (UD) in our paper.
 
-To obtain and pre-process the data, you can go to the `data_preprocessing` directory and run `getdata.sh`. This script will download and process the official data from UD. For CTB5 (LDC05T01), CTB6 (LDC07T36), CTB7 (LDC10T07), and CTB9 (LDC2016T13), you need to obtain the official data yourself, and then put the raw data directory under the `data_preprocessing` directory.
+To obtain and pre-process the data, you can go to the `data_preprocessing` directory and run `getdata.sh`. This script will download and process the official data from UD. For CTB5 (LDC05T01), CTB6 (LDC07T36), CTB7 (LDC10T07), and CTB9 (LDC2016T13), you need to obtain the official data yourself, and then put the raw data folder under the `data_preprocessing` directory.
 
 The script will also download the [Stanford CoreNLP Toolkit v3.9.2](https://stanfordnlp.github.io/CoreNLP/history.html) (SCT) and [Berkeley Neural Parser](https://github.com/nikitakit/self-attentive-parser) (BNP) to obtain the auto-analyzed syntactic knowledge. You can refer to their websites for more information.