This is the code for the EMNLP 2020 paper "Partially-Aligned Data-to-Text Generation with Distant Supervision". Traditional data-to-text generation requires well-aligned data, which is expensive to annotate. We relax this strict requirement and propose a new task that makes use of automatically constructed, partially-aligned data. This considerably expands the range of application domains to include those where only automatically constructed, partially-aligned data is available.
- GCC >= 4.8
- Python >= 3.7
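
The requirements above can be checked before installing. The following is a minimal sketch, not part of the repository: `version_ge` is a hypothetical helper, it assumes a `sort` that supports `-V` (GNU coreutils and recent BSD/macOS), and it assumes the tools are on the `PATH` as `gcc` and `python3`.

```shell
# Hypothetical helper (not in the repository): true if version $1 >= version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Report the GCC status; newer GCC releases print only the major version here.
if command -v gcc >/dev/null 2>&1; then
  if version_ge "$(gcc -dumpversion)" 4.8; then
    echo "gcc: OK"
  else
    echo "gcc: older than 4.8"
  fi
else
  echo "gcc: not found"
fi

# Report the Python status using the interpreter's own version info.
if command -v python3 >/dev/null 2>&1; then
  py_version="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
  if version_ge "$py_version" 3.7; then
    echo "python3: OK"
  else
    echo "python3: older than 3.7"
  fi
else
  echo "python3: not found"
fi
```

The script only reports what it finds and does not stop on a failed check, so it can be run safely on any machine.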
git clone https://github.com/fuzihaofzh/distant_supervision_nlg.git
cd distant_supervision_nlg
./scripts/setup.sh
./scripts/preprocess.sh wita50k
The model will be evaluated automatically during training.
# Train S2ST model
./scripts/train.sh wita50k base
# Check Score
tail -n 1 output/eval/wita50k__base/eval.100.txt
The model will be evaluated automatically during training.
# Step 1. SE Training
./scripts/train.sh wita50k endorsement,pretrain
# Step 2. S2SG Training
./scripts/train.sh wita50k endorsement,beam_endorse
# Check Score
tail -n 1 output/eval/wita50k__endorsement,beam_endorse/eval.100.txt
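
Both score checks above read the last line of a file under `output/eval/`. As a convenience, the lookup can be wrapped in a small function. This is only a sketch: `last_score` is a hypothetical helper that is not part of the repository, and the path layout `output/eval/<dataset>__<config>/eval.<N>.txt` is inferred from the two commands above rather than documented.

```shell
# Hypothetical helper (not in the repository): print the last line of an
# evaluation file. The path pattern is an assumption based on the two
# "Check Score" commands; the third argument defaults to 100 to match them.
last_score() {
  dataset="$1"
  config="$2"
  step="${3:-100}"
  file="output/eval/${dataset}__${config}/eval.${step}.txt"
  if [ -f "$file" ]; then
    tail -n 1 "$file"
  else
    echo "missing: $file" >&2
    return 1
  fi
}
```

For example, `last_score wita50k base` and `last_score wita50k "endorsement,beam_endorse"` would print the same lines as the two `tail` commands, and a run that has not produced its file yet yields a `missing:` message on standard error instead.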
@inproceedings{fu2020partially,
title={Partially-Aligned Data-to-Text Generation with Distant Supervision},
author={Fu, Zihao and Shi, Bei and Lam, Wai and Bing, Lidong and Liu, Zhiyuan},
booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
pages={9183--9193},
year={2020}
}