This repository contains a Japanese corpus used in the following paper:
Shohei Tanaka, Koichiro Yoshino, Katsuhito Sudoh, Satoshi Nakamura. "ARTA: Collection and Classification of Ambiguous Requests and Thoughtful Actions", The 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), July 2021, Singapore.
https://arxiv.org/abs/2106.07999
The corpus contains 27,230 user requests, split into train.json, valid.json, and test.json with 24,430, 1,400, and 1,400 requests respectively. See load_data.py for data loading. Each line of a file is a JSON dictionary with the following keys:
- idx: unique index of the data entry
- utterance: user utterance (request)
- response: system response (action)
- function: system action function
- category: system action category
- multilabel: additional system action categories (present only in the test data and in part of the validation data)
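The format described above (one JSON dictionary per line) can be read with a few lines of Python. The following is a minimal sketch, not the contents of load_data.py: the names load_arta, count_multilabel, and REQUIRED_KEYS are hypothetical, and UTF-8 encoding is assumed. The value type of multilabel is not specified here, so the sketch only checks whether the key is present.

```python
import json
from pathlib import Path

# Keys listed above as present in every record; "multilabel" is optional.
REQUIRED_KEYS = ("idx", "utterance", "response", "function", "category")


def load_arta(path):
    """Read one split file (one JSON dictionary per line) into a list of dicts.

    Hypothetical helper for illustration; assumes the file is UTF-8 encoded.
    """
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines, e.g. a trailing newline
            record = json.loads(line)
            missing = [key for key in REQUIRED_KEYS if key not in record]
            if missing:
                raise KeyError(f"{path}:{line_no} is missing keys {missing}")
            records.append(record)
    return records


def count_multilabel(records):
    """Count the records that carry the optional "multilabel" key."""
    return sum(1 for record in records if "multilabel" in record)
```

For example, load_arta("test.json") would return the test split as a list of dictionaries, and count_multilabel on that list would report how many of them include additional categories.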
ARTA Corpus (c) by Shohei Tanaka, Koichiro Yoshino, Katsuhito Sudoh, Satoshi Nakamura
Copyright (c) 2021- Augmented Human Communication Laboratory, Nara Institute of Science and Technology
ARTA Corpus is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Please cite the following paper when you use the corpus.
@misc{tanaka2021arta,
title={ARTA: Collection and Classification of Ambiguous Requests and Thoughtful Actions},
author={Shohei Tanaka and Koichiro Yoshino and Katsuhito Sudoh and Satoshi Nakamura},
year={2021},
eprint={2106.07999},
archivePrefix={arXiv},
primaryClass={cs.CL}
}