Dataset for protective scenes #12

Open
logic-rz opened this issue Jul 26, 2024 · 2 comments

@logic-rz

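Hello, where can I find the data for the protective scenes mentioned in the paper? I don't see it in the code, and I can't find the corresponding folder.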

@choosewhatulike
Owner

The training datasets can be downloaded at 🤗 this Link. They contain the experience data for the nine characters used to train Character-LLMs. To download the dataset, run the following Python code; the downloaded data will be placed in /path/to/local_dir.

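from huggingface_hub import snapshot_download

# Download the whole dataset repository into local_dir.
snapshot_download(
    repo_id="fnlp/character-llm-data",
    repo_type="dataset",
    local_dir="/path/to/local_dir",
    # Kept from the README; recent huggingface_hub releases may ignore this flag.
    local_dir_use_symlinks=True,
)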

The prompted/ directory contains datasets that can be used directly for supervised fine-tuning. The generated/ directory holds the raw data generated by gpt-3.5-turbo, which can be converted into the prompted style. Here are the statistics of the training data.

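The instructions for obtaining the dataset are already in the README. After downloading, the generated/ directory contains both the normal dialogue data and the protective data.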
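
To locate the protective data after downloading, it may help to list what is actually under generated/. The following is a minimal sketch, not part of the repository: the helper name `list_generated_files` is hypothetical, and it makes no assumption about how the files are named or nested. It only groups whatever it finds by folder.

```python
from pathlib import Path


def list_generated_files(local_dir: str) -> dict[str, list[str]]:
    """Group the files under <local_dir>/generated by their parent folder.

    Hypothetical helper for inspection only; it does not know which files
    hold protective data, it just shows what was downloaded.
    """
    root = Path(local_dir) / "generated"
    if not root.is_dir():
        raise FileNotFoundError(f"{root} not found; run snapshot_download first")

    grouped: dict[str, list[str]] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root)
            # Files directly under generated/ are grouped under ".".
            grouped.setdefault(rel.parent.as_posix(), []).append(rel.name)
    return grouped
```

Calling `list_generated_files("/path/to/local_dir")` with the same local_dir used for the download returns a mapping from each subfolder to its file names, which can then be checked by eye for the protective data.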

@logic-rz
Author

logic-rz commented Jul 27, 2024 via email
