Hello, I would like to ask, when you are training the model, do you only use the first round of dialogue from the ultrachat_200k? #37

Open
jackwwy opened this issue Jul 21, 2024 · 1 comment

jackwwy commented Jul 21, 2024

def load_and_process_data_ultrachat(dataset_name, split):
    try:
        dataset = load_dataset(dataset_name, split=split)
        reformatted_data = [{
            'generated': [message['messages'][0], {"role": "assistant", "content": ""}],
            'real': [message['messages'][0], message['messages'][1]]
        } for message in dataset]
        return reformatted_data
    except Exception as e:
        logging.error(f"Error loading or processing dataset: {e}")
        return []

@junming-yang

Yes. Only the first round of each ultrachat_200k dialogue is used as the real data.
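
To make the effect concrete, here is a minimal sketch of what the indexing in the quoted function does to one multi-round conversation. It assumes each record's `messages` list alternates user and assistant turns starting with the user. The helper name `first_round_pair` and the `sample` conversation are hypothetical, made up for illustration, and are not part of the repository.

```python
def first_round_pair(messages):
    """Build the 'generated'/'real' pair from the first user/assistant turn only."""
    # Same indexing as the quoted function: messages[0] is the first prompt,
    # messages[1] is the first reply. Later turns are never read.
    prompt, response = messages[0], messages[1]
    return {
        "generated": [prompt, {"role": "assistant", "content": ""}],
        "real": [prompt, response],
    }


# Hypothetical two-round conversation (illustrative content only).
sample = [
    {"role": "user", "content": "Q1"},
    {"role": "assistant", "content": "A1"},
    {"role": "user", "content": "Q2"},
    {"role": "assistant", "content": "A2"},
]

pair = first_round_pair(sample)
print(pair["real"])
```

The second round (`Q2`/`A2`) does not appear anywhere in `pair`, which matches the answer above: turns after the first exchange are dropped.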
