Script for data generation #2

@LuLuLuyi

Hi,

Thank you for your work! The paper and the implementation are incredibly interesting. I noticed that you provided a constructed dataset of 12,000 samples in the project, with a maximum length of 3,000 tokens for each sample.

For my experiments, however, I need samples with longer token lengths. Could you kindly share the script used for data generation, or explain how to modify the existing code to generate longer samples?
