Hi, I noticed the following data preprocessing code in "verl/dataset/bird.py":
# Filter out items in test_dataset whose prompt length exceeds 8000
test_dataset = test_dataset.filter(lambda x: len(x["instruction"]) <= 10000)
train_dataset = train_dataset.filter(lambda x: len(x["instruction"]) <= 11000)
length for train.parquet 8454
length for test.parquet 1405
Does this mean the reported experiment scores were computed on a subset of BIRD-DEV, with some of the very long examples removed?
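To make the question concrete, here is a minimal sketch of how the effect of such a filter could be measured. It is not taken from the repository: `count_filtered` and `toy_instructions` are hypothetical names, and the data is a toy stand-in rather than real BIRD examples. Note that `len()` on a Python string counts characters, not tokens, so the thresholds in the filter are character limits.

```python
def count_filtered(instructions, max_chars):
    """Return (kept, dropped) counts for a character-length cutoff.

    Mirrors the predicate `len(x["instruction"]) <= max_chars` from the
    filter above, applied to a plain list of strings.
    """
    kept = sum(1 for text in instructions if len(text) <= max_chars)
    return kept, len(instructions) - kept


# Toy stand-in for the "instruction" column; these are NOT real BIRD prompts.
toy_instructions = [
    "a" * 50,
    "b" * 9_999,
    "c" * 10_000,   # exactly at the limit, so it is kept
    "d" * 10_001,   # one character over, so it is dropped
    "e" * 20_000,
]

kept, dropped = count_filtered(toy_instructions, max_chars=10_000)
print(f"kept={kept} dropped={dropped}")
```

Running the same count on the unfiltered test split would show exactly how many BIRD-DEV examples were excluded before evaluation.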