-
Notifications
You must be signed in to change notification settings - Fork 34
Open
Description
Thank you for your excellent work on the recent paper. I noticed that some training hyperparameters were provided, such as:
- Learning rate: 1e-4
- Iterations: 1500
- Number of data pairs: 6,000
I’m writing to ask if you could kindly share more details about the DPO training setup—specifically the batch size, gradient accumulation steps, and the value of beta_dpo used during training.
Thank you in advance for your time and assistance.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels