Add DPO support for DeepSpeed-Chat #828
base: master
Conversation
Accidentally closed the PR.. Sorry :(
Hi, thanks for sharing this. Any updates on this PR? Are we planning to merge this feature in?
Yes, of course! The code for step2 DPO seems to be running as expected. I have done my best to make the code easy to understand and to keep the same style as the other steps. After reviewing the PR again just now, it appears to be missing a README. I will complete it ASAP. I really appreciate your reminder :)
@stceum, thanks for this great contribution. Apologies for the delayed review. I have approved. Please let me know when it is good to merge.
Considering the advantages of DPO (Direct Preference Optimization), described as "stable, performant, and computationally lightweight, eliminating the need for fitting a reward model, sampling from the LM during fine-tuning, or performing significant hyperparameter tuning", this PR adds DPO support to DeepSpeed-Chat.
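For readers unfamiliar with DPO, the sketch below shows the core loss it optimizes: a logistic loss on the margin between the policy/reference log-probability ratios of the chosen and rejected responses. This is only an illustrative sketch of the general DPO formulation, not the actual code in this PR; the function name, argument names, and default `beta` are assumptions.

```python
# Illustrative sketch of the DPO loss (not the PR's actual implementation).
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Each argument is the summed log-probability of the chosen / rejected
    response under the policy being trained or the frozen reference model."""
    # Log-ratio of policy vs. reference for chosen and rejected responses.
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    rejected_rewards = policy_rejected_logps - ref_rejected_logps
    # Push the chosen margin above the rejected one via a logistic loss.
    logits = beta * (chosen_rewards - rejected_rewards)
    return -F.logsigmoid(logits).mean()
```

Because only log-probabilities from the policy and a frozen reference model are needed, no separate reward model or on-policy sampling is required during fine-tuning, which is the advantage quoted above.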