At iteration 0, p_{\theta_0}=p_{SFT}, and the global optimum p_{\theta_1} of the training objective after iteration 1 will still be p_{SFT}. Thus, in all following iterations, p_{\theta} will always be p_{SFT}.
In your theoretical analysis, you also prove that p_{\theta^*}=p_{data}; however, minimizing the cross-entropy in SFT already gives p_{SFT}=p_{\theta^*}. So it is not clear to me why SPIN outperforms p_{SFT}. Could you please explain this?
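To spell out the argument, here is a rough sketch. It assumes the SPIN objective has the form I recall from the paper, with a convex, decreasing loss \ell, a scale \lambda > 0, and a prompt distribution q; that notation is my restatement and may not match the paper exactly. It also assumes SFT reaches the global minimum of the cross-entropy, so that p_{SFT}=p_{data}.

```latex
% Assumed form of the objective at iteration t (my restatement of the paper's notation):
L(\theta, \theta_t) =
  \mathbb{E}_{x \sim q,\; y \sim p_{data}(\cdot \mid x),\; y' \sim p_{\theta_t}(\cdot \mid x)}
  \left[ \ell\!\left(
      \lambda \log \frac{p_{\theta}(y \mid x)}{p_{\theta_t}(y \mid x)}
    - \lambda \log \frac{p_{\theta}(y' \mid x)}{p_{\theta_t}(y' \mid x)}
  \right) \right]

% If p_{\theta_t} = p_{data}, then y and y' are identically distributed given x,
% so the argument of \ell has mean zero. By Jensen's inequality (\ell convex):
L(\theta, \theta_t) \;\ge\; \ell(0) \quad \text{for every } \theta,

% and the bound is attained at p_{\theta} = p_{\theta_t}, where both log-ratios vanish:
L(\theta_t, \theta_t) = \ell(0).
```

So under the assumption p_{\theta_0}=p_{SFT}=p_{data}, p_{\theta_0} is already a global minimizer at iteration 0, and no later iteration has a reason to move away from it. If SFT does not reach p_{data} exactly, this fixed-point argument does not apply as written.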
RL4LLM changed the title from "Theoretical Analysis and Idea of SPIN may not make sense??" to "Theoretical Analysis and Idea of SPIN are quite weird??" on Apr 20, 2024
RL4LLM changed the title from "Theoretical Analysis and Idea of SPIN are quite weird??" to "Theoretical Analysis and Idea of SPIN are quite weird (may not make senses)??" on Apr 20, 2024