New options for preference tuning: rpo alpha, logprobs normalization, reference-free, simpo gamma #842
Annotations
1 error
pre-commit
Process completed with exit code 1.