Pivotal Token Search
Updated Jul 15, 2025 - Python
A Survey of Direct Preference Optimization (DPO)
Official PyTorch implementation of "VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning", AAAI 2025
[ICML 2025] TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization
A Comprehensive Survey of Direct Preference Optimization: Datasets, Theories, Variants, and Applications
Notebooks to create an instruction following version of Microsoft's Phi 2 LLM with Supervised Fine Tuning and Direct Preference Optimization (DPO)
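Several of the repositories listed here fine-tune models with the DPO objective. As a point of reference, here is a minimal sketch of the standard DPO loss for a single preference pair, written in plain Python with scalar sequence log-probabilities (this is the generic textbook formulation, not code taken from any of the repositories above; function and argument names are illustrative):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Each argument is the log-probability of a full response (summed over
    tokens) under the trainable policy or the frozen reference model.
    The loss pushes the policy to prefer the chosen response over the
    rejected one, relative to the reference model.
    """
    # log pi(y_w|x) - log pi_ref(y_w|x), and likewise for y_l
    chosen_ratio = logp_chosen - ref_logp_chosen
    rejected_ratio = logp_rejected - ref_logp_rejected
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log sigmoid(logits), in a numerically stable form
    return math.log1p(math.exp(-logits))
```

When the policy and reference agree, the loss sits at log 2; it falls below that as the policy assigns relatively more probability to the chosen response, with `beta` controlling how strongly the margin is rewarded.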
RankPO: Rank Preference Optimization
The Rap Music Generator is an LLM-based tool for generating rap lyrics. It supports multiple fine-tuning approaches, giving users a versatile platform for producing stylistically varied content.
Evaluate how LLaMA 3.1 8B handles paraphrased adversarial prompts targeting refusal behavior.
Experiments and how-to guide for the lecture "Large language models for Scientometrics"
Notebooks to create an instruction following version of Microsoft's Phi 1.5 LLM with Supervised Fine Tuning and Direct Preference Optimization (DPO)
[CC 2025] [Official code] - Engaging preference optimization alignment in large language model for continual radiology report generation: A hybrid approach
[ICML 2025 Workshop FM4BS] AnnoDPO: Protein Functional Annotation Learning with Direct Preference Optimization
EPFLLaMA: A lightweight language model fine-tuned on EPFL curriculum content. Specialized for STEM education and multiple-choice question answering. Implements advanced techniques like SFT, DPO, and quantization.
Enhancing paraphrase-type generation using Direct Preference Optimization (DPO) and Reinforcement Learning from Human Feedback (RLHF), with large-scale HPC support. This project aligns model outputs to human-ranked data for robust, safety-focused NLP.