Pivotal Token Search
Updated Jul 15, 2025 - Python
A Survey of Direct Preference Optimization (DPO)
Official PyTorch implementation of "VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning", AAAI 2025
[ICML 2025] TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization
A Comprehensive Survey of Direct Preference Optimization: Datasets, Theories, Variants, and Applications
Notebooks to create an instruction following version of Microsoft's Phi 2 LLM with Supervised Fine Tuning and Direct Preference Optimization (DPO)
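Several of the repositories listed here fine-tune models with the DPO objective. As a point of reference, here is a minimal sketch of the standard DPO loss for a single preference pair, written in plain Python with scalar sequence log-probabilities (this is the generic textbook formulation, not code taken from any of the repositories above; function and argument names are illustrative):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Each argument is the log-probability of a full response (summed over
    tokens) under the trainable policy or the frozen reference model.
    The loss pushes the policy to prefer the chosen response over the
    rejected one, relative to the reference model.
    """
    # log pi(y_w|x) - log pi_ref(y_w|x), and likewise for y_l
    chosen_ratio = logp_chosen - ref_logp_chosen
    rejected_ratio = logp_rejected - ref_logp_rejected
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log sigmoid(logits), in a numerically stable form
    return math.log1p(math.exp(-logits))
```

When the policy and reference agree, the loss sits at log 2; it falls below that as the policy assigns relatively more probability to the chosen response, with `beta` controlling how strongly the margin is rewarded.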
RankPO: Rank Preference Optimization
The Rap Music Generator is an LLM-based tool for generating rap lyrics. It supports multiple fine-tuning approaches, giving users a versatile platform for producing stylistically varied content.
Evaluate how LLaMA 3.1 8B handles paraphrased adversarial prompts targeting refusal behavior.
Experiments and how-to guide for the lecture "Large language models for Scientometrics"
Notebooks to create an instruction following version of Microsoft's Phi 1.5 LLM with Supervised Fine Tuning and Direct Preference Optimization (DPO)
[CC 2025] [Official code] - Engaging preference optimization alignment in large language model for continual radiology report generation: A hybrid approach
[ICML 2025 Workshop FM4BS] AnnoDPO: Protein Functional Annotation Learning with Direct Preference Optimization
EPFLLaMA: A lightweight language model fine-tuned on EPFL curriculum content. Specialized for STEM education and multiple-choice question answering. Implements advanced techniques like SFT, DPO, and quantization.
Enhancing paraphrase-type generation using Direct Preference Optimization (DPO) and Reinforcement Learning from Human Feedback (RLHF), with large-scale HPC support. This project aligns model outputs to human-ranked data for robust, safety-focused NLP.