Reinforcement Learning for Mathematical Reasoning (RLHF4Math)

This repository is the reading list on Reinforcement Learning Human Feedback for Mathematical Reasoning (RLHF4Math). Related papers, web blogs, datasets, and classic books are included.

Contributors: yehangcheng @XXX, Wenhao Yan @NEU, Chen Bo @XXX

🔔 If you have any suggestions or notice something we missed, please don't hesitate to let us know. You can directly email Wenhao Yan (wenhao19990616@gmail.com), or post an issue on this repo.

🦁 Papers

Related Survey

This section mainly covers surveys about RLHF for math, but also covers other areas.

Reasoning with Language Model Prompting: A Survey, [paper]
Towards Reasoning in Large Language Models: A Survey, [paper]
A Survey of Deep Learning for Mathematical Reasoning, [paper]
A Survey on In-context Learning, [paper]
A Survey on Transformers in Reinforcement Learning, [paper]

Math Reasoning

This section covers RLHF for math reasoning.

Solving Math Word Problems via Cooperative Reasoning induced Language Models, [paper]
Interpretable Math Word Problem Solution Generation via Step-by-step Planning, [paper]
Compositional Mathematical Encoding for Math Word Problems, [paper]
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models, [paper]
GeoDRL: A Self-Learning Framework for Geometry Problem Solving using Reinforcement Learning in Deductive Reasoning, [paper]
A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models, [paper]
Math Word Problem Solving by Generating Linguistic Variants of Problem Statements, [paper]
Making Language Models Better Reasoners with Step-Aware Verifier, [paper]
ALERT: Adapting Language Models to Reasoning Tasks, [paper]
Distilling Reasoning Capabilities into Smaller Language Models, [paper]
Preference Ranking Optimization for Human Alignment, [paper]
Direct Preference Optimization:Your Language Model is Secretly a Reward Model, [paper]
Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning, [paper]
SELF-INSTRUCT: Aligning Language Models with Self-Generated Instructions, [paper]
OFFLINE REINFORCEMENT LEARNING WITH IMPLICIT Q-LEARNING, [paper]
OFFLINE RL FOR NATURAL LANGUAGE GENERATION WITH IMPLICIT LANGUAGE Q LEARNING, [paper]
💐 Chain of Thought Prompting Elicits Reasoning in Large Language Models, [paper]
LEAST-TO-MOST PROMPTING ENABLES COMPLEX REASONING IN LARGE LANGUAGE MODELS, [paper]
💐 Let’s Verify Step by Step, [paper]
MathPrompter Mathematical Reasoning using Large Language Models, [paper]
Measuring Mathematical Problem Solving With the MATH Dataset, [paper]
💐 Solving math word problems with process and outcome-based feedback, [paper]
💐 STaR Self-Taught Reasoner Bootstrapping Reasoning With Reasoning, [paper]
💐 Training Verifiers to Solve Math Word Problems, [paper]

RLHF suplement

This section covers other research and general content for RLHF.

CodeRL Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning, [paper]
RAFT Reward rAnked FineTuning for Generative Foundation Model Alignment, [paper]
IS REINFORCEMENT LEARNING (NOT) FOR NATURAL LANGUAGE PROCESSING BENCHMARKS, BASELINES, AND BUILDING BLOCKS FOR NATURAL LANGUAGE POLICY OPTIMIZATION, [paper]
WizardCoder: Empowering Code Large Language Models with Evol-Instruct, [paper]
Constitutional AI: Harmlessness from AI Feedback, [paper]

RL supplement

This section covers RL prerequisite foundation for RLHF.

Trust Region Policy Optimization, [paper]
💐 Proximal Policy Optimization Algorithms, [paper]
Mastering the Game of Go without Human Knowledge, [paper]
Deep reinforcement learning from human preferences, [paper]
Thinking Fast and Slow with Deep Learning and Tree Search, [paper]

NLP supplement

This section covers NLP prerequisite foundation for RLHF.

add title, [[paper](add link)]

LLM supplement

This section covers LLM supplement content.

Improving Language Understanding by Generative Pre-Training, [paper]
Language Models are Unsupervised Multitask Learners, [paper]
Language Models are Few-Shot Learners, [paper]
GPT-4 Technical Report, [paper]
💐 Training language models to follow instructions with human feedback, [paper]
💐 GLM: General Language Model Pretraining with Autoregressive Blank Infilling, [paper]
💐 GLM-130B: AN OPEN BILINGUAL PRE-TRAINED MODEL, [paper]

🎶 Blogs & Docs & Books

🍑 [blog] Lil'Log, [RL,NLP,and so on]
🍑 [blog] Yao Fu's Blog, [Open-source the knowledge]
[docs] Stable-Baselines3 Docs, [Reliable Reinforcement Learning Implementations]
[docs] OpenAI Spinning Up, [An educational resource produced by OpenAI to learning RL easier]
[book] Reinforcement Learning: An Introduction(Chinese Version), [Classical Works in RL]

🐮 Datasets

Math

Other

Anthropic Helpful and Harmless RLHF, reward modeling [git] [paper]
OpenAI Summarize, reward modeling [git] [paper]
OpenAI Web GPT, reward modeling [blog] [paper] [hugging face]
stack-exchange-preferences, reward modeling [hugging face]
Stanford Human Preferences Dataset (SHP), reward modeling [hugging face]
TruthfulQA, evaluate truthful [hugging face]
ToxiGen, evaluate toxicity [hugging face]
Bias in Open-ended Language Generation Dataset (BOLD), evaluate fairness [hugging face]

🎨 Code

[WizardLM] WizardLM: An Instruction-following LLM Using Evol-Instruct, [git]
[Self-Instruct] Self-Instruct: Aligning LM with Self Generated Instructions, [git]
[ConstitutionalHarmlessness] Constitutional AI: Harmlessness from AI Feedback, [git]
[project-name] project full name, [where]

Citation

If you find this repo useful, please kindly cite our survey:

@article{lu2022dl4math,
  title={A Survey of Deep Learning for Mathematical Reasoning},
  author={Lu, Pan and Qiu, Liang and Yu, Wenhao and Welleck, Sean and Chang, Kai-Wei},
  journal={arXiv preprint arXiv:2212.10535},
  year={2022}
}

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Reinforcement Learning for Mathematical Reasoning (RLHF4Math)

🦁 Papers

Related Survey

Math Reasoning

RLHF suplement

RL supplement

NLP supplement

LLM supplement

🎶 Blogs & Docs & Books

🐮 Datasets

Math

Other

🎨 Code

Citation

About

Releases

Packages

TaciturnMute/RLHF4Math

Folders and files

Latest commit

History

Repository files navigation

Reinforcement Learning for Mathematical Reasoning (RLHF4Math)

🦁 Papers

Related Survey

Math Reasoning

RLHF suplement

RL supplement

NLP supplement

LLM supplement

🎶 Blogs & Docs & Books

🐮 Datasets

Math

Other

🎨 Code

Citation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages