Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling".
Updated Feb 19, 2025 - Python
A comprehensive collection of process reward models.
Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".
[ICML 2025] Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness.
[KDD 2025] Rewarding Graph Reasoning Process makes LLMs more Generalized Reasoners
This repository includes code and materials for the paper "Generalizable Process Reward Models via Formally Verified Training Data".
This repository contains the official implementation of "Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision".