Skip to content

This my attempt to create Self-Correcting-LLM based on the paper Training Language Models to Self-Correct via Reinforcement Learning by google

Notifications You must be signed in to change notification settings

sanowl/Self-Correcting-LLM--Reinforcement-Learning-

Repository files navigation

SCoRe: Self-Correcting Language Model with Reinforcement Learning

This project implements a self-correcting language model that uses reinforcement learning to improve its outputs through multiple attempts.

Features

  • Two-stage training process with reinforcement learning
  • Support for both mathematical and coding tasks
  • Comprehensive evaluation metrics including BLEU, ROUGE, and cyclomatic complexity
  • Mixed precision training support
  • Modular and extensible architecture

Installation

  1. Clone the repository:
cd Self-Correcting-LLM--Reinforcement-Learning-
  1. Install the package:
pip install -e .

Usage

Training

To train the model on mathematical tasks:

python main.py --task MATH --data_path ./data --output_dir ./outputs

To train on coding tasks:

python main.py --task CODE --data_path ./data --output_dir ./outputs

Additional options:

  • --model_variant: Specify the model variant (default: 'decapoda-research/llama-7b-hf')
  • --mixed_precision: Enable mixed precision training
  • --no_bleu: Disable BLEU score computation
  • --no_rouge: Disable ROUGE score computation
  • --no_cyclomatic: Disable cyclomatic complexity computation

Project Structure

.
├── main.py              # Main training script
├── setup.py            # Package setup file
├── src/
│   └── score_model/    # Main package directory
│       ├── __init__.py
│       ├── config.py   # Configuration classes
│       ├── model.py    # Model implementation
│       ├── dataset.py  # Dataset classes
│       ├── trainer.py  # Training logic
│       └── utils.py    # Utility functions
├── data/               # Data directory
└── outputs/            # Output directory

Requirements

  • Python >= 3.8
  • PyTorch >= 2.0.0
  • Transformers >= 4.30.0
  • Other dependencies listed in setup.py

About

This my attempt to create Self-Correcting-LLM based on the paper Training Language Models to Self-Correct via Reinforcement Learning by google

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages