This project implements a self-correcting language model that uses reinforcement learning to improve its outputs through multiple attempts.
- Two-stage training process with reinforcement learning
- Support for both mathematical and coding tasks
- Comprehensive evaluation metrics including BLEU, ROUGE, and cyclomatic complexity
- Mixed precision training support
- Modular and extensible architecture
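To make the multi-attempt self-correction idea above concrete, here is a minimal inference-time sketch. The prompt template and generation settings are assumptions for illustration, not the project's actual implementation; the model id is the `--model_variant` default listed under usage below.

```python
# Minimal sketch of inference-time self-correction: generate an answer,
# then feed it back so the model can revise it. The prompt template here
# is an assumption, not the project's actual format.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "decapoda-research/llama-7b-hf"  # the --model_variant default

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

def solve_with_self_correction(question: str, attempts: int = 2) -> str:
    """Answer `question`, re-prompting the model to correct its own output."""
    prompt, answer = question, ""
    for _ in range(attempts):
        inputs = tokenizer(prompt, return_tensors="pt")
        output_ids = model.generate(**inputs, max_new_tokens=256)
        new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
        answer = tokenizer.decode(new_tokens, skip_special_tokens=True)
        # Show the model its previous attempt and ask for a revision.
        prompt = f"{question}\nPrevious attempt:\n{answer}\nRevised answer:"
    return answer
```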
To install:

- Clone the repository:

```bash
git clone <repository-url>
cd Self-Correcting-LLM--Reinforcement-Learning-
```

- Install the package:

```bash
pip install -e .
```
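To verify the install (assuming the package is importable as `score_model`, per the project layout below):

```bash
python -c "import score_model; print(score_model.__file__)"
```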
To train the model on mathematical tasks:

```bash
python main.py --task MATH --data_path ./data --output_dir ./outputs
```

To train on coding tasks:

```bash
python main.py --task CODE --data_path ./data --output_dir ./outputs
```
Additional options:

- `--model_variant`: Specify the model variant (default: `decapoda-research/llama-7b-hf`)
- `--mixed_precision`: Enable mixed precision training
- `--no_bleu`: Disable BLEU score computation
- `--no_rouge`: Disable ROUGE score computation
- `--no_cyclomatic`: Disable cyclomatic complexity computation
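For example, to train on mathematical tasks with mixed precision while skipping the cyclomatic-complexity metric:

```bash
python main.py --task MATH --data_path ./data --output_dir ./outputs --mixed_precision --no_cyclomatic
```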
Project layout:

```
.
├── main.py                  # Main training script
├── setup.py                 # Package setup file
├── src/
│   └── score_model/         # Main package directory
│       ├── __init__.py
│       ├── config.py        # Configuration classes
│       ├── model.py         # Model implementation
│       ├── dataset.py       # Dataset classes
│       ├── trainer.py       # Training logic
│       └── utils.py         # Utility functions
├── data/                    # Data directory
└── outputs/                 # Output directory
```
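For orientation, the sketch below shows one plausible way these modules fit together. Every class name here is hypothetical; check the actual definitions in `src/score_model/` before relying on them.

```python
# Hypothetical wiring of the package modules; the real class and function
# names in config.py, dataset.py, model.py, and trainer.py may differ.
from score_model.config import TrainingConfig  # hypothetical name
from score_model.dataset import MathDataset    # hypothetical name
from score_model.model import ScoreModel      # hypothetical name
from score_model.trainer import Trainer       # hypothetical name

cfg = TrainingConfig(task="MATH", data_path="./data", output_dir="./outputs")
trainer = Trainer(model=ScoreModel(cfg), dataset=MathDataset(cfg), config=cfg)
trainer.train()
```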
Requirements:

- Python >= 3.8
- PyTorch >= 2.0.0
- Transformers >= 4.30.0
- Other dependencies are listed in setup.py
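A quick way to confirm the environment meets these versions:

```python
# Check the installed versions against the requirements above.
import sys

import torch
import transformers

assert sys.version_info >= (3, 8), "Python >= 3.8 is required"
print("PyTorch:", torch.__version__)              # expect >= 2.0.0
print("Transformers:", transformers.__version__)  # expect >= 4.30.0
```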