EffiReasonTrans: RL-Optimized Reasoning for Code Translation

We introduce EffiReasonTrans, a training framework for code translation that improves translation accuracy while keeping inference latency in check.

Overview of EffiReasonTrans

EffiReasonTrans consists of the following three components:

  • Data synthesis: We construct a reasoning-augmented dataset in two steps: we first collect clean source programs with reliable test cases, then generate (source code, reasoning, target code) triplets with a reasoning-capable LLM (DeepSeek-R1), filtering them through automated syntax checks and functional validation. The final dataset is data_synthesis/training_data/filtered_training_data.jsonl.

  • Supervised fine-tuning: Based on the synthesized data, we perform supervised fine-tuning to provide the model with a strong initialization.

  • Reinforcement learning: To further improve translation performance, we apply reinforcement learning with the GRPO algorithm, guided by a custom dual-objective reward that balances execution correctness (test-case pass rate) against output conciseness (length tolerance); a minimal sketch of such a reward follows this list.
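For intuition, the sketch below shows one plausible shape for a dual-objective reward: a test-case pass rate combined with a length penalty that only applies beyond a tolerance. The function name, penalty shape, and 0.8/0.2 weighting are illustrative assumptions; the actual reward is implemented in training_scripts/rl_grpo.py.

# Illustrative sketch only; see training_scripts/rl_grpo.py for the real reward.
def dual_objective_reward(passed: int, total: int,
                          output_len: int, tolerance: int) -> float:
    """Combine execution correctness with output conciseness."""
    # Correctness term: fraction of test cases the translation passes.
    correctness = passed / total if total > 0 else 0.0
    # Conciseness term: no penalty within the length tolerance, then a
    # linearly decaying score beyond it (an assumed shape).
    if output_len <= tolerance:
        conciseness = 1.0
    else:
        conciseness = max(0.0, 1.0 - (output_len - tolerance) / tolerance)
    # Assumed weighting: correctness dominates, conciseness breaks ties.
    return 0.8 * correctness + 0.2 * conciseness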

Source code

Environment

You can create the environment from the provided environment.yml and then activate it with conda activate (the environment name is defined in environment.yml):

conda env create -f environment.yml

Data synthesis

Step 1: Generate Raw Reasoning Outputs

First, use data_synthesis/generate_reasoning_data.py to generate raw outputs containing reasoning-augmented translations.

python data_synthesis/generate_reasoning_data.py \
  --apikey YOUR_API_KEY \
  --src_lang SRC_LANG \
  --tgt_lang TGT_LANG \
  --k 8 \
  --start 1 \
  --model deepseek-reasoner \
  --data_path data_synthesis/raw_data/raw_dataset.jsonl \
  --output_dir outputs

The generated raw data will be saved under: outputs/deepseek-reasoner.
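To spot-check the raw generations before processing, you can read the first record of any output file. The jsonl layout is an assumption based on the rest of the pipeline, and the exact file names are whatever generate_reasoning_data.py writes.

import glob
import json

# Peek at the first record of the first raw output file (assumed jsonl).
path = glob.glob("outputs/deepseek-reasoner/*.jsonl")[0]
with open(path) as f:
    record = json.loads(f.readline())
print(list(record.keys()))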

Step 2: Process Generated Code into Executable Scripts

Then, use data_synthesis/process_generated_data.py to extract and convert the generated target code into executable scripts.

python data_synthesis/process_generated_data.py \
  --src_lang SRC_LANG \
  --tgt_lang TGT_LANG \
  --model deepseek-reasoner \
  --output_dir outputs

The processed executable scripts will be saved under: outputs/deepseek-reasoner.
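Extracting the target code typically means pulling the fenced code block for the target language out of the model response. A minimal sketch of that pattern (process_generated_data.py may differ in details):

import re

def extract_code_block(response: str, lang: str) -> str | None:
    """Return the body of the first ```lang ... ``` block, if any."""
    match = re.search(rf"```{re.escape(lang)}\n(.*?)```", response, re.DOTALL)
    return match.group(1).strip() if match else None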

Step 3: Filter and Collect Valid Reasoning Contents

Finally, run data_synthesis/collect_trans_conversation.py to execute the scripts, discard incorrect generations, and collect only valid reasoning pairs.

python data_synthesis/collect_trans_conversation.py \
  --src_lang SRC_LANG \
  --tgt_lang TGT_LANG \
  --timeout 1 \
  --model deepseek-reasoner \
  --input_dir outputs \
  --output_dir data_synthesis/training_data/raw
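Functionally, this filtering step boils down to running each generated script under a timeout and keeping only the triplets whose script passes. A minimal sketch, assuming the scripts are self-checking Python files (the repo's script handles multiple target languages and may differ):

import subprocess

def passes_tests(script_path: str, timeout_s: float = 1.0) -> bool:
    """Exit code 0 within the timeout counts as a pass; crashes and
    timeouts are discarded."""
    try:
        result = subprocess.run(["python", script_path],
                                capture_output=True, timeout=timeout_s)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False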

Training

Stage 1: Supervised Fine-Tuning

In this stage, we perform supervised fine-tuning on the reasoning-augmented dataset to give the model a strong initialization.

python training_scripts/sft_dsr1_distill_qw_1.5b.py \
  --lr $LR \
  --sched $SCHEDULER \
  --epochs $EPOCHS \
  --bs $BATCH_SIZE \
  --gs $GRAD_ACC_STEPS \
  --model_path "$MODEL_PATH" \
  --data_path "$DATA_PATH" \
  --output_path "$OUTPUT_PATH"

You can use --help to see the description of each parameter.
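Conceptually, each (source code, reasoning, target code) triplet is flattened into a prompt/response pair for SFT, roughly as sketched below. The field names, prompt template, and <think> convention are assumptions; see sft_dsr1_distill_qw_1.5b.py for the actual formatting.

def to_sft_example(triplet: dict, src_lang: str, tgt_lang: str) -> dict:
    """Turn a triplet into a prompt/response pair (assumed schema)."""
    prompt = (f"Translate the following {src_lang} code to {tgt_lang}:\n"
              f"{triplet['source_code']}")
    # The reasoning trace precedes the final translation, following the
    # DeepSeek-R1-distill <think> convention (an assumption here).
    response = (f"<think>\n{triplet['reasoning']}\n</think>\n"
                f"{triplet['target_code']}")
    return {"prompt": prompt, "response": response}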

Stage 2: Reinforcement Learning

In this stage, we fine-tune the model using reinforcement learning to further optimize performance.

python training_scripts/rl_grpo.py path/to/config.yaml 

A configuration template can be found at: training_scripts/grpo_config_files/config_template.yaml

Evaluation

After training, we evaluate model performance using both accuracy-based metrics and efficiency-related metrics.

bash evaluation/run_eval.sh
python evaluation/evaluation_token_per_sec.py
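The efficiency metric reported by evaluation_token_per_sec.py is, at its core, generated tokens divided by wall-clock decoding time. A minimal sketch, where generate and tokenizer stand in for whatever inference stack the script actually uses:

import time

def tokens_per_second(generate, tokenizer, prompt: str) -> float:
    """Throughput = tokens in the generated text / seconds spent decoding."""
    start = time.perf_counter()
    output_text = generate(prompt)  # stand-in for the model call
    elapsed = time.perf_counter() - start
    return len(tokenizer.encode(output_text)) / elapsed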
