We introduce EffiReasonTrans, a training framework for code translation that aims to enhance translation accuracy while balancing inference latency.
EffiReasonTrans consists of the following three components:

- **Data synthesis**: We construct a reasoning-augmented dataset in two steps: first collecting clean source programs with reliable test cases, then generating (source code, reasoning, target code) triplets with a reasoning-capable LLM (DeepSeek-R1), filtered by automated syntax checks and functional validation. The final dataset is `data_synthesis/training_data/filtered_training_data.jsonl`.
- **Supervised fine-tuning**: Based on the synthesized data, we perform supervised fine-tuning to give the model a strong initialization.
- **Reinforcement learning**: To further enhance translation performance, we apply reinforcement learning with the GRPO algorithm, guided by a custom dual-objective reward that balances execution correctness (test-case pass rate) and output conciseness (length tolerance); see the sketch after this list.
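For intuition, the snippet below sketches what such a dual-objective reward could look like. It is a minimal sketch under assumed names: the actual reward is implemented in `training_scripts/rl_grpo.py`, and the function signature, parameters, and penalty shape here are illustrative assumptions.

```python
# Hypothetical dual-objective reward (illustrative only; the real reward
# lives in training_scripts/rl_grpo.py). Reward = test-case pass rate,
# minus a penalty once the output exceeds a tolerated length.
def dual_objective_reward(passed: int, total: int, output_tokens: int,
                          length_budget: int = 1024,
                          penalty_weight: float = 0.5) -> float:
    correctness = passed / total if total > 0 else 0.0  # execution correctness
    # No penalty within the length tolerance; linear penalty beyond it,
    # capped so correctness always dominates the signal.
    overflow = max(0, output_tokens - length_budget) / length_budget
    return correctness - penalty_weight * min(overflow, 1.0)
```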
You can create the conda environment from the provided `environment.yml`:

```bash
conda env create -f environment.yml
```
### Step 1: Generate Raw Reasoning Outputs

First, use `data_synthesis/generate_reasoning_data.py` to generate raw outputs containing reasoning-augmented translations:

```bash
python data_synthesis/generate_reasoning_data.py \
    --apikey YOUR_API_KEY \
    --src_lang SRC_LANG \
    --tgt_lang TGT_LANG \
    --k 8 \
    --start 1 \
    --model deepseek-reasoner \
    --data_path data_synthesis/raw_data/raw_dataset.jsonl \
    --output_dir outputs
```

The generated raw data will be saved under `outputs/deepseek-reasoner`.
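Each raw output is stored as one JSON line. The record below is an illustrative assumption of that layout (the actual field names are whatever `generate_reasoning_data.py` emits), shown only to make the (source code, reasoning, target code) triplet concrete.

```python
import json

# Hypothetical shape of one raw output record; all field names here are
# assumptions, not the script's actual schema.
record = {
    "src_lang": "java",
    "tgt_lang": "python",
    "source_code": "public int add(int a, int b) { return a + b; }",
    "reasoning": "The Java method adds two ints; in Python this becomes "
                 "a two-argument function using the + operator.",
    "target_code": "def add(a, b):\n    return a + b",
}
print(json.dumps(record))  # one line of the raw JSONL output
```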
### Step 2: Process Generated Code into Executable Scripts

Then, use `data_synthesis/process_generated_data.py` to extract the generated target code and convert it into executable scripts:

```bash
python data_synthesis/process_generated_data.py \
    --src_lang SRC_LANG \
    --tgt_lang TGT_LANG \
    --model deepseek-reasoner \
    --output_dir outputs
```

The processed executable scripts will be saved under `outputs/deepseek-reasoner`.
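Extraction here essentially means pulling the target program out of the model's fenced code block. The helper below is a minimal sketch of that idea; its name and regex are assumptions for illustration, not the logic in `process_generated_data.py`.

```python
import re

# Sketch: pull the last fenced code block out of a model response.
# Pattern and helper name are illustrative assumptions.
FENCE = re.compile(r"```(?:\w+)?\n(.*?)```", re.DOTALL)

def extract_target_code(response: str) -> str | None:
    """Return the contents of the last fenced code block, if any."""
    blocks = FENCE.findall(response)
    return blocks[-1].strip() if blocks else None
```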
### Step 3: Filter and Collect Valid Reasoning Contents

Finally, run `data_synthesis/collect_trans_conversation.py` to execute the scripts, discard incorrect generations, and collect only valid reasoning pairs:

```bash
python data_synthesis/collect_trans_conversation.py \
    --src_lang SRC_LANG \
    --tgt_lang TGT_LANG \
    --timeout 1 \
    --model deepseek-reasoner \
    --input_dir outputs \
    --output_dir data_synthesis/training_data/raw
```
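Conceptually, the filter runs each generated script against its test case under the `--timeout` budget and keeps only passing generations. The helper below sketches that check; it is an assumption for illustration, not the script's actual implementation.

```python
import subprocess

# Sketch of the validity check: run a generated script with a timeout and
# keep it only if it exits cleanly and matches the expected output.
def passes_test(script_path: str, stdin_data: str,
                expected_stdout: str, timeout: float = 1.0) -> bool:
    try:
        result = subprocess.run(
            ["python", script_path],
            input=stdin_data, capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False  # timeouts count as failures
    return (result.returncode == 0
            and result.stdout.strip() == expected_stdout.strip())
```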
### Stage 1: Supervised Fine-Tuning

In this stage, we perform supervised fine-tuning on the reasoning-augmented dataset to initialize the model:

```bash
python training_scripts/sft_dsr1_distill_qw_1.5b.py \
    --lr $LR \
    --sched $SCHEDULER \
    --epochs $EPOCHS \
    --bs $BATCH_SIZE \
    --gs $GRAD_ACC_STEPS \
    --model_path "$MODEL_PATH" \
    --data_path "$DATA_PATH" \
    --output_path "$OUTPUT_PATH"
```
You can use `--help` to see the description of each parameter.
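For reference, an SFT example over the synthesized triplets is typically a translation prompt paired with a response containing the reasoning followed by the target code. The layout below is a plausible sketch assuming a chat-style template; the exact formatting is defined in `training_scripts/sft_dsr1_distill_qw_1.5b.py`.

```python
# Plausible SFT sample construction (an assumption; see the actual
# formatting in training_scripts/sft_dsr1_distill_qw_1.5b.py).
def build_sft_example(src_lang, tgt_lang, source_code, reasoning, target_code):
    prompt = (f"Translate the following {src_lang} code to {tgt_lang}:\n"
              f"{source_code}")
    # The reasoning precedes the final translation in the supervision
    # target, so the model learns to reason before emitting code.
    response = f"<think>\n{reasoning}\n</think>\n{target_code}"
    return {"messages": [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": response},
    ]}
```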
### Stage 2: Reinforcement Learning

In this stage, we fine-tune the model with reinforcement learning to further optimize performance:
```bash
python training_scripts/rl_grpo.py path/to/config.yaml
```

A configuration template is provided at `training_scripts/grpo_config_files/config_template.yaml`.
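As a rough illustration of how GRPO training can be wired up, the sketch below assumes a TRL-style `GRPOTrainer`; whether this repository uses TRL, and every name in the snippet (dataset path, base model, reward), are assumptions, so consult `training_scripts/rl_grpo.py` and the config template for the real setup.

```python
# Minimal GRPO wiring sketch assuming TRL's GRPOTrainer; all names below
# are illustrative assumptions, not this repository's actual code.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def reward_fn(completions, **kwargs):
    # Placeholder: EffiReasonTrans combines the test-case pass rate with a
    # length-tolerance penalty (see the reward sketch earlier).
    return [1.0 - min(len(c) / 4096, 1.0) for c in completions]

dataset = load_dataset("json", data_files="path/to/prompts.jsonl",
                       split="train")  # expects a "prompt" column

trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",  # assumed base model
    reward_funcs=reward_fn,
    args=GRPOConfig(output_dir="grpo_out"),
    train_dataset=dataset,
)
trainer.train()
```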
After training, we evaluate model performance with both accuracy-based metrics and efficiency-related metrics:

```bash
bash evaluation/run_eval.sh
python evaluation/evaluation_token_per_sec.py
```
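Conceptually, the efficiency metric divides the number of newly generated tokens by wall-clock decoding time. The snippet below is a minimal sketch of that measurement assuming a Hugging Face `transformers` model; the repository's actual metric is computed by `evaluation/evaluation_token_per_sec.py`.

```python
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch of a tokens-per-second measurement; the model name is an assumption.
model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Translate the following Java code to Python:\n...",
                   return_tensors="pt")

start = time.perf_counter()
output = model.generate(**inputs, max_new_tokens=512)
elapsed = time.perf_counter() - start

new_tokens = output.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.2f} tokens/sec")
```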