Official GitHub repository for *Training-Free Reward-Guided Image Editing via Trajectory Optimal Control*. Please refer to the paper for more details.
🎉 The paper has been accepted (poster) at ICLR 2026.
Recent advancements in diffusion and flow-matching models have demonstrated remarkable capabilities in high-fidelity image synthesis. A prominent line of research involves reward guidance, which steers the generation process during inference to align with specific objectives. However, applying this reward-guided approach to image editing, which requires preserving the semantic content of the source image while enhancing a target reward, remains largely unexplored. In this work, we introduce a novel framework for training-free, reward-guided image editing. We formulate the editing process as a trajectory optimal control problem in which the reverse process of a diffusion model is treated as a controllable trajectory originating from the source image, and the adjoint states are iteratively updated to steer the editing process. Through extensive experiments across distinct editing tasks, we demonstrate that our approach significantly outperforms existing inversion-based training-free guidance baselines, achieving a superior balance between reward maximization and fidelity to the source image without reward hacking.
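At a high level, the method treats the reverse process as a controlled trajectory starting from the source image and iteratively optimizes per-step controls against a terminal reward plus a control-energy penalty. The toy sketch below is purely illustrative and is not the repository implementation: the dynamics, reward, penalty weight, and names such as `toy_velocity` and `toy_reward` are placeholders. It only shows the structure of the optimization loop, with gradients backpropagated through the trajectory playing the role of the adjoint states.

```python
# Illustrative toy sketch of reward-guided trajectory optimal control.
# Everything here (dynamics, reward, hyperparameters) is a placeholder.
import torch

def toy_velocity(x, t):
    # Stand-in for a pretrained diffusion/flow reverse dynamics.
    return -x * (1.0 - t)

def toy_reward(x):
    # Stand-in for a differentiable reward (classifier, CLIP, ImageReward, ...).
    return -(x - 1.0).pow(2).mean()

n_steps, dt = 10, 0.1
x_src = torch.randn(4)                          # "source" state the trajectory starts from
controls = [torch.zeros(4, requires_grad=True) for _ in range(n_steps)]
opt = torch.optim.Adam(controls, lr=5e-2)

for it in range(50):
    x = x_src.clone()
    for k in range(n_steps):
        x = x + dt * (toy_velocity(x, k * dt) + controls[k])   # controlled Euler step
    # Maximize the terminal reward while penalizing control energy, which keeps
    # the edited trajectory close to the uncontrolled one (i.e., to the source).
    loss = -toy_reward(x) + 0.1 * sum(u.pow(2).sum() for u in controls)
    opt.zero_grad()
    loss.backward()   # gradients w.r.t. the controls act as the adjoint signal here
    opt.step()
```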
```bash
conda create -n itoc python=3.11
conda activate itoc
pip install -r requirements.txt
```
The ImageNet1k-classifier reward function requires a model checkpoint for evaluation, which is too heavy to be included in this repo. You can download the checkpoint here (ImageNet, L2-norm, ResNet50) and place it under the `./model` directory of this repository.
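A quick way to verify that the checkpoint is in place before running the classifier-guided scripts; the directory and filename patterns below are assumptions based on the instructions above:

```python
# Sanity check: confirm a classifier checkpoint exists under ./model and loads.
from pathlib import Path
import torch

ckpt_dir = Path("./model")
ckpts = sorted(ckpt_dir.glob("*.pt")) + sorted(ckpt_dir.glob("*.ckpt"))
assert ckpts, f"No checkpoint found under {ckpt_dir.resolve()}"
state = torch.load(ckpts[0], map_location="cpu", weights_only=False)
print(f"Loaded {ckpts[0].name} ({type(state).__name__})")
```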
Check out the arguments in the script files to see more details.
- `--deterministic`: If set, the initial trajectory for the source image is generated with deterministic DDIM inversion. Otherwise, it is generated with the Markovian DDPM forward process. (ITOC)
- `--n_iter`: Number of iterations for the optimization loop. (GA, ITOC)
- `--reward_multiplier`: Multiplier for the final reward function (and its gradient). (ITOC)
- `--depth`: Depth of the forward noising process for inversion. (DPS, FreeDoM, TFG, ITOC)
- `--lr`: Learning rate for the optimization process. (GA, ITOC)
- `--tfg_rho`: Guidance scale multiplied on $\nabla_{x_t} r(\hat{x}_{1|t})$. (DPS, FreeDoM, TFG)
- `--tfg_mu`: Guidance scale multiplied on $\nabla_{\hat{x}_{1|t}} r(\hat{x}_{1|t})$. (TFG)
The input image `--image_path` will be edited to achieve higher human-preference alignment with a given text prompt `--reward_prompt`.
- ITOC (ours)

  ```bash
  python ./src/edit_demo.py --method_name itoc --reward_name ImageReward \
      --image_path ./assets/nature.png --reward_prompt "colorful painting, river flowing grass field with flowers." \
      --deterministic --reward_multiplier 500 --n_iter 15 --lr 5e-3 --depth 0.5
  ```

- Baselines: Gradient Ascent

  ```bash
  python ./src/edit_demo.py --method_name gradient_ascent --reward_name ImageReward \
      --image_path ./assets/nature.png --reward_prompt "colorful painting, river flowing grass field with flowers." \
      --n_iter 100 --lr 2.0
  ```

- Baselines: Inversion + guided sampling methods (DPS, FreeDoM, TFG)

  ```bash
  python ./src/edit_demo.py --method_name inversion_tfg --reward_name ImageReward \
      --image_path ./assets/nature.png --reward_prompt "colorful painting, river flowing grass field with flowers." \
      --depth 0.7 --tfg_rho 1.0 --tfg_mu 0.5
  ```

  Change `--method_name` to `inversion_dps`, `inversion_freedom`, or `inversion_tfg` to run the corresponding method.
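To inspect the reward values themselves, the ImageReward model can also be queried directly with the public ImageReward package. The sketch below is standalone and independent of this repo's reward wrapper; the `load`/`score` calls follow that package's documented API.

```python
# Score an image against a prompt with the public ImageReward package
# (pip install image-reward); independent of this repo's reward code.
import ImageReward as RM

model = RM.load("ImageReward-v1.0")
score = model.score(
    "colorful painting, river flowing grass field with flowers.",
    ["./assets/nature.png"],
)
print(score)
```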
Tip

- If you want to use Stable Diffusion 3, run `edit_demo_sd3.py` instead of `edit_demo.py`.
- For the other scenarios below, you can similarly modify `--method_name` and the corresponding hyperparameters to run the baselines.
The input image `--image_path` will be edited to match the style of a given style image `--style_image_path`.
- ITOC (ours)

  ```bash
  python ./src/edit_demo.py --method_name itoc --reward_name Gram_Diff \
      --image_path ./assets/portrait.png --style_image_path ./assets/style_ref.png \
      --deterministic --reward_multiplier 1000 --n_iter 15 --lr 10e-3 --depth 0.5
  ```
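For intuition, a Gram-matrix style reward can be written as the negative distance between Gram matrices of deep features of the edited image and the style reference. The sketch below is hypothetical: the backbone, layer choice, and weighting are assumptions, and the repo's `Gram_Diff` reward may differ.

```python
# Hypothetical Gram-matrix style reward (illustrative; may differ from Gram_Diff).
import torch
from torchvision.models import vgg16, VGG16_Weights

features = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features.eval()
for p in features.parameters():
    p.requires_grad_(False)

def gram(feat):
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def style_reward(image, style_image, layers=(3, 8, 15, 22)):
    """Negative sum of squared Gram-matrix differences (higher = closer in style)."""
    loss, x, y = 0.0, image, style_image
    for i, layer in enumerate(features):
        x, y = layer(x), layer(y)
        if i in layers:
            loss = loss + (gram(x) - gram(y)).pow(2).sum()
    return -loss
```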
The input image `--image_path` will be edited to shift the classifier's decision toward the given target class `--reward_class`.
- ITOC (ours)

  ```bash
  python ./src/edit_demo.py --method_name itoc --reward_name ImageNet1k_classifier \
      --image_path ./assets/ladybug.png --reward_class 306 \
      --deterministic --reward_multiplier 250 --n_iter 15 --lr 5e-3 --depth 0.5
  ```
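Conceptually, a classifier reward can be as simple as the log-probability the classifier assigns to the target class. The sketch below uses a standard torchvision ResNet-50 purely for illustration; the repo's `ImageNet1k_classifier` reward evaluates the downloaded ResNet-50 checkpoint instead.

```python
# Hypothetical classifier reward with a standard torchvision ResNet-50.
# The actual ImageNet1k_classifier reward uses the downloaded checkpoint.
import torch
from torchvision.models import resnet50, ResNet50_Weights

clf = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2).eval()
for p in clf.parameters():
    p.requires_grad_(False)

def class_reward(image, target_class=306):
    # image: (B, 3, H, W), ImageNet-normalized; higher = more like the target class.
    logits = clf(image)
    return torch.log_softmax(logits, dim=-1)[:, target_class].mean()
```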
The input image `--image_path` will be edited to align with a given text prompt `--reward_prompt`.
- ITOC (ours)

  ```bash
  python ./src/edit_demo.py --method_name itoc --reward_name Clip_Score \
      --image_path ./assets/face.png --reward_prompt "a face of a smiling man." \
      --deterministic --reward_multiplier 1000 --n_iter 15 --lr 5e-3 --depth 0.5
  ```
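A CLIP-score reward is typically the cosine similarity between the image embedding and the text embedding of the prompt. The sketch below is a hypothetical version using the Hugging Face CLIP implementation; the backbone and library are assumptions, and the repo's `Clip_Score` reward may be implemented differently.

```python
# Hypothetical CLIP-similarity reward (backbone and library are assumptions).
import torch
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_reward(pixel_values, prompt="a face of a smiling man."):
    # pixel_values: (B, 3, 224, 224) CLIP-preprocessed image tensor.
    text = processor(text=[prompt], return_tensors="pt")
    img_emb = clip.get_image_features(pixel_values=pixel_values)
    txt_emb = clip.get_text_features(**text)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    return (img_emb * txt_emb).sum(dim=-1).mean()   # mean cosine similarity
```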
If you use this code in your research, please consider citing the paper:
```bibtex
@article{chang2025training,
  title={Training-Free Reward-Guided Image Editing via Trajectory Optimal Control},
  author={Chang, Jinho and Kim, Jaemin and Ye, Jong Chul},
  journal={arXiv preprint arXiv:2509.25845},
  year={2025}
}
```

- The code for the adjoint state calculation is based on and modified from the official code of Adjoint Matching.
- The code for TFG and the other guided-sampling baselines (DPS, FreeDoM) is based on and modified from the official code of TFG.

