Fix the bug that MAISI ckpt cannot be loaded after finetune. (#654)
Fixes # .

### Description

After fine-tuning, the checkpoint that MAISI writes cannot be used as
`trained_controlnet_path`.
The problem comes from `CheckpointSaver`. When `save_dict` has a single
key, such as:
"save_dict": {
    "controlnet_state_dict": "@controlnet"
},
the saved checkpoint is not nested under the key "controlnet_state_dict".
Instead, the state_dict of the ControlNet is saved directly as the checkpoint.

The workaround is to also save the optimizer state. For example:
"save_dict": {
    "controlnet_state_dict": "@controlnet",
    "optimizer": "@optimizer"
}
With both entries saved, the MAISI checkpoint written after fine-tuning can
be loaded properly.
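
The behaviour described above can be illustrated with a small, self-contained sketch. It does not call the real `CheckpointSaver`; `TinyNet` and `build_payload` are hypothetical stand-ins that only mimic the single-key versus multi-key behaviour reported in this description.

```python
class TinyNet:
    """Hypothetical stand-in for any object that exposes state_dict()."""

    def __init__(self, weights):
        self._weights = dict(weights)

    def state_dict(self):
        return dict(self._weights)


def build_payload(save_dict):
    """Mimic the reported behaviour; this is NOT the real CheckpointSaver code."""
    states = {name: obj.state_dict() for name, obj in save_dict.items()}
    if len(states) == 1:
        # Single entry: the lone state_dict becomes the whole checkpoint,
        # so the key from save_dict is lost.
        return next(iter(states.values()))
    # Several entries: the checkpoint is a dict keyed by the save_dict names.
    return states


controlnet = TinyNet({"conv.weight": 0.5})
optimizer = TinyNet({"lr": 1e-4})

single = build_payload({"controlnet_state_dict": controlnet})
both = build_payload({"controlnet_state_dict": controlnet, "optimizer": optimizer})

print("controlnet_state_dict" in single)  # False
print("controlnet_state_dict" in both)    # True
```

In the single-key case, an expression such as `checkpoint['controlnet_state_dict']` would raise a `KeyError`, which matches the loading failure this commit works around.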

### Status
**Ready/Work in progress/Hold**

### Please ensure all the checkboxes:
<!--- Put an `x` in all the boxes that apply, and remove the not
applicable items -->
- [x] Codeformat tests passed locally by running `./runtests.sh
--codeformat`.
- [ ] In-line docstrings updated.
- [ ] Update `version` and `changelog` in `metadata.json` if changing an
existing bundle.
- [ ] Please ensure the naming rules in config files meet our
requirements (please refer to: `CONTRIBUTING.md`).
- [ ] Ensure versions of packages such as `monai`, `pytorch` and `numpy`
are correct in `metadata.json`.
- [ ] Descriptions should be consistent with the content, such as
`eval_metrics` of the provided weights and TorchScript modules.
- [ ] Files larger than 25MB are excluded and replaced by providing
download links in `large_file.yml`.
- [ ] Avoid using path that contains personal information within config
files (such as use `/home/your_name/` for `"bundle_root"`).

---------

Signed-off-by: Pengfei Guo <pengfeig@nvidia.com>
guopengf authored Sep 18, 2024
1 parent 6bdfd30 commit 6db359d
Showing 2 changed files with 5 additions and 3 deletions.
3 changes: 2 additions & 1 deletion models/maisi_ct_generative/configs/metadata.json
@@ -1,7 +1,8 @@
 {
     "schema": "https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/meta_schema_generator_ldm_20240318.json",
-    "version": "0.4.1",
+    "version": "0.4.2",
     "changelog": {
+        "0.4.2": "update train.json to fix finetune ckpt bug",
         "0.4.1": "update large files",
         "0.4.0": "update to use monai 1.4, model ckpt updated, rm GenerativeAI repo, add quality check",
         "0.3.6": "first oss version"
5 changes: 3 additions & 2 deletions models/maisi_ct_generative/configs/train.json
@@ -104,7 +104,7 @@
     "copy_controlnet_state": "$monai.networks.utils.copy_model_state(@controlnet, @diffusion_unet.state_dict())",
     "checkpoint_controlnet": "$torch.load(@trained_controlnet_path)",
     "load_controlnet": "$@controlnet.load_state_dict(@checkpoint_controlnet['controlnet_state_dict'], strict=True)",
-    "scale_factor": "$@checkpoint_controlnet['scale_factor'].to(@device)",
+    "scale_factor": "$@checkpoint_diffusion_unet['scale_factor'].to(@device)",
     "loss": {
         "_target_": "torch.nn.L1Loss",
         "reduction": "none"
@@ -214,7 +214,8 @@
     "_target_": "CheckpointSaver",
     "save_dir": "@ckpt_dir",
     "save_dict": {
-        "controlnet_state_dict": "@controlnet"
+        "controlnet_state_dict": "@controlnet",
+        "optimizer": "@optimizer"
     },
     "save_interval": 1,
     "n_saved": 5

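The two `train.json` changes fit together: once the fine-tuned checkpoint holds only the entries named in `save_dict`, it has no `scale_factor`, so that value is read from the diffusion UNet checkpoint instead. The sketch below shows this lookup with plain dictionaries. The checkpoint contents and the `unet_state_dict` key are assumptions for illustration, and `finetune_inputs` is a hypothetical helper, not part of the bundle.

```python
# Hypothetical stand-ins for the two loaded checkpoints; real ones hold tensors.
checkpoint_controlnet = {
    "controlnet_state_dict": {"conv.weight": 0.5},
    "optimizer": {"lr": 1e-4},
}
checkpoint_diffusion_unet = {
    "unet_state_dict": {"conv.weight": 0.1},  # assumed key name
    "scale_factor": 1.0,                      # assumed value
}


def finetune_inputs(controlnet_ckpt, diffusion_unet_ckpt):
    """Return the ControlNet weights and the scale factor used for training."""
    # The weights are nested under the key named in save_dict.
    weights = controlnet_ckpt["controlnet_state_dict"]
    # The fine-tuned ControlNet checkpoint carries no scale_factor,
    # so it is taken from the diffusion UNet checkpoint.
    scale_factor = diffusion_unet_ckpt["scale_factor"]
    return weights, scale_factor


weights, scale_factor = finetune_inputs(checkpoint_controlnet, checkpoint_diffusion_unet)
print("scale_factor" in checkpoint_controlnet)  # False
```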