During training, the ControlNext output for SD 1.5 has shape (1, 320, 64, 64), but during inference it has shape (1, 1280, 8, 8). As a result, the control output is added to the UNet sample differently in the two cases. Why is there this difference? In addition, the ControlNext inference code for SD 1.5 currently cannot load weights produced by the ControlNext training script. Which script did you use to train the ControlNext SD 1.5 weights that you released on Hugging Face?
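
To make the mismatch concrete, here is a minimal sketch in plain Python (no ControlNext code involved). It checks, using the usual broadcasting rule for elementwise addition, which UNet feature map each control output could be added to. The `unet_features` table and the helper names `can_add` and `matching_stages` are my own hypothetical illustration, and the stage shapes are an assumption based on a 64x64 latent and the standard SD 1.5 channel widths, not something taken from the repository.

```python
from itertools import zip_longest


def can_add(shape_a, shape_b):
    """Return True if two shapes are broadcast-compatible for elementwise addition."""
    # Compare dimensions from the last axis backwards; missing axes count as 1.
    for a, b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if a != b and a != 1 and b != 1:
            return False
    return True


# Shapes reported in the question.
train_out = (1, 320, 64, 64)   # ControlNext output during training
infer_out = (1, 1280, 8, 8)    # ControlNext output during inference

# ASSUMPTION: SD 1.5 UNet feature shapes for a 64x64 latent (hypothetical table).
unet_features = {
    "first down block": (1, 320, 64, 64),
    "mid block": (1, 1280, 8, 8),
}


def matching_stages(control_shape):
    """List the assumed UNet stages whose features can be added to control_shape."""
    return [name for name, shape in unet_features.items()
            if can_add(control_shape, shape)]


if __name__ == "__main__":
    print("train vs infer addable:", can_add(train_out, infer_out))
    print("training output fits:", matching_stages(train_out))
    print("inference output fits:", matching_stages(infer_out))
```

Under these assumptions the two outputs cannot be added to the same feature map: the training output only fits the early, high-resolution stage, while the inference output only fits the mid block. That would suggest the two scripts inject the control signal at different points in the UNet, which may also be why the weights are not interchangeable.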