I’ve noticed a discrepancy between the training and inference implementations for SD1.5. In the inference code, the ControlNeXt layer is added at the end of the "mid" block, whereas in the training code it is added at the end of the "down" block.
Could you clarify which version is correct? Additionally, is there any performance comparison available for these two configurations?