Thanks for the impressive work. I noticed that the training script (ControlNeXt-SVD-v2-Training/train_svd.py) includes code for training a depth-conditioned ControlNeXt. Is that model going to be released?
Also, I see that the WebVid videos are cropped to a square format when training the depth condition. Is there a particular reason for this? Would it be better to train at 16:9 if I want to run inference at a 16:9 aspect ratio?