Description
Hello, I have successfully configured the environment and run the inference processes for steps 2 and 3 of step 3, but I encountered the following error when running step 1 of step 3. Could you please help me resolve it? Thank you.
(camclone) root@HOST:/PATH/TO/CamCloneMaster# python inference_i2v.py --dataset_path demo/example_csv/infer/example_i2v_testset.csv --ckpt_path models/CamCloneMaster-Wan2.1/Wan-I2V-1.3B-Step8000.ckpt --output_dir demo/i2v_output
Loading models from: ./models/Wan-AI/Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors
model_name: wan_video_dit model_class: WanModel
This model is initialized with extra kwargs: {'has_image_input': False, 'patch_size': [1, 2, 2], 'in_dim': 16, 'dim': 1536, 'ffn_dim': 8960, 'freq_dim': 256, 'text_dim': 4096, 'out_dim': 16, 'num_heads': 12, 'num_layers': 30, 'eps': 1e-06}
The following models are loaded: ['wan_video_dit'].
Loading models from: ./models/Wan-AI/Wan2.1-T2V-1.3B/models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth
model_name: wan_video_image_encoder model_class: WanImageEncoder
The following models are loaded: ['wan_video_image_encoder'].
Loading models from: ./models/Wan-AI/Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth
model_name: wan_video_text_encoder model_class: WanTextEncoder
The following models are loaded: ['wan_video_text_encoder'].
Loading models from: ./models/Wan-AI/Wan2.1-T2V-1.3B/Wan2.1_VAE.pth
model_name: wan_video_vae model_class: WanVideoVAE
The following models are loaded: ['wan_video_vae'].
Using wan_video_text_encoder from ./models/Wan-AI/Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth.
Using wan_video_dit from ./models/Wan-AI/Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors.
Using wan_video_vae from ./models/Wan-AI/Wan2.1-T2V-1.3B/Wan2.1_VAE.pth.
Using wan_video_image_encoder from ./models/Wan-AI/Wan2.1-T2V-1.3B/models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth.
/PATH/TO/conda_envs/camclone/lib/python3.10/site-packages/torchvision/transforms/v2/_deprecated.py:42: UserWarning: The transform ToTensor() is deprecated and will be removed in a future release. Instead, please use v2.Compose([v2.ToImage(), v2.ToDtype(torch.float32, scale=True)]). Output is equivalent up to float precision.
warnings.warn(
Traceback (most recent call last):
File "/PATH/TO/CamCloneMaster/inference_i2v.py", line 231, in
video = pipe(
File "/PATH/TO/conda_envs/camclone/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
File "/PATH/TO/CamCloneMaster/diffsynth/pipelines/wan_video_i2v.py", line 272, in __call__
content_latents = torch.zeros_like(ref_latents)
TypeError: zeros_like(): argument 'input' (position 1) must be Tensor, not NoneType
Note: I used AI to redact the personal information (paths and hostname) from the logs above.
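For context, the failure itself is easy to reproduce in isolation: `torch.zeros_like` requires a `Tensor`, so the traceback means `ref_latents` was still `None` when `wan_video_i2v.py` reached line 272, i.e. the reference latents were never populated in this code path. A minimal sketch of the failure (the variable name mirrors the traceback; this is an illustration, not the repository's code):

```python
import torch

# ref_latents is apparently never assigned in this pipeline branch,
# so it arrives at the zeros_like call as None.
ref_latents = None

try:
    # Same call shape as wan_video_i2v.py line 272.
    content_latents = torch.zeros_like(ref_latents)
except TypeError as e:
    # Reproduces: zeros_like(): argument 'input' (position 1) must be
    # Tensor, not NoneType
    print(e)
```

This suggests the bug is upstream of line 272: whatever step is supposed to encode the reference video/image into `ref_latents` (likely tied to the dataset CSV or the checkpoint's expected inputs) did not run, rather than `zeros_like` itself being at fault.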