
Inference on single_example_image.json takes 1 hr on 4 3090 GPUs; is there anything I can do to increase the speed? #197

@Haosonn

Description


export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
GPU_NUM=4
torchrun --nproc_per_node=$GPU_NUM --standalone generate_infinitetalk.py \
    --ckpt_dir weights/Wan2.1-I2V-14B-480P \
    --wav2vec_dir 'weights/chinese-wav2vec2-base' \
    --infinitetalk_dir weights/InfiniteTalk/single/infinitetalk.safetensors \
    --dit_fsdp --t5_fsdp \
    --ulysses_size=$GPU_NUM \
    --input_json examples/single_example_image.json \
    --size infinitetalk-480 \
    --sample_steps 40 \
    --mode streaming \
    --motion_frame 9 \
    --save_file infinitetalk_res_multigpu

Here is the script I used. PYTORCH_CUDA_ALLOC_CONF is set because of an OOM problem.
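As a side note on the OOM workaround above: PyTorch reads PYTORCH_CUDA_ALLOC_CONF when the CUDA caching allocator is first initialized, so the variable has to be in the process environment before the first CUDA allocation. Exporting it before torchrun (as in the script) handles this; the minimal sketch below shows the equivalent in-process pattern, with the env var set before torch is ever imported (the value string is the one from the script, not something this repo requires):

```python
import os

# Must happen before `import torch` / the first CUDA allocation,
# otherwise the allocator setting is silently ignored.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

# import torch  # safe to import only after the variable is set
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

Because torchrun spawns the workers itself, exporting the variable in the launching shell (as the script does) propagates it to every rank, which is usually the simplest option.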
