
I would like to know about the parameters during training. #131

Open
ootsuka-repos opened this issue Dec 21, 2024 · 6 comments

@ootsuka-repos

About video_resolution_buckets: it defaults to 49x512x768, but I believe the order is frames, height, width. Is there an upper limit for frames, height, and width?

https://huggingface.co/Lightricks/LTX-Video

The LTX-Video model card recommends num_frames=161 and fps=24. Does that mean frames can go up to 161? Also, I think I can specify fps when using the factory, but internally is 1 second of video always trained as 24 fps?
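For reference, a bucket spec like the default can be read as frames × height × width. This is only a minimal parsing sketch under that assumed ordering; the trainer's actual flag name and parsing may differ:

```python
# Hypothetical sketch: parse a bucket spec such as "49x512x768",
# assuming the order is frames x height x width (as the default suggests).
def parse_bucket(spec: str) -> tuple[int, int, int]:
    frames, height, width = (int(part) for part in spec.split("x"))
    return frames, height, width

print(parse_bucket("49x512x768"))  # (49, 512, 768)
```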

I know that each model has different input dimensions.

This has gotten long, but I really have just one question: how do I find the upper limit I can set for video_resolution_buckets when training HunyuanVideo or LTX-Video?

Translated with DeepL.com (free version)

@sayakpaul
Collaborator

I don't think we have a validation method to check for that. Perhaps we could add some guidelines in the README about how to set that parameter. Would that help? Cc: @a-r-r-o-w

@a-r-r-o-w
Owner

a-r-r-o-w commented Dec 23, 2024

> Is there an upper limit for frames, height, and width?

Yes, it is the same as the original repository: 257 frames, 720 height, 1280 width. But Diffusers does not yet support framewise encoding/decoding in the VAE, so I doubt one could reach up to 257 frames for finetuning even on an 80GB card. I will look into this soon.
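A quick sketch of checking a bucket against these limits. The constants below come from the numbers in this comment; whether a bucket actually fits in memory for finetuning also depends on available VRAM:

```python
# Sketch: check a (frames, height, width) bucket against the LTX-Video
# upper limits mentioned above (257 frames, 720 height, 1280 width).
LTX_MAX_FRAMES, LTX_MAX_HEIGHT, LTX_MAX_WIDTH = 257, 720, 1280

def within_ltx_limits(frames: int, height: int, width: int) -> bool:
    return (frames <= LTX_MAX_FRAMES
            and height <= LTX_MAX_HEIGHT
            and width <= LTX_MAX_WIDTH)

print(within_ltx_limits(49, 512, 768))    # True  (the default bucket)
print(within_ltx_limits(261, 512, 768))   # False (too many frames)
```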

> In this case, frames is up to 161, but I think I could specify fps when using factory, but internally, 1 second is trained as 24 fps?

We don't actually support training with the frame_rate parameter yet. I've hardcoded this to 24 for the moment. It should be easy enough to support, but it would help to have a training run where this works as expected and doesn't cause terrible results. I will try and work on it this weekend.

> How do I know how far I can set the value of video_resolution_buckets as an upper limit in training HunyuanVideo or LTX-VIDEO?

Whatever upper limit the original repository specifies is supported here as well, because we rely on the Diffusers implementations (which are numerically exact matches to the original code). However, we are still working on optimizing memory requirements, so it will take some more time before higher-resolution training is possible.
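One additional practical constraint, not stated in this thread and assumed here from the public VAE configurations: video VAEs compress time and space by fixed ratios, so valid frame counts typically have the form k * temporal_ratio + 1 and spatial dimensions must be divisible by the spatial ratio. A sketch under those assumptions:

```python
# Sketch (assumption, not from this thread): check that a bucket is
# compatible with a video VAE's compression ratios. Valid frame counts
# are of the form k * temporal_ratio + 1; height and width must be
# divisible by the spatial ratio.
def bucket_ok(frames: int, height: int, width: int,
              temporal_ratio: int, spatial_ratio: int) -> bool:
    return ((frames - 1) % temporal_ratio == 0
            and height % spatial_ratio == 0
            and width % spatial_ratio == 0)

# HunyuanVideo: assumed 4x temporal, 8x spatial compression
print(bucket_ok(49, 512, 768, temporal_ratio=4, spatial_ratio=8))    # True
# LTX-Video: assumed 8x temporal, 32x spatial compression
print(bucket_ok(161, 512, 768, temporal_ratio=8, spatial_ratio=32))  # True
```

This would explain why the defaults and recommended values (49, 161, 257 frames) all fit the k * ratio + 1 pattern, but the ratios above should be verified against the actual model configs.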

@ootsuka-repos
Author

Thank you, I roughly understand now.
As sayakpaul mentioned, video_resolution_buckets can be complex, so having user guidelines would make it easier to understand.

@sayakpaul
Collaborator

Thanks @ootsuka-repos. Would you maybe like to help us with a PR?

@ootsuka-repos
Author

OK @sayakpaul, I will open a documentation PR later.

@ootsuka-repos
Author

@sayakpaul @a-r-r-o-w

PR created:
#181

I added parameter documentation.
Please let me know if there are any mistakes, since this is a tentative draft.

Also, I am using AI translation since I am not a native English speaker. If there are any nuances or phrasings that could cause differences in interpretation, please let me know.

Please also let me know if there is anything else that needs to be done outside of the core implementation.
I will respond as time allows.
