I would like to know about the parameters during training. #131
About video_resolution_buckets:
It defaults to 49x512x768, but I believe the order is frames, height, width. Is there an upper limit for frames, height, and width?
https://huggingface.co/Lightricks/LTX-Video
The official LTX-Video model card gives num_frames=161, fps=24. In that case frames would be capped at 161, and I think I can specify fps when using the factory, but internally, is one second of video trained as 24 fps?
I know that each model has different input dimensions. It's a long story, but I have only one question: how do I know how high I can set video_resolution_buckets as an upper limit when training HunyuanVideo or LTX-Video?
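For readers with the same question, here is a minimal sketch of how a frames x height x width bucket spec like the default could be parsed. The `parse_bucket` helper and the "FxHxW" string format are illustrative assumptions, not the trainer's actual CLI handling:

```python
from typing import Tuple

def parse_bucket(spec: str) -> Tuple[int, int, int]:
    """Parse a bucket spec like '49x512x768' into (frames, height, width)."""
    parts = spec.lower().split("x")
    if len(parts) != 3:
        raise ValueError(f"Expected 'FRAMESxHEIGHTxWIDTH', got {spec!r}")
    frames, height, width = (int(p) for p in parts)
    return frames, height, width

# The default mentioned above: 49 frames at 512x768 (height x width).
assert parse_bucket("49x512x768") == (49, 512, 768)
```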
Comments

I don't think we have a validation method to check for that. Perhaps we could add some guidelines to the README about how to set that parameter. Would that help? Cc: @a-r-r-o-w
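For a sense of what such a guideline might encode: both models use causal video VAEs, so valid buckets typically need frames = k * temporal_ratio + 1 and spatial dimensions divisible by the spatial ratio. A rough sketch, assuming 4x temporal / 8x spatial compression for HunyuanVideo and 8x / 32x for LTX-Video (my reading of the public VAE configs); `validate_bucket` is hypothetical, not an existing helper:

```python
# Hypothetical per-model VAE compression ratios (temporal, spatial).
# These values are assumptions read off the public VAE configs.
VAE_RATIOS = {
    "hunyuan_video": (4, 8),   # 4x temporal, 8x spatial
    "ltx_video": (8, 32),      # 8x temporal, 32x spatial
}

def validate_bucket(model: str, frames: int, height: int, width: int) -> None:
    """Raise if a frames x height x width bucket is incompatible with the VAE."""
    temporal, spatial = VAE_RATIOS[model]
    if (frames - 1) % temporal != 0:
        raise ValueError(f"{model}: frames must be k*{temporal}+1, got {frames}")
    if height % spatial or width % spatial:
        raise ValueError(f"{model}: height/width must be divisible by {spatial}")

validate_bucket("hunyuan_video", 49, 512, 768)  # the default bucket passes
validate_bucket("ltx_video", 49, 512, 768)      # 49 = 6*8 + 1; 512, 768 % 32 == 0
```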
Yes, it is the same as the original repository: 257 frames, 720 height, 1280 width. But Diffusers does not yet support framewise encoding/decoding in the VAE, so I doubt one could reach up to those limits in practice.
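For context, "framewise" (chunked) encoding means pushing frames through the VAE in temporal slices instead of all at once, which bounds peak activation memory. A very rough conceptual sketch, not the Diffusers API; a correct version would also have to carry the causal-convolution state across chunk boundaries:

```python
import torch

def encode_framewise(vae_encode, video: torch.Tensor, chunk_size: int = 16) -> torch.Tensor:
    """Encode a (B, C, F, H, W) video in temporal chunks to bound peak memory.

    `vae_encode` stands in for a VAE encode call. A real causal video VAE
    would also need to carry its causal state across chunk boundaries,
    which this sketch deliberately omits.
    """
    latents = []
    for start in range(0, video.shape[2], chunk_size):
        latents.append(vae_encode(video[:, :, start:start + chunk_size]))
    return torch.cat(latents, dim=2)
```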
We don't actually support training with the fps parameter.
Whatever upper limit is specified in the original repository is supported here as well, because we rely on the Diffusers implementations (which are numerically exact matches to the original code). However, we are still working on optimizing memory requirements, so it will take some more time before higher-resolution training is possible.
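A back-of-envelope calculation shows why memory, rather than any hard architectural cap, is the binding constraint. Assuming HunyuanVideo's 4x temporal / 8x spatial VAE compression and 1x2x2 patchification (ratios taken from the public model configs, so treat them as assumptions), the maximum setting already yields a very long transformer sequence:

```python
# Back-of-envelope sequence length for HunyuanVideo at the original
# repository's maximum bucket (257 frames, 720x1280). The 4x temporal /
# 8x spatial compression and 1x2x2 patch size are assumed from the
# public model configs.
frames, height, width = 257, 720, 1280

latent_frames = (frames - 1) // 4 + 1          # 65
latent_h, latent_w = height // 8, width // 8   # 90 x 160

tokens = latent_frames * (latent_h // 2) * (latent_w // 2)
print(tokens)  # 234000 video tokens; attention cost grows quadratically
```

At roughly 234k tokens, even memory-efficient attention backends leave a large activation footprint, which is consistent with the comment above about ongoing memory optimization.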
Thank you. I roughly understand.
Thanks @ootsuka-repos. Would you maybe like to help us with a PR?
OK @sayakpaul, I will open a PR with the documentation updates later.
PR opened; it adds the parameter documentation. Also, I am writing through AI translation since I am not a native English speaker, so if there are any nuances or phrasings that could cause differences in interpretation, please let me know and I will fix them. Please also let me know if anything else needs to be done outside of the core implementation.