[feat] add Mochi-1 trainer #90
Conversation
torch.cuda.empty_cache()
torch.cuda.synchronize(accelerator.device)


def get_sigmas(timesteps, n_dim=4, dtype=torch.float32):
This probably needs to be revisited because we follow an inverse sigma scheme in Mochi-1, and the linear-quadratic sigma schedule perhaps needs to be incorporated as well.
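For reference, a minimal sketch of a linear-quadratic sigma schedule (the step split, `threshold_noise` value, and the final flip for the "inverse" convention are assumptions here, not necessarily what Mochi-1 uses):

```python
import numpy as np


def linear_quadratic_sigmas(num_steps, threshold_noise=0.025, linear_steps=None):
    """Sketch of a linear-quadratic schedule: linear ramp first, quadratic after,
    so early steps are packed closely together."""
    if linear_steps is None:
        linear_steps = num_steps // 2
    # Linear portion: evenly spaced values up to roughly `threshold_noise`.
    linear = [i * threshold_noise / linear_steps for i in range(linear_steps)]
    # Quadratic portion: coefficients chosen so the curve joins the linear part
    # at `threshold_noise` and reaches 1.0 at the final step.
    quadratic_steps = num_steps - linear_steps
    diff = linear_steps - threshold_noise * num_steps
    quad_coef = diff / (linear_steps * quadratic_steps**2)
    lin_coef = threshold_noise / linear_steps - 2 * diff / quadratic_steps**2
    const = quad_coef * linear_steps**2
    quadratic = [quad_coef * i**2 + lin_coef * i + const for i in range(linear_steps, num_steps)]
    sigmas = np.array(linear + quadratic + [1.0])
    # Flip so sigmas run from 1.0 down to 0.0 (inverse-sigma convention, assumed).
    return 1.0 - sigmas


print(linear_quadratic_sigmas(8))
```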
I think we'll have to take a deeper look into this soon, because the current training runs don't seem to have worked: the output videos are random noise.
I'll try and take a better look soon as well!
Yes, exactly. I am looking into it too.
Let's jam on this and make the script work asap! Tysm for this
@@ -0,0 +1,474 @@
import argparse
Okay to have a separate file for now for faster iterations. We will be refactoring the repo with a more modular API in the future anyway
training/mochi-1/dataset.py (Outdated)
logger = get_logger(__name__)


# TODO (sayakpaul): probably not all buckets are needed for Mochi-1?
These are just the default buckets used when you don't specify any via CLI args, so I don't think we need to worry here.
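For illustration, the kind of CLI override being referred to could look roughly like this; the flag name and default bucket values below are made up, not the script's actual arguments:

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical flag: lets users trim the bucket list instead of relying on defaults.
parser.add_argument(
    "--frame_buckets",
    nargs="+",
    type=int,
    default=[16, 24, 32, 48, 64],
    help="Frame-count buckets videos are resized/padded into; override to keep only what Mochi-1 needs.",
)
args = parser.parse_args(["--frame_buckets", "37"])
print(args.frame_buckets)  # [37]
```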
r=args.rank,
lora_alpha=args.lora_alpha,
init_lora_weights=True,
target_modules=["to_k", "to_q", "to_v", "to_out.0"],
Not required at the moment, and this comes more from the diffusion training community that regularly finetunes image models, but finetuning certain layers can make a model worse. Typically, you would want to understand what each layer does to the video by removing that layer and trying to run inference. We should try to thoroughly understand this for CogVideoX and Mochi, and find which layers it makes sense to finetune (and give users this configurability instead of finetuning all layers by default) for, say, aesthetics, new concepts, temporality improvements, stylized effects, etc. Would you be interested in doing this analysis for Mochi? I can take it up for Cog.
Later, yes. But we should first try to find the simplest reasonable setup. I will make it configurable right now.
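A minimal sketch of what that configurability might look like (the `--target_modules` flag name is an assumption; the default list just mirrors the attention projections already targeted in this PR):

```python
import argparse

from peft import LoraConfig

parser = argparse.ArgumentParser()
# Hypothetical flag: lets users choose which sub-modules receive LoRA adapters.
parser.add_argument(
    "--target_modules",
    nargs="+",
    type=str,
    default=["to_k", "to_q", "to_v", "to_out.0"],
    help="Module name suffixes to apply LoRA to (e.g. attention projections, feed-forward layers).",
)
parser.add_argument("--rank", type=int, default=16)
parser.add_argument("--lora_alpha", type=int, default=16)
args = parser.parse_args([])

# Build the LoRA config from CLI args instead of a hard-coded module list.
transformer_lora_config = LoraConfig(
    r=args.rank,
    lora_alpha=args.lora_alpha,
    init_lora_weights=True,
    target_modules=args.target_modules,
)
```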
@a-r-r-o-w I have pushed a couple of updates that better reuse the existing [...]. I have also addressed some of your other comments. Will now look into [...].
I decided to pad videos that have fewer frames than the nearest frame bucket.
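A rough sketch of what that padding could look like, assuming a hypothetical `pad_to_nearest_bucket` helper and a repeat-last-frame strategy (the actual dataset code may do this differently):

```python
import torch


def pad_to_nearest_bucket(frames: torch.Tensor, frame_buckets: list) -> torch.Tensor:
    """Pad a video tensor of shape [F, C, H, W] up to the nearest frame bucket
    by repeating the last frame (sketch only)."""
    num_frames = frames.shape[0]
    # Pick the smallest bucket that can hold the video; fall back to the largest.
    eligible = [b for b in frame_buckets if b >= num_frames]
    target = min(eligible) if eligible else max(frame_buckets)
    if num_frames >= target:
        return frames[:target]
    pad = frames[-1:].repeat(target - num_frames, 1, 1, 1)
    return torch.cat([frames, pad], dim=0)


# Example: a 33-frame video padded up to a 37-frame bucket.
video = torch.randn(33, 3, 480, 848)
padded = pad_to_nearest_bucket(video, [37, 49, 61])
print(padded.shape)  # torch.Size([37, 3, 480, 848])
```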
@sayakpaul Did you manage to find a fix for the random noise in the output videos?
Yeah. Will push my updates soon |
Thanks Sayak! Just some minor comments and we should be good to merge
We can work on the refactors and generic-fication when we find the time later on
```
--validation_epochs 1 \
```

We haven't rigorously tested it, but without validation enabled, this script should run in under 40 GB of GPU VRAM.
This is for how many frames btw?
37 frames (similar to the original example).
Also, could you point me to the logs from your most promising run so far?
https://wandb.ai/sayakpaul/mochi-1-lora/runs/hu344t6o

Here's one run with the original trainer but on the same dataset: we can see that the loss dynamics are the same. Additionally, here's a sample intermediate derived with the original trainer: 0_1600.mp4

The one we see here is not too bad: https://wandb.ai/sayakpaul/mochi-1-lora/runs/hu344t6o. Of course, I suspect the other quality issues will go away once huggingface/diffusers#10033 is merged. But there is a blocker with that: https://huggingface.slack.com/archives/C065E480NN9/p1732808679447279?thread_ts=1732688413.727359&cid=C065E480NN9. LMK if you have any other questions.
Co-authored-by: Aryan <aryan@huggingface.co>
Wow, these look very promising! The quality will definitely improve once Dhruv's PR is in, both for training and inference. I think we're good to merge then.
@a-r-r-o-w thanks! Do you think it'd be prudent to get huggingface/diffusers#10031 in, as it was also critical?
A minimal and simple reimplementation of the Mochi-1 fine-tuner, but with `diffusers` and `peft`. Follow the `README.md` file added in this PR. Successful runs will be at https://wandb.ai/sayakpaul/mochi-1-lora.