Regarding the release date of the video editing code #12

Open
tayton42 opened this issue Dec 2, 2024 · 3 comments

tayton42 commented Dec 2, 2024

Thank you for your work! Do you plan to release the video editing code this week? If not, please let me know and I will try to modify it myself. If so, I will wait. Thank you!😘

wangjiangshan0725 (Owner) commented

Dear @tayton42,

Thank you for your interest! We definitely plan to release this part of the code, but the exact timing may be slightly delayed, possibly depending on the outcome of the paper review. We appreciate your patience.

Btw, we are also exploring more powerful video generation models (such as Mochi) for video editing, which are expected to outperform OpenSora.

tayton42 commented Dec 2, 2024

Understood. Thank you for your reply; looking forward to your future work!

tayton42 closed this as completed Dec 2, 2024

tayton42 commented Dec 4, 2024

Hi! I have tried to modify Mochi's sampling code according to the formula, but I ran into difficulties implementing the inversion: the result of the inversion deviates significantly from the original video, and I am sure the problem is not with the VAE. Can you give me some advice? Below is the sampling code I modified based on Mochi.

import random

import numpy as np
import torch
from einops import repeat

# compute_packed_indices, assert_eq, get_new_progress_bar and dit_latents_to_vae_latents
# come from Mochi's pipeline code, same as in the original sample_model.

def sample_model_rfsolver(device, dit, conditioning, **args):
    random.seed(args["seed"])
    np.random.seed(args["seed"])
    torch.manual_seed(args["seed"])

    generator = torch.Generator(device=device)
    generator.manual_seed(args["seed"])

    w, h, t = args["width"], args["height"], args["num_frames"]
    sample_steps = args["num_inference_steps"]
    cfg_schedule = args["cfg_schedule"]
    sigma_schedule = args["sigma_schedule"]
    inversion = args["inversion"]
    if inversion:
        # For inversion, walk the schedule from data (sigma = 0) towards noise (sigma = 1).
        sigma_schedule = sigma_schedule[::-1]

    assert_eq(len(cfg_schedule), sample_steps, "cfg_schedule must have length sample_steps")
    assert_eq((t - 1) % 6, 0, "t - 1 must be divisible by 6")
    assert_eq(
        len(sigma_schedule),
        sample_steps + 1,
        "sigma_schedule must have length sample_steps + 1",
    )

    B = 1
    SPATIAL_DOWNSAMPLE = 8
    TEMPORAL_DOWNSAMPLE = 6
    IN_CHANNELS = 12
    latent_t = ((t - 1) // TEMPORAL_DOWNSAMPLE) + 1
    latent_w, latent_h = w // SPATIAL_DOWNSAMPLE, h // SPATIAL_DOWNSAMPLE

    # The original random initialization is replaced by a caller-provided latent
    # (e.g. the encoded source video for inversion, or an inverted latent for editing):
    # z = torch.randn(
    #     (B, IN_CHANNELS, latent_t, latent_h, latent_w),
    #     device=device,
    #     dtype=torch.float32,
    # )
    z = args["latent"]

    num_latents = latent_t * latent_h * latent_w
    cond_batched = cond_text = cond_null = None
    if "cond" in conditioning:
        cond_text = conditioning["cond"]
        cond_null = conditioning["null"]
        cond_text["packed_indices"] = compute_packed_indices(device, cond_text["y_mask"][0], num_latents)
        cond_null["packed_indices"] = compute_packed_indices(device, cond_null["y_mask"][0], num_latents)
    else:
        cond_batched = conditioning["batched"]
        cond_batched["packed_indices"] = compute_packed_indices(device, cond_batched["y_mask"][0], num_latents)
        z = repeat(z, "b ... -> (repeat b) ...", repeat=2)

    def model_fn(*, z, sigma, cfg_scale):
        if cond_batched:
            with torch.autocast("cuda", dtype=torch.bfloat16):
                out = dit(z, sigma, **cond_batched)
            out_cond, out_uncond = torch.chunk(out, chunks=2, dim=0)
        else:
            nonlocal cond_text, cond_null
            with torch.autocast("cuda", dtype=torch.bfloat16):
                if cfg_scale == 0.0:
                    # cfg_scale == 0 reduces to the unconditional prediction.
                    return dit(z, sigma, **cond_null).to(z)
                out_cond = dit(z, sigma, **cond_text)
                out_uncond = dit(z, sigma, **cond_null)
        assert out_cond.shape == out_uncond.shape
        out_uncond = out_uncond.to(z)
        out_cond = out_cond.to(z)
        return out_uncond + cfg_scale * (out_cond - out_uncond)

    # Second-order (RF-Solver-style) sampler: an Euler step plus a midpoint-based
    # correction, with customizable sigma schedule & cfg scale.
    for i in get_new_progress_bar(range(0, sample_steps), desc="Sampling"):
        sigma = sigma_schedule[i]
        dsigma = sigma - sigma_schedule[i + 1]

        # `pred` estimates `z_0 - eps` at the current sigma.
        pred = model_fn(
            z=z,
            sigma=torch.full([B] if cond_text else [B * 2], sigma, device=z.device),
            cfg_scale=cfg_schedule[i],
        )

        # Half Euler step to the midpoint sigma, then re-evaluate the model there.
        z_mid = z + dsigma / 2 * pred
        pred_mid = model_fn(
            z=z_mid,
            sigma=torch.full([B] if cond_text else [B * 2], sigma - dsigma / 2, device=z.device),
            cfg_scale=cfg_schedule[i],
        )
        # assert pred.dtype == torch.float32

        # Finite-difference estimate of the velocity derivative for the second-order term.
        first_order = (pred_mid - pred) / (dsigma / 2)
        z = z + dsigma * pred + 0.5 * dsigma ** 2 * first_order

    z = z[:B] if cond_batched else z
    if inversion:
        # For inversion, return the DiT-space latent directly (no conversion to VAE latents).
        return z
    return dit_latents_to_vae_latents(z)
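
To rule out a mistake in the update rule itself, I also round-tripped the same Euler-plus-midpoint step on a toy velocity field (a standalone sketch, not from the Mochi repo; toy_pred is just a made-up stand-in for the DiT):

import torch

def toy_pred(z, sigma):
    # Smooth stand-in for the model output `pred` (an estimate of z_0 - eps); for testing only.
    return torch.tanh(z) * (1.0 - sigma) - z * sigma

def solve(z, sigma_schedule):
    # Same Euler + midpoint-correction update as in sample_model_rfsolver above.
    for i in range(len(sigma_schedule) - 1):
        sigma = sigma_schedule[i]
        dsigma = sigma - sigma_schedule[i + 1]
        pred = toy_pred(z, sigma)
        z_mid = z + dsigma / 2 * pred
        pred_mid = toy_pred(z_mid, sigma - dsigma / 2)
        first_order = (pred_mid - pred) / (dsigma / 2)
        z = z + dsigma * pred + 0.5 * dsigma ** 2 * first_order
    return z

torch.manual_seed(0)
z0 = torch.randn(4, 8)                          # stand-in "clean" latent
sigmas = torch.linspace(0.0, 1.0, 65).tolist()  # 64 steps, data (0) -> noise (1)

z_inv = solve(z0, sigmas)           # inversion direction
z_rec = solve(z_inv, sigmas[::-1])  # sample back from the inverted latent
print((z_rec - z0).abs().max())     # only discretization error; shrinks as steps grow

With this toy field the round trip matches up to discretization error, so I suspect the deviation in the real pipeline comes from something else, e.g. using different cfg_schedule values for the inversion pass and the re-sampling pass (with different guidance scales the two runs follow different vector fields, so exact reconstruction would not be expected).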

tayton42 reopened this Dec 4, 2024