A problem when training #16

Open
guishe666 opened this issue Aug 21, 2024 · 1 comment

Comments

@guishe666

I ran into a problem when training on a single RTX 4090. After about 36k training steps, some sub-images in the predicted target turn black. The learning rate is set to 5e-5 and the batch size is 64.
Can you give me some advice?

predicted target
predict_target-40001

decode target
decode_target-40001

prime target
prime_target-40001

The training log is here:
2024-08-20 20:54:35,177 - train_ldm.py - autoencoder:
  pretrained_path: assets/stable-diffusion/autoencoder_kl.pth
ckpt_root: workdir/flickr192_large/noise_pred_20240820_80004/ckpts
config_name: flickr192_large
dataset:
  embed_dim: 1024
  grid_size: 12
  name: flickr
  path: ./dataset/scenery/train_ori/
  resolution: 192
hparams: noise_pred_20240820_80004
lr_scheduler:
  name: customized
  warmup_steps: 20000
mixed_precision: fp16
nnet:
  depth: 20
  embed_dim: 1024
  img_size: 24
  in_chans: 4
  mlp_ratio: 4
  mlp_time_embed: false
  name: uvit
  num_classes: 1001
  num_heads: 16
  patch_size: 2
  qkv_bias: false
  use_checkpoint: true
optimizer:
  betas: !!python/tuple
  - 0.99
  - 0.99
  lr: 0.0002
  name: adamw
  weight_decay: 0.03
pred: noise_pred
sample:
  algorithm: dpm_solver
  cfg: true
  mini_batch_size: 50
  n_samples: 50000
  path: ''
  sample_steps: 50
  scale: 0.4
sample_dir: workdir/flickr192_large/noise_pred_20240820_80004/samples
seed: 1234
train:
  batch_size: 64
  eval_interval: 2000
  log_interval: 10
  mode: cond
  n_steps: 320000
  save_interval: 4000
workdir: workdir/flickr192_large/noise_pred_20240820_80004
z_shape: !!python/tuple
- 4
- 24
- 24
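
For reference, here is a rough sketch of what the learning rate would look like over training if the `customized` scheduler is a plain linear warmup followed by a constant rate. That is an assumption, since the log only shows the scheduler's name and `warmup_steps: 20000`. The function `warmup_lr` is a hypothetical helper, not code from the repository. The default values are taken from the `optimizer.lr` and `lr_scheduler.warmup_steps` entries in the log above.

```python
def warmup_lr(step: int, base_lr: float = 2e-4, warmup_steps: int = 20000) -> float:
    """Linear warmup to base_lr, then constant (assumed scheduler behaviour)."""
    if warmup_steps <= 0:
        # No warmup configured: use the base learning rate from the start.
        return base_lr
    # Scale factor grows linearly from 0 to 1 over warmup_steps, then stays at 1.
    return base_lr * min(step / warmup_steps, 1.0)


# Print the assumed learning rate at a few steps, including the ~36k mark
# where the black sub-images start to appear.
for step in (0, 10_000, 20_000, 36_000):
    print(step, warmup_lr(step))
```

Under this assumption the warmup has already finished by 36k steps, so the optimizer would be running at the full logged rate of 0.0002 when the artifacts show up.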
@Sherrylone
Owner

Try training for more iterations and with a larger batch size.
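
On a single GPU, one common way to get a larger effective batch size without more memory is gradient accumulation: average the gradients of several micro-batches before each optimizer step. Whether the training script in this repository supports it is not stated here, so the following is only a minimal, framework-free sketch of the arithmetic. The toy scalar model and the names `grad_mse` and `accumulated_grad` are hypothetical. It checks that averaging the gradients of four micro-batches of 64 gives the same gradient as one batch of 256.

```python
import random


def grad_mse(w, xs, ys):
    """Gradient of the mean squared error of y ~ w * x with respect to scalar w."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Average per-micro-batch gradients over equal-sized micro-batches."""
    assert len(xs) % micro_batch_size == 0, "micro-batches must be equal-sized"
    n_micro = len(xs) // micro_batch_size
    total = 0.0
    for i in range(n_micro):
        lo = i * micro_batch_size
        hi = lo + micro_batch_size
        # Dividing by n_micro makes the accumulated sum a mean over micro-batches.
        total += grad_mse(w, xs[lo:hi], ys[lo:hi]) / n_micro
    return total


random.seed(1234)
xs = [random.uniform(-1, 1) for _ in range(256)]
ys = [3.0 * x for x in xs]  # toy targets generated with a true weight of 3.0
w = 0.5  # current (wrong) weight

full = grad_mse(w, xs, ys)                                 # one batch of 256
accum = accumulated_grad(w, xs, ys, micro_batch_size=64)   # 4 micro-batches of 64
print(abs(full - accum) < 1e-9)
```

In a real mixed-precision run the accumulation would go through the framework's own optimizer and loss-scaling utilities; this sketch only illustrates why the accumulated update matches the large-batch one.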
