
How to resume training #9

lilili666-hit opened this issue May 21, 2020 · 20 comments

@lilili666-hit

I was training for a long time and stopped it. Can you tell me how to resume training? And should pretrain_network be set to 1 or 0?

@shepnerd (Collaborator)

Use the option `--load_model_dir [Your model path]` to continue the training. The option `--pretrain_network 0|1` decides whether to use all the training losses or just the reconstruction loss.
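For example, a hypothetical resume command (the script name `train.py` and the `--dataset`/`--data_file` flags are assumptions based on the test command quoted later in this thread; check the repository's argument parser for the exact names):

```bash
# Resume training from an existing checkpoint directory (paths are placeholders).
# --pretrain_network switches between reconstruction-only and full-loss training;
# confirm from the code which of 0/1 corresponds to which stage.
python train.py \
    --dataset cityscape \
    --data_file TRAIN_IMAGE_FOLDER \
    --load_model_dir checkpoints/your-previous-run \
    --pretrain_network 0
```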

@lilili666-hit (Author)

Thank you very much. I have another question for you: why does G_loss have a negative value? After a period of time, it outputs a negative value and then changes back to a positive value.

@lilili666-hit (Author)

Hi Yi Wang, can you share the Places2 pretrained model? Thank you very much.

@shepnerd (Collaborator) commented Jun 4, 2020

> Thank you very much. I have another question for you: why does G_loss have a negative value? After a period of time, it outputs a negative value and then changes back to a positive value.

It happens because the loss for updating the generator in WGAN-GP can be negative.
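For intuition, a minimal sketch of the WGAN-GP generator objective (an illustration in TF 1.x, not the repository's actual code): since it is just the negated critic score on generated samples, nothing keeps it positive, and its sign follows the critic output as training progresses.

```python
import tensorflow as tf  # TF 1.x style graph code

# Stand-in for the critic (discriminator) scores on generated images; in the
# real model this would be the discriminator output on the generator's results.
critic_score_fake = tf.placeholder(tf.float32, [None])

# WGAN-GP generator loss: raise the critic score on fakes by minimizing its
# negation. The value can be negative or positive depending on the critic,
# so oscillation around zero is expected rather than a sign of a bug.
g_loss = -tf.reduce_mean(critic_score_fake)
```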

> Hi Yi Wang, can you share the Places2 pretrained model? Thank you very much.

The model you requested (Places2) can be downloaded from here. You can use it with `python test.py --dataset cityscape --data_file TEST_IMAGE_FOLDER --load_model_dir checkpoints/places2-srn-subpixel-gc64 --model srn --feat_expansion_op subpixel --use_cn 1 --random_crop 0 --random_mask 0 --img_shapes 256,512,3 --mask_shapes 256,256 --g_cnum 64` right after you put the pretrained model folder in ./checkpoints.

@lilili666-hit (Author)

While testing, I met another problem.
ValueError: Dimension 3 in both shapes must be equal, but are 256 and 1024. Shapes are [3,3,64,256] and [3,3,64,1024]. for 'Assign_30' (op: 'Assign') with input shapes: [3,3,64,256], [3,3,64,1024].

@lilili666-hit (Author)

[screenshot of training settings]

These are my training details. Thank you very much!

@lilili666-hit (Author)

The picture size of Places2 is 256 × 256. How do you use 256 × 512 for training? Do you directly change the resolution of the 256 × 256 pictures to 256 × 512?

@shepnerd (Collaborator)

When using the pretrained model on Places2, please set the image and mask shapes with `--img_shapes 256,512,3 --mask_shapes 256,256`. As for the image resolution in Places2, we use the data at its original resolution rather than the resized version.

@lilili666-hit (Author)

Can this program be fine-tuned? For example, load a pre-trained model, freeze certain layers, and then train.

@shepnerd (Collaborator)

Sure. If you want to freeze specific layers in the generator, you can remove them by name from g_vars; those parameters will then not be updated in the subsequent training.
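For instance, a minimal sketch of this idea (a toy TF 1.x example; the scope name `generator` and the layer-name substrings are hypothetical, so print your actual variable names first and filter on those):

```python
import tensorflow as tf  # TF 1.x style, matching the repo's framework

# Toy generator just to make the sketch self-contained; in the real code the
# network is built elsewhere and its variables are collected into g_vars.
with tf.variable_scope('generator'):
    x = tf.placeholder(tf.float32, [None, 8])
    h = tf.layers.dense(x, 16, name='conv1')   # hypothetical layer names
    out = tf.layers.dense(h, 8, name='conv2')
g_loss = tf.reduce_mean(tf.square(out - x))

# Collect the generator variables, then drop the ones you want frozen.
g_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='generator')
frozen_keywords = ['conv1']                    # substrings of layers to freeze
g_vars = [v for v in g_vars if not any(k in v.name for k in frozen_keywords)]

# Variables left out of var_list receive no gradient updates, so the
# pretrained weights loaded into them stay fixed during fine-tuning.
g_train_op = tf.train.AdamOptimizer(1e-4).minimize(g_loss, var_list=g_vars)
```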

@lilili666-hit (Author)

Can you give me an example? Thank you very much.

@lilili666-hit (Author)

Hi Yi Wang. What applications do you think this technology has in real life? For example, what kind of practical problems can be solved?

@lilili666-hit (Author)

I am training on a newly selected dataset. At what point can the first stage of training be considered converged? Thanks for your help.

@shepnerd (Collaborator) commented Aug 29, 2020

> Hi Yi Wang. What applications do you think this technology has in real life? For example, what kind of practical problems can be solved?

Naturally extending images or videos to fit the display device could benefit from such technology. Someone has explored its video application here.

@shepnerd (Collaborator) commented Aug 29, 2020

> I am training on a newly selected dataset. At what point can the first stage of training be considered converged? Thanks for your help.

We can consider the training converged when the reconstruction loss looks stable in the first stage. Quantitatively, for a relatively small-scale dataset (e.g. Paris Street View or Cityscapes, containing 2k~12k training images), 80,000 iterations with batch size 16 should be enough (a larger batch size may require fewer training iterations).
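For a rough sense of scale (my own back-of-the-envelope conversion, not a figure from this thread): 80,000 iterations × 16 images per batch ≈ 1.28M training samples, i.e. on the order of a few hundred passes over a 2k~12k-image dataset.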

The two-stage training is actually a compromise because the network used has only a small capacity (no more than 4M parameters). If you train a large-capacity model equipped with residual blocks (like SPADE or Pix2pixHD) and recent GAN stabilization tricks (spectral norm, multi-scale discriminators, PatchGAN, conditional projection, etc.), it may be trainable directly with VGG loss and adversarial loss from scratch.

@lilili666-hit (Author)

[screenshot of the discriminator loss curve]

Is the loss curve of this discriminator normal? I trained from scratch on a newly selected dataset.

@lilili666-hit (Author)

Can you send me all the loss curves from your training at that time? Thanks.

@shepnerd (Collaborator) commented Sep 4, 2020

> Can you send me all the loss curves from your training at that time? Thanks.

I will search my server for these data and get back to you later.

@shepnerd (Collaborator) commented Sep 4, 2020

> [screenshot of the discriminator loss curve]
>
> Is the loss curve of this discriminator normal? I trained from scratch on a newly selected dataset.

At least the loss of the discriminator should tend to oscillate rather than converge.

Note it would be better to train with aligned data (or data with similar layouts, e.g., aligned faces or Cityscapes-like data). If not, use a bigger model, pretraining, or GAN stabilization tricks for training.

@lilili666-hit (Author)

Can you send me all the loss curves from your training at that time? Thanks.
