Scripts for reproducing the results of the paper

This folder contains all the scripts needed to train the models presented in the paper, and to run the subsequent comparative analysis between COSMOS, parametric, and Flow-VAE light profiles.

The models are trained using the Galaxy2Galaxy library.

Note: the file names and directories used in this README are specific to the environment used to produce the results, and will need to be adapted for a different setup. In particular, be aware that exported models are automatically placed in a timestamped subdirectory; at every export step in the procedure below, they need to be moved out of that subdirectory into its parent directory.
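For example, after exporting the VAE decoder (Step III), assuming it was written under $WORK/repo/deep_galaxy_models/modules/vae_16/decoder (the path used later in this README), the module can be moved up one level with something like:

$ cd $WORK/repo/deep_galaxy_models/modules/vae_16/decoder
$ ls                                   # a single timestamped subdirectory, e.g. 1588167274/
$ mv 1588167274/* . && rmdir 1588167274

where 1588167274 stands for whatever timestamp the export step actually produced.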

Step I: Setting up the environment

All results were obtained on the Jean Zay supercomputer of the CNRS IDRIS institute. The environment uses the following modules:

$ module load anaconda-py3/2019.03 cuda/10.0 cudnn/7.6.5.32-cuda-10.1 fftw/3.3.8 r

Once that environment is loaded, all dependencies can be installed using:

$ conda install -c conda-forge galsim
$ pip install tensorflow-gpu==1.15.2
$ pip install git+https://github.com/ml4astro/pixel-cnn.git
$ pip install git+https://github.com/ml4astro/GalFlow.git
$ pip install galaxy2galaxy

Then it will be necessary to download the GalSim COSMOS sample:

$ galsim_download_cosmos -s 25.2

This takes care of all the main dependencies and provides everything needed to train a model.
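To check that the catalog was downloaded where GalSim expects it (the exact location depends on the installation), one can simply search for it, for instance:

$ find $CONDA_PREFIX -name "real_galaxy_catalog_25.2.fits" 2>/dev/null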

For the subsequent analysis, additional libraries are needed:

$ pip install rpy2 matplotlib seaborn

as well as the SDMTools R library, available from the CRAN archive: https://cran.r-project.org/src/contrib/Archive/SDMTools
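Since SDMTools is no longer distributed on CRAN, it has to be installed by hand from the archive linked above; for example (adapt the tarball name to the version currently listed in the archive):

$ wget https://cran.r-project.org/src/contrib/Archive/SDMTools/SDMTools_1.1-221.2.tar.gz
$ R CMD INSTALL SDMTools_1.1-221.2.tar.gz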

Step II: Generating the dataset

To generate the dataset used to train the model:

$ g2g-datagen --problem=attrs2img_cosmos128 --data_dir=$WORK/g2g/datasets/attrs2img_cosmos128_nopadding

This command generates a dataset for the attrs2img_cosmos128 problem, i.e. an "attributes to image" problem such as the conditional image generation task presented in the paper. The resulting data files are stored in the directory given by --data_dir.
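Once generation finishes, the data directory can be checked with a simple listing; since Galaxy2Galaxy is built on top of tensor2tensor, the shards are expected to follow tensor2tensor's naming convention (attrs2img_cosmos128-train-00000-of-..., etc.):

$ ls $WORK/g2g/datasets/attrs2img_cosmos128_nopadding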

Step III: Train the VAE

The most time-consuming part of the process is training the VAE; depending on the configuration, it typically takes one to two days. To train a model, we used the following script:

$ sbatch training_vae.job

This submits the training job to the queue. The job script contains the configuration options used for training.
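For reference, the overall structure of such a job script is sketched below; the SBATCH directives and the model and hyperparameter names passed to g2g-trainer (which follows the flags of tensor2tensor's t2t-trainer) are placeholders here, and the actual values used for the paper are those in training_vae.job:

#!/bin/bash
#SBATCH --job-name=train_vae        # placeholder SLURM directives
#SBATCH --gres=gpu:4
#SBATCH --time=20:00:00

module load anaconda-py3/2019.03 cuda/10.0 cudnn/7.6.5.32-cuda-10.1 fftw/3.3.8

g2g-trainer --problem=attrs2img_cosmos128 \
            --data_dir=$WORK/g2g/datasets/attrs2img_cosmos128_nopadding \
            --output_dir=$WORK/g2g/training/vae_16 \
            --model=<model_name> --hparams_set=<hparams_set> \
            --train_steps=<n_steps>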

Once the model is trained, we export it as a TensorFlow Hub module with:

$ sbatch export_vae.job

Step IV: Training a latent Normalizing Flow

Once the VAE is exported, it is possible to train various Normalizing Flows to model its latent space:

$ sbatch training_flow.job

The trained flow can then be exported as a standalone TensorFlow Hub module with:

$ sbatch export_flow.job

Step V: Concatenating Normalizing Flow and VAE as a single generative model

G2G provides a utility script to combine a latent-space model with an existing autoencoder. Assuming a local clone of g2g is in the current directory:

$ python galaxy2galaxy/bin/concatenate_models.py \
       --decoder_module=$WORK/repo/deep_galaxy_models/modules/vae_16/decoder \
       --flow_module=$WORK/repo/deep_galaxy_models/modules/latent_maf_16/code_sampler \
       --export_dir=$WORK/repo/deep_galaxy_models/modules/flow_vae_maf_16

This will export a new module concatenating the latent Normalizing Flow with the VAE decoder.
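If everything went well, the export directory now contains a standalone TF Hub module (typically a tfhub_module.pb file together with a variables/ directory), which can be checked with:

$ ls $WORK/repo/deep_galaxy_models/modules/flow_vae_maf_16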

Step VI: Run the analysis script

Now that all models are trained and exported, the main analysis script can be run from the root directory:

$ sbatch mk_plots.job

This populates the results directory, regenerating the results that can otherwise be downloaded from Zenodo.

From there, all plots can be regenerated by using the notebooks at the root of the repository.

Issue reporting

In case of any difficulty reproducing the results, or for any other questions, please feel free to open an issue on this repository or contact the main author directly by email.