ConvL2LFlow

A convolutional-flow-based generative machine learning model for calorimeter shower simulation.

Dependencies

pytorch - for the neural network
nflows - for flow transformations
numpy - for some numeric calculations
scipy - to calculate Wasserstein distances
matplotlib - for plotting
h5py - reading/writing .h5 files
pyyaml - reading/writing .yaml files
torchinfo - to summarize torch models
torchmetrics - for the evaluation of classifier results
scikit-learn - to calibrate the classifier

Run Code

You can use dataset 3 of the Calo Challenge to train the networks. Note that the code was tested on a machine with 512 GB of CPU RAM and 40 GB of GPU RAM and might need adjustments to run on machines with less memory.

First set up the environment using pip or conda.

Using conda (recommended):

conda env create -f environment.yml
conda activate convL2LFlows

Or using pip:

# make sure to use Python 3.10
# consider using a venv (https://docs.python.org/3/library/venv.html)
pip install -r requirements.txt

To download the data sets and set up the training, run the following:

# download data sets
mkdir data
for i in {1..4}
do
    echo "get dataset_3_$i.hdf5"
    wget https://zenodo.org/record/6366324/files/dataset_3_$i.hdf5\?download\=1 -O data/dataset_3_$i.hdf5
done

# prepare data sets
python scripts/convert_challenge.py --shape 45 50 18 data/dataset_3_1.hdf5 data/dataset_3_2.hdf5 data/dataset_3.h5
python scripts/convert_challenge.py --shape 45 50 18 data/dataset_3_3.hdf5 data/dataset_3_4.hdf5 data/dataset_3_test.h5
python scripts/calc_layer_e.py -t 1.515e-5 data/dataset_3.h5
python scripts/calc_layer_e.py -t 1.515e-5 data/dataset_3_test.h5

# generate result directories
python mkresultdir.py conf/energy.yaml
python mkresultdir.py -f 0-44 conf/causal.yaml
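
As a rough illustration of what the preparation step works with, the sketch below views one flat dataset 3 shower as the 45 x 50 x 18 grid passed via --shape and sums the energy per layer, ignoring voxels below the threshold passed via -t. The row-major voxel ordering, the reading of the threshold as a hard cut, and the function name layer_energies are assumptions made for this example; the actual logic lives in scripts/convert_challenge.py and scripts/calc_layer_e.py.

```python
SHAPE = (45, 50, 18)      # value passed via --shape in the commands above
THRESHOLD = 1.515e-5      # value passed via -t in the commands above


def layer_energies(flat_shower, shape=SHAPE, threshold=THRESHOLD):
    """Sum the voxel energies of each layer, ignoring voxels below threshold.

    Assumes row-major storage with the first axis of `shape` indexing the
    layers (an assumption for this sketch, not taken from the repository).
    """
    n_layers, n_a, n_b = shape
    voxels_per_layer = n_a * n_b
    if len(flat_shower) != n_layers * voxels_per_layer:
        raise ValueError("shower length does not match shape")
    energies = []
    for layer in range(n_layers):
        start = layer * voxels_per_layer
        chunk = flat_shower[start:start + voxels_per_layer]
        energies.append(sum((e for e in chunk if e >= threshold), 0.0))
    return energies


# toy shower: one voxel above threshold in layer 0, one below it in layer 1
shower = [0.0] * (45 * 50 * 18)
shower[0] = 2.0
shower[900] = 1.0e-6
result = layer_energies(shower)
print(result[:3])
```

The toy shower only exercises the indexing; real showers would come from the converted .h5 files.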

After these steps there should be a folder named results containing two subfolders, one for the energy distribution flow and one for the causal flows. Both contain a run.sh file, which you can use to start the training. After the training has finished, you can generate new samples using the following command (you have to adapt the names of the result folders):

python src/generator.py -n 100000 -b 10000 results/???_energy results/???_shower --log --veto 2.6

If you want to test the results you can run (you have to adapt the names of the result folders):

# train low level classifier
python src/classifier.py -c 1.515e-5 -m CConv -g data/dataset_3_test.h5 results/???_shower/samples00.h5
# train high level classifier
python src/high_level_classi.py -c 1.515e-5 -g data/dataset_3_test.h5 results/???_shower/samples00.h5
# calculate mean and std over the Wasserstein distances between generated and test data
# the Wasserstein distances are calculated 10 times using 10 equal-sized splits of the data
# output: name, mean, std
python src/wasserstein.py -c 1.515e-5 -g data/dataset_3_test.h5 results/???_shower/samples00.h5
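
The split-based statistic described in the comments above can be sketched as follows. This is a standard-library toy, not the code in src/wasserstein.py: it uses the fact that the 1D Wasserstein-1 distance between two equal-sized samples is the mean absolute difference of their sorted values, whereas the repository lists scipy for this purpose. The observable, the function names, and the choice of the sample standard deviation are assumptions.

```python
import random
import statistics


def wasserstein_1d(a, b):
    """1D Wasserstein-1 distance between two equal-sized samples."""
    if len(a) != len(b):
        raise ValueError("samples must have the same size")
    return statistics.fmean(abs(x - y) for x, y in zip(sorted(a), sorted(b)))


def split_mean_std(generated, reference, n_splits=10):
    """Mean and standard deviation of the distance over equal-sized splits."""
    size = min(len(generated), len(reference)) // n_splits
    distances = [
        wasserstein_1d(generated[i * size:(i + 1) * size],
                       reference[i * size:(i + 1) * size])
        for i in range(n_splits)
    ]
    return statistics.fmean(distances), statistics.stdev(distances)


rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(1000)]
generated = [x + 0.5 for x in reference]   # toy "generated" data, shifted by 0.5
mean, std = split_mean_std(generated, reference)
print("toy_observable", mean, std)
```

Because the toy generated sample is the reference shifted by a constant, every split yields a distance close to that shift and the spread is close to zero; real generated and test data would give a non-trivial spread.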

If you want to run the Calo Challenge evaluation script to produce some plots, you first have to convert the data format (you have to adapt the names of the result folders):

python scripts/convert_challenge.py -i -n 2 --t 1.515e-5 results/???_shower/samples00.h5 data/dataset_3_conv_flow_{}.hdf5

If you have cloned the Calo Challenge evaluation script, you can run it from within its code folder using something like:

python evaluate.py --input_file data/dataset_3_conv_flow_1.hdf5 --input_file_2 data/dataset_3_conv_flow_2.hdf5 --reference_file data/dataset_3_3.hdf5 --reference_file_2 data/dataset_3_4.hdf5 --dataset 3 --mode all --save_mem

Configuration File

This is a list of the parameters that can be specified in the YAML parameter files conf/*.yaml. Many have default values, so not all parameters have to be specified.
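
One common way to implement "defaults plus overrides" is a recursive merge of the user parameters into a dictionary of defaults. The sketch below shows that idea only; the grouping of parameters under keys such as "trainer" and "dataloader", the default values, and the helper name with_defaults are all hypothetical and not taken from the repository.

```python
import copy

# Hypothetical defaults; the real defaults are defined in the repository code.
DEFAULTS = {
    "trainer": {"learning_rate": 1.0e-3, "num_epochs": 10},
    "dataloader": {"batch_size": 64, "pin_memory": False},
}


def with_defaults(user, defaults=DEFAULTS):
    """Return a copy of `defaults` recursively overridden by `user`."""
    merged = copy.deepcopy(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = with_defaults(value, merged[key])
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# stands in for the dictionary obtained by parsing a conf/*.yaml file
user_params = {"run_name": "demo", "trainer": {"num_epochs": 200}}
params = with_defaults(user_params)
print(params)
```

Only num_epochs is overridden here; every other value falls back to the (hypothetical) default, and DEFAULTS itself is left unmodified.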

run parameters

Parameter	Type	Description
run_name	string	name for the output folder

dataset parameters

Parameter	Type	Description
class	string	"ShowerDataset" or "EnergyDataset"
data_file	string	path of the .h5-file containing the training data
noise_mode	string	noise added to all entries: "uniform", "gaussian", "log_normal" or null
noise_mean	float	mean of the noise distribution (for "log_normal" it should be given in log10 space)
noise_std	float	standard deviation of the noise distribution (only valid for "gaussian" and "log_normal")
extra_noise_mode	string	noise used to fill zero voxels (only for "ShowerDataset")
extra_noise_mean	float	equivalent to noise_mean
extra_noise_std	float	equivalent to noise_std
samples_trafo	list	preprocessing for the samples, will be inverted during generation
cond_trafo	list	preprocessing for incident energy
cond2_trafo	list	preprocessing for layer energy, only for "ShowerDataset"
device	string	where to store the data during training (if this is not the device you train on, data will be moved batch-wise)
random_shift	boolean	enable data augmentation by random shift
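
To make the noise options more concrete, here is a minimal sketch of one plausible reading of noise_mode, noise_mean and noise_std. It is not the dataset class from the repository: the function name add_noise is hypothetical, treating noise_std as a log10-space width for "log_normal" is an assumption, and "uniform" is left out because its range is not stated above.

```python
import random


def add_noise(values, noise_mode, noise_mean=0.0, noise_std=1.0, rng=random):
    """Add noise to every entry (one plausible reading of the options above)."""
    if noise_mode is None:
        return list(values)
    if noise_mode == "gaussian":
        return [v + rng.gauss(noise_mean, noise_std) for v in values]
    if noise_mode == "log_normal":
        # mean (and, by assumption, std) are interpreted in log10 space
        return [v + 10.0 ** rng.gauss(noise_mean, noise_std) for v in values]
    # "uniform" is omitted: its range is not specified in the table above
    raise ValueError(f"unsupported noise_mode in this sketch: {noise_mode!r}")


rng = random.Random(1)
voxels = [0.0, 0.5, 2.0]
noisy = add_noise(voxels, "log_normal", noise_mean=-6.0, noise_std=0.1, rng=rng)
print(noisy)
```

With a log-normal distribution the added noise is strictly positive, so zero voxels become small positive values, which is convenient when a logarithmic preprocessing step follows.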

trainer parameters

Parameter	Type	Description
learning_rate	float	learning rate
scheduler	string	type of LR scheduling: "Step", "Exponential" or "OneCycle"
weight_decay	float	L2 weight decay
num_epochs	integer	number of training epochs
grad_clip	float	if given, an L2 gradient clip with the given value is applied
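
For readers unfamiliar with L2 gradient clipping as referred to by grad_clip, the sketch below shows the underlying arithmetic on a plain list of gradient values. In PyTorch this is commonly done with torch.nn.utils.clip_grad_norm_; whether the trainer uses exactly that call is an assumption, and clip_grad_l2 is a name made up for this example.

```python
import math


def clip_grad_l2(grads, max_norm):
    """Rescale `grads` so that their joint L2 norm does not exceed `max_norm`."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm


clipped, norm_before = clip_grad_l2([3.0, 4.0], max_norm=1.0)
print(norm_before, clipped)
```

The direction of the gradient is preserved; only its length is reduced when it exceeds the configured value.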

dataloader parameters

Parameter	Type	Description
batch_size	integer	batch size
pin_memory	boolean	use memory pinning (only possible if dataset.device is cpu)

flow parameters

Parameter	Type	Description
class	string	"ConvFlow" or "MAFlow"
num_blocks	integer	number of coupling or MADE blocks
num_layers	integer	number of layers in each MADE block (only for "MAFlow")
dropout	float	dropout fraction for the sub-networks (only for "MAFlow")
num_bins	integer	number of spline bins
coupling_block	string	"additive", "affine", (piecewise) "linear", (piecewise) "quadratic", (piecewise) "cubic" or (piecewise) "rational_quadratic" (only for "ConvFlow")
use_act_norm	boolean	use activation norm (only for "ConvFlow")
use_one_by_ones	boolean	replace permutations by GLOW 1x1 convolutions (only for "ConvFlow")
squeeze	integer	squeeze factor (only for "ConvFlow")
subnet_args	dict	arguments for the sub-networks, see U-Net parameters (only for "ConvFlow")
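
As background for the coupling_block option, the following toy shows how an "affine" coupling transformation works and why it is invertible. It is a sketch under simplifying assumptions: the real model builds on nflows and uses U-Net sub-networks on convolutional feature maps, whereas toy_subnet here is a hand-written stand-in operating on short lists.

```python
import math


def toy_subnet(x1):
    """Stand-in for the sub-network: returns log-scales and shifts."""
    s = [math.tanh(v) for v in x1]
    t = [0.5 * v for v in x1]
    return s, t


def affine_forward(x1, x2):
    """Transform x2 conditioned on x1; x1 passes through unchanged."""
    s, t = toy_subnet(x1)
    y2 = [b * math.exp(si) + ti for b, si, ti in zip(x2, s, t)]
    log_det = sum(s)   # log-determinant of the Jacobian
    return x1, y2, log_det


def affine_inverse(y1, y2):
    """Invert affine_forward using the unchanged half y1."""
    s, t = toy_subnet(y1)
    x2 = [(b - ti) * math.exp(-si) for b, si, ti in zip(y2, s, t)]
    return y1, x2


x1, x2 = [0.3, -1.2], [2.0, 0.7]
y1, y2, log_det = affine_forward(x1, x2)
_, x2_rec = affine_inverse(y1, y2)
print(x2_rec, log_det)
```

An "additive" block corresponds to fixing all log-scales to zero, while the piecewise variants replace the affine map by a monotonic spline whose resolution is set by num_bins.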

U-Net parameters

Parameter	Type	Description
hidden	integer	hidden layer size
cyclic_padding	boolean	use cyclic instead of zero padding in last dimension
downsamples	list	list containing the kernel sizes of all downsampling operations inside the U-Net
identity_init	boolean	initialize the last layer so that it always outputs zeros
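
The difference between cyclic and zero padding controlled by cyclic_padding can be illustrated on a single row of values. This is only a sketch of the concept: PyTorch offers circular padding through torch.nn.functional.pad with mode="circular", but whether the U-Net uses that call is an assumption, and pad_last_dim is a hypothetical helper.

```python
def pad_last_dim(row, pad, cyclic):
    """Pad a 1D row on both sides, either cyclically or with zeros."""
    if pad > len(row):
        raise ValueError("pad must not exceed the row length")
    if pad == 0:
        return list(row)
    if cyclic:
        # values wrap around: the left border is filled from the right end
        return row[-pad:] + row + row[:pad]
    zeros = [0.0] * pad
    return zeros + row + zeros


row = [1.0, 2.0, 3.0, 4.0]
cyclic_row = pad_last_dim(row, 1, cyclic=True)
zero_row = pad_last_dim(row, 1, cyclic=False)
print(cyclic_row)
print(zero_row)
```

Cyclic padding is the natural choice for a periodic coordinate, because the first and last bins are then treated as neighbours by the convolution.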

For questions/comments about the code contact: thorsten.buss@uni-hamburg.de

This code was written for the paper:

Convolutional L2LFlows: Generating Accurate Showers in Highly Granular Calorimeters Using Convolutional Normalizing Flows
https://arxiv.org/abs/2405.20407
Thorsten Buss, Frank Gaede, Gregor Kasieczka, Claudius Krause, David Shih
