Code cleaning (#115)
* update documentation of train.py and predict.py

* add new docs

* force correct model naming if emulator_label is provided

* clean hardcodings

* update docs developers

* new model cp

* fix bugs compressed parameters

* chunksize

* chunksize

* new model with running

* new docs under development

* update with new files to ignore

* new structure docs

* cov data for lace

* modify function to make inference cleaner

* name star parameters with _cp

* updated notebooks

* update cosmopower documentation

* Remove DESI_cov from git tracking

* new notebooks version

* new notebooks

* add desi data to gitignore

---------

Co-authored-by: Laura Cabayol Garcia <lauracabayol@Lauras-MacBook-Pro.local>
Co-authored-by: Laura Cabayol-Garcia <lcabayol@login13.chn.perlmutter.nersc.gov>
Co-authored-by: Laura Cabayol Garcia <lauracabayol@lauras-macbook-pro.home>
4 people authored Dec 19, 2024
1 parent e552f1f commit 656f626
Showing 28 changed files with 1,282 additions and 390 deletions.
9 changes: 9 additions & 0 deletions .gitignore
@@ -7,3 +7,12 @@ __pycache__
.github/workflows/.python-tests.yml.swp
docs/build/
docs/site/
data/validation_figures/tests/
data/cosmopower_models/*.dat
data/cosmopower_models/*.txt
data/cosmopower_models/*.npz
data/DESI_cov/

*.DS_Store
build/
data/NNmodels/testing_models/
20 changes: 18 additions & 2 deletions config_files/config_predict.yaml
@@ -3,8 +3,9 @@
## Emulator type: you can choose between a NN emulator or a GP emulator.
emulator_type: "NN"

## Emulator label: if you want to train a predefined model, you can use the emulator_label. Other possible labels are in the README.md file.
emulator_label: "Nyx_alphap_cov"
## Emulator label: if you want to train a predefined model, you can use the emulator_label. Other possible labels are in the docs.
emulator_label: "Cabayol23+"
training_set: null

## If the emulator needs to be trained without a specific simulation.
drop_sim: null
@@ -25,3 +25,18 @@ average_over_z: false
## Where to save the trained model from the project root directory
save_plot_path: "data/validation_figures/tests/test_p1d_err.png"
save_predictions_path: null

hyperparameters:
kmax_Mpc: 4
ndeg: 5
nepochs: 100
step_size: 75
drop_z: null
weighted_emulator: true
nhidden: 5
max_neurons: 50
seed: 32
lr0: 1e-3
batch_size: 100
weight_decay: 1e-4
z_max: 10
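
For reference, a quick way to inspect a config like this from Python is a plain YAML load. This snippet is only illustrative and is not part of the LaCE scripts (it assumes `pyyaml` is installed and is run from the project root):

```python
import yaml

# Illustrative only: read the prediction config and inspect a few fields.
with open("config_files/config_predict.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg["emulator_label"])               # e.g. "Cabayol23+"
print(cfg["hyperparameters"]["kmax_Mpc"])  # e.g. 4
```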
18 changes: 10 additions & 8 deletions config_files/config_train.yaml
@@ -4,33 +4,35 @@
emulator_type: "NN"

## Emulator label: if you want to train a predefined model, you can use the emulator_label. Other possible labels are in the README.md file.
emulator_label: "Cabayol23+"
emulator_label: null #"Cabayol23+"

## For the data, you can choose between loading an archive or a training set. You can only provide one of them.
### If you choose to load an archive, you can choose between Nyx or Gadget and a version.
archive:
file: Gadget # Can be either "Nyx" or "Gadget"
file: "Gadget" # Can be either "Nyx" or "Gadget"
version: "Cabayol23" #nyx version or Gadget postprocessing version

training_set: null #Predefined training sets. You can find the options in the README.md file.

drop_sim: null

## If no emulator_label is provided, you need to provide the hyperparameters for training a new model and the emulator parameters
emulator_params: ["Delta2_p", "n_p", "alpha_p", "sigT_Mpc", "gamma", "kF_Mpc"]
hyperparameters:
kmax_Mpc: 4
kmax_Mpc: 4.0
ndeg: 5
nepochs: 100
nepochs: 1
step_size: 75
drop_sim: null
drop_z: null
weighted_emulator: true
nhidden: 5
max_neurons: 50
seed: 32
lr0: 1e-3
lr0: 0.001
batch_size: 100
weight_decay: 1e-4
z_max: 10
weight_decay: 0.0001
z_max: 10.0

# Where to save the trained model from the project root directory
save_path: "data/NNmodels/testing_models/test.pt"
save_path: "data/NNmodels/testing_models/test.pt"
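
The comments above encode a constraint: either a predefined `emulator_label` is given, or `emulator_params` and `hyperparameters` must be provided. A small illustrative check of that constraint, not part of LaCE, shown only to make the rule explicit:

```python
import yaml

# Illustrative consistency check for the constraint described in the config comments.
with open("config_files/config_train.yaml") as f:
    cfg = yaml.safe_load(f)

if cfg.get("emulator_label") is None:
    missing = [key for key in ("emulator_params", "hyperparameters") if not cfg.get(key)]
    if missing:
        raise ValueError(f"Without emulator_label, you must also provide: {missing}")
```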
Binary file added data/cosmopower_models/Pk_cp_NN_nrun.pkl
Binary file not shown.
Binary file modified data/cosmopower_models/Pk_cp_NN_sumnu.pkl
Binary file not shown.
3 changes: 3 additions & 0 deletions docs/docs/developers/CreateNewEmulator.md
@@ -90,11 +90,14 @@ emulator = emulator_manager(emulator_label="New_Emulator"

In the first case, since you are specifying the `model path`, there is no naming convention for the model file. However, in the second case, the saved models must be stored in the following way:

- The folder must be `data/NNmodels/` from the root of the repository.
- For a specific emulator label, you need to create a new folder, e.g. `New_Emulator`.
- For the emulator using all training simulations, the model file is named `New_Emulator.pt`.
- For the emulator using the training set excluding a given simulation, the model file is named `New_Emulator_drop_sim_{simulation suite}_{simulation index}.pt`. For example, if you exclude the 10th simulation from the mpg training set, the model file is named `New_Emulator_drop_sim_mpg_10.pt`.

**For this reason, if the emulator label is provided, the model will be saved following this naming convention even if another model path is specified** (see the sketch below).
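
A minimal sketch of the naming convention described above, purely for illustration (the helper below is hypothetical and not part of LaCE):

```python
from pathlib import Path

# Hypothetical helper illustrating the naming convention described above.
def expected_model_path(emulator_label, drop_sim=None, root="data/NNmodels"):
    name = emulator_label if drop_sim is None else f"{emulator_label}_drop_sim_{drop_sim}"
    return Path(root) / emulator_label / f"{name}.pt"

print(expected_model_path("New_Emulator"))
# data/NNmodels/New_Emulator/New_Emulator.pt
print(expected_model_path("New_Emulator", drop_sim="mpg_10"))
# data/NNmodels/New_Emulator/New_Emulator_drop_sim_mpg_10.pt
```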

The emulator manager will automatically find the correct model file for the given emulator label. To set this up, you need to add the new emulator label to the `folder` dictionary in the `emulator_manager.py` file.
```python
folder = {
    # ... (remaining entries truncated in this diff)
}
```
@@ -1,4 +1,4 @@
# TRAINING OPTIONS
# UNDER DEVELOPMENT SOLUTIONS

There are several features that can be used to customize the training of the emulators. This tutorial will guide you through the process of training emulators with different options.

@@ -10,9 +10,9 @@ The emulator supports weighting the training simulations with a covariance matrix

To train an emulator with a covariance matrix, you need to provide a covariance matrix for the training simulations. Currently, the emulator only supports a diagonal covariance matrix. It is important that the covariance matrix is given in the __k__ binning of the training simulations.

The function '_load_DESIY1_err' in the `nn_emulator.py` file loads a covariance matrix. The covariance must be a json file with the relative error as a function of __z__ for each __k__ bin.
The function `_load_DESIY1_err` in the `nn_emulator.py` file loads a covariance matrix. The covariance must be a json file with the relative error as a function of __z__ for each __k__ bin.

From the relative error file in 'data/DESI_cov/rel_err_DESI_Y1.npy', we can generate the json file with the following steps:
From the relative error file in `data/DESI_cov/rel_err_DESI_Y1.npy`, we can generate the json file with the following steps:

First we load the data from the relative error file:

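The original steps are truncated in this diff view. A minimal sketch of what such a conversion might look like, assuming the `.npy` file holds a 2D array with one row per redshift bin and one column per __k__ bin (the array layout, redshift grid, and json key names below are assumptions, not the LaCE convention):

```python
import json
import numpy as np

# Relative P1D errors; layout assumed to be (n_z, n_k).
rel_err = np.load("data/DESI_cov/rel_err_DESI_Y1.npy")

# Placeholder redshift grid; replace with the actual redshifts of the measurement.
z_bins = [2.2 + 0.2 * i for i in range(rel_err.shape[0])]

# One json entry per redshift with its relative error per k bin.
payload = {f"{z:.1f}": err.tolist() for z, err in zip(z_bins, rel_err)}

with open("data/DESI_cov/rel_err_DESI_Y1.json", "w") as f:
    json.dump(payload, f, indent=2)
```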
8 changes: 4 additions & 4 deletions docs/docs/developers/documentation.md
@@ -17,9 +17,9 @@ The `gh-pages` branch is automatically updated when a PR is merged into `main`.
In order to write documentation, you can use the following structure:

- `docs/docs/developers`: Documentation for developers
- `docs/docs/`: Documentation for users
- `docs/docs/users`: Documentation for users

You can add new pages by adding a new `.md` file to the `docs/docs/` folder. Remember to add the new page to the `mkdocs.yml` file so that it is included in the documentation. The new page will automatically be added to the navigation menu.

To have a cleaner structure, add the new page to the corresponding `index.md` file.
To add a new page, create a new `.md` file in the `docs/docs/` folder.
To define where the page appears and the overall structure of the documentation, add it to `mkdocs.yml`. The new page will automatically be added to the navigation menu.
To have a cleaner structure, add the new page to the corresponding `index.md` file. The documentation is structured with an index file for each section.

2 changes: 1 addition & 1 deletion docs/docs/developers/index.md
@@ -5,7 +5,7 @@ Welcome to the LaCE developer documentation! This section contains information for developers
## Contents

- [Creating New Emulators](CreateNewEmulator.md): Learn how to create and add new emulator types to LaCE
- [Training Options](trainingOptions.md): Implemented solutions to improve the emulators performance
- [Training Options](UnderDevelopment.md): Implemented solutions to improve the emulators' performance
- [Code Testing](advancedTesting.md): Information to maintain and extend the automated testing
- [Documentation](documentation.md): How to write and maintain documentation

56 changes: 56 additions & 0 deletions docs/docs/users/Emulators_trainingSets.md
@@ -0,0 +1,56 @@
# PREDEFINED EMULATORS AND TRAINING SETS

## PREDEFINED EMULATORS
LaCE provides a set of predefined emulators that have been validated. These emulators are:

- Neural network emulators:
    - Gadget emulators:
        - Cabayol23: Neural network emulating the optimal P1D of Gadget simulations fitting coefficients to a 5th degree polynomial. It goes to scales of 4 Mpc^{-1} and z<=4.5.
        - Cabayol23+: Neural network emulating the optimal P1D of Gadget simulations fitting coefficients to a 5th degree polynomial. It goes to scales of 4 Mpc^{-1} and z<=4.5. Updated version compared to the Cabayol+23 paper.
        - Cabayol23_extended: Neural network emulating the optimal P1D of Gadget simulations fitting coefficients to a 7th degree polynomial. It goes to scales of 8 Mpc^{-1} and z<=4.5.
        - Cabayol23+_extended: Neural network emulating the optimal P1D of Gadget simulations fitting coefficients to a 7th degree polynomial. It goes to scales of 8 Mpc^{-1} and z<=4.5. Updated version compared to the Cabayol+23 paper.
    - Nyx emulators:
        - Nyx_v0: Neural network emulating the optimal P1D of Nyx simulations fitting coefficients to a 6th degree polynomial. It goes to scales of 4 Mpc^{-1} and z<=4.5.
        - Nyx_v0_extended: Neural network emulating the optimal P1D of Nyx simulations fitting coefficients to a 6th degree polynomial. It goes to scales of 8 Mpc^{-1} and z<=4.5.
        - Nyx_alphap: Neural network emulating the optimal P1D of Nyx simulations fitting coefficients to a 6th degree polynomial. It goes to scales of 4 Mpc^{-1} and z<=4.5.
        - Nyx_alphap_extended: Neural network emulating the optimal P1D of Nyx simulations fitting coefficients to a 6th degree polynomial. It goes to scales of 8 Mpc^{-1} and z<=4.5.
        - Nyx_alphap_cov: Version of the Nyx_alphap emulator currently under testing.

- Gaussian Process emulators:
    - Gadget emulators:
        - "Pedersen21": Gaussian process emulating the optimal P1D of Gadget simulations. Pedersen+21 paper.
        - "Pedersen23": Updated version of the Pedersen21 emulator. Pedersen+23 paper.
        - "Pedersen21_ext": Extended version of the Pedersen21 emulator.
        - "Pedersen21_ext8": Extended version of the Pedersen21 emulator up to k=8 Mpc^-1.
        - "Pedersen23_ext": Extended version of the Pedersen23 emulator.
        - "Pedersen23_ext8": Extended version of the Pedersen23 emulator up to k=8 Mpc^-1.

## PREDEFINED TRAINING SETS

Similarly, LaCE provides a set of predefined training sets that have been used to train the emulators. Each training set corresponds to a simulation suite, a post-processing, and whether or not mean-flux rescalings are included. The training sets are:

- "Pedersen21": Training set used in [Pedersen+21 paper](https://arxiv.org/abs/2103.05195). Gadget simulations without mean flux rescalings.
- "Cabayol23": Training set used in [Cabayol+23 paper](https://arxiv.org/abs/2303.05195). Gadget simulations with mean flux rescalings and measuring the P1D along the three principal axes of the simulation box.
- "Nyx_Oct2023": Training set using Nyx version from October 2023.
- "Nyx_Jul2024": Training set using Nyx version from July 2024.

## CONNECTION BETWEEN PREDEFINED EMULATORS AND TRAINING SETS
The following table shows the default training set for each predefined emulator.

| Emulator | Training Set | Simulation | Type | Description |
|----------|--------------|------------|------|-------------|
| Cabayol23 | Cabayol23 | Gadget | NN | Neural network emulator trained on Gadget simulations with mean flux rescaling |
| Cabayol23+ | Cabayol23 | Gadget | NN | Updated version of Cabayol23 emulator |
| Cabayol23_extended | Cabayol23 | Gadget | NN | Extended version of Cabayol23 emulator (k up to 8 Mpc^-1) |
| Cabayol23+_extended | Cabayol23 | Gadget | NN | Extended version of Cabayol23+ emulator (k up to 8 Mpc^-1) |
| Nyx_v0 | Nyx_Oct2023 | Nyx | NN | Neural network emulator trained on Nyx simulations |
| Nyx_v0_extended | Nyx_Oct2023 | Nyx | NN | Extended version of Nyx_v0 emulator (k up to 8 Mpc^-1) |
| Nyx_alphap | Nyx_Oct2023 | Nyx | NN | Neural network emulator trained on updated Nyx simulations |
| Nyx_alphap_extended | Nyx_Oct2023 | Nyx | NN | Extended version of Nyx_alphap emulator (k up to 8 Mpc^-1) |
| Nyx_alphap_cov | Nyx_Jul2024 | Nyx | NN | Testing version of Nyx_alphap emulator |
| Pedersen21 | Pedersen21 | Gadget | GP | GP emulator trained on Gadget simulations without mean flux rescaling |
| Pedersen23 | Pedersen21 | Gadget | GP | Updated version of Pedersen21 GP emulator |
| Pedersen21_ext | Pedersen21 | Gadget | GP | Extended version of Pedersen21 GP emulator |
| Pedersen21_ext8 | Pedersen21 | Gadget | GP | Extended version of Pedersen21 GP emulator (k up to 8 Mpc^-1) |
| Pedersen23_ext | Pedersen21 | Gadget | GP | Extended version of Pedersen23 GP emulator |
| Pedersen23_ext8 | Pedersen21 | Gadget | GP | Extended version of Pedersen23 GP emulator (k up to 8 Mpc^-1) |
27 changes: 27 additions & 0 deletions docs/docs/users/Simulations_list.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
# AVAILABLE SIMULATIONS

This section contains the list of simulations available in the archives.

## Gadget simulations
The Gadget simulations contain 30 training simulations, named "mpg_{x}", where x is an integer from 0 to 29.
Additionally, there are 7 test simulations:

- "mpg_central": The simulation parameters are at the center of the parameter space.
- "mpg_neutrinos": The simulation contains massive neutrinos.
- "mpg_running": The simulation has a non-zero running of the spectral index.
- "mpg_growth": The growth factor of the simulation is different from that of the training set.
- "mpg_reio": The reionization history is different from that of the training set.
- "mpg_seed": Identical to the central simulation with different initial conditions. Meant to test the impact of cosmic variance.
- "mpg_curved": The simulation has a different curvature power spectrum from that of the training set.

Information about the simulation parameters can be found in [Pedersen+21](https://arxiv.org/abs/2011.15127).

## Nyx simulations

The Nyx simulation suite contains 18 training simulations, named "nyx_{x}", where x is an integer from 0 to 17. Additionally, there are 2 test simulations:

- "nyx_central": The simulation parameters are at the center of the parameter space.
- "nyx_seed": Identical to the central simulation with different initial conditions. Meant to test the impact of cosmic variance.


Information about the simulation parameters can be found in [TBD](..).
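
For illustration, these labels are the ones passed when loading test data from an archive, as described in the [archive documentation](archive.md); a minimal sketch (the Gadget import path is assumed by analogy with the Nyx import shown there):

```python
# Sketch: load one of the test simulations listed above by its label.
from lace.archive.gadget_archive import GadgetArchive  # import path assumed

archive = GadgetArchive(postproc="Cabayol23")
test_data = archive.get_testing_data(sim_label="mpg_central")
```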
33 changes: 14 additions & 19 deletions docs/docs/users/archive.md
@@ -1,11 +1,12 @@
# ARCHIVE
# ARCHIVES

The LaCE emulators support two types of archives:

- Gadget archive: Contains the P1D of Gadget simulations described in [Pedersen+21](https://arxiv.org/abs/2011.15127).
- Nyx archive: Contains the P1D of Nyx simulations described in (In prep.)

## Loading a Gadget Archive
The Gadget archive contains 30 training simulations and seven test simulations. Each simulation contains 11 snapshots covering redshifts from 2 to 4.5 in steps of 0.25
The Gadget archive contains 30 training simulations and 7 test simulations. Each simulation contains 11 snapshots covering redshifts from 2 to 4.5 in steps of 0.25.

To load a Gadget archive, you can use the `GadgetArchive` class (the original import snippet is truncated in this diff; a sketch is given below):
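A sketch of the truncated import, assuming the module path mirrors the Nyx archive import shown later on this page:

```python
# Import path assumed by analogy with the Nyx archive import below.
from lace.archive.gadget_archive import GadgetArchive
```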
@@ -17,16 +18,23 @@ The P1D from the Gadget archive with the Pedersen+21 post-processing can be accessed as follows:
```python
archive = GadgetArchive(postproc='Pedersen21')
```
This post-processing measures the P1D along one of the three box axes and contains three mean-flux rescaling per snapshot.
This post-processing measures the P1D along one of the three box axes and does not contain mean-flux rescalings.

On the other hand, the P1D from the Gadget archive with the Cabayol+23 post-processing can be accessed as follows:
```python
archive = GadgetArchive(postproc='Cabayol23')
```
This post-processing measures the P1D along the three principal axes of the box and contains five mean-flux rescalings per snapshot.

## Loading a Nyx Archive
To load the Nyx archive, you can use the `NyxArchive` class:

```python
from lace.archive.nyx_archive import NyxArchive
```
Since the Nyx archive is not publicly available yet, you need to set the `NYX_PATH` environment variable to the path to the Nyx files on your local computer.
Since the Nyx archive is not publicly available yet, **you need to set the `NYX_PATH` environment variable to the path to the Nyx files** on your local computer (or the cluster where you are running the code).

There are two versions of the Nyx archive available: `Oct2023` and `Jul2024`. The first one contains 17 training simulations and 4 test simulations, and the second one contains 17 training simulations and 3 test simulations. Each simulation contains 14 snapshots covering redshifts from 2.2 to 4.8 in steps of 0.2, plus additional snapshots at higher redshifts for some of the simulations. In both cases, it is not recommended to use simulation number 14.
There are two versions of the Nyx archive available: Oct2023 and Jul2024. The first one contains 17 training simulations and 4 test simulations, and the second one contains 17 training simulations and 3 test simulations (the simulations are better described [here](./Simulations_list.md)). Each simulation contains 14 snapshots covering redshifts from 2.2 to 4.8 in steps of 0.2, plus additional snapshots at higher redshifts for some of the simulations. In both cases, it is not recommended to use simulation number 14.

The P1D from the Nyx archive with the Oct2023 version can be accessed as follows (the original snippet is truncated in this diff; a sketch is given below):
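A sketch of what the truncated snippet might look like, assuming the archive takes a version argument matching the names above (the keyword name is an assumption):

```python
from lace.archive.nyx_archive import NyxArchive

# Requires the NYX_PATH environment variable to point to the Nyx files.
# Version label as described above; the keyword name is an assumption.
archive = NyxArchive(nyx_version="Oct2023")
```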
@@ -50,17 +58,4 @@ For the test set, the equivalent function is:
```python
archive.get_testing_data(sim_label='mpg_central')
```
where you can replace `sim_label` by any of the test simulation labels available in the archive. This will only load the fiducial snapshots without mean flux rescalings.

## Key keywords in the archive
The archive contains many keywords that can be used to access specific data. Here is a non-exhaustive list of the most important ones:

- `sim_label`: The label of the simulation. It can be any of the test simulation labels available in the archive.
- `z`: The snapshot redshift.
- `ind_axis`: Indicates the axis along which the P1D is measured. It can be 0, 1, 2 or 'average'
- `ind_rescaling`: The index of mean-flux rescaling of the P1D.
- `val_scaling`: The value of mean-flux rescaling of the P1D.
- `cosmo_params`: A dictionary containing the cosmological parameters of the simulation.
- `p1d_Mpc`: The P1D in Mpc.
- `k_Mpc`: The wavevector in Mpc.

where you can replace `sim_label` by any of the test simulation labels available in the archive (see [here](./Simulations_list.md)). This will only load the fiducial snapshots without mean flux rescalings.
