Code cleaning (#115)
* update documentation of train.py and predict.py

* add new docs

* force correct model naming if emulator_label is provided

* clean hardcodings

* update docs developers

* new model cp

* fix bugs compressed parameters

* chunksize

* chunksize

* new model with running

* new docs under development

* update with new files to ignore

* new structure docs

* cov data for lace

* modify function to make inference cleaner

* name star parameters with _cp

* updated notebooks

* update cosmopower documentation

* Remove DESI_cov from git tracking

* new notebooks version

* new notebooks

* add desi data to gitignore

---------

Co-authored-by: Laura Cabayol Garcia <lauracabayol@Lauras-MacBook-Pro.local>
Co-authored-by: Laura Cabayol-Garcia <lcabayol@login13.chn.perlmutter.nersc.gov>
Co-authored-by: Laura Cabayol Garcia <lauracabayol@lauras-macbook-pro.home>
4 people authored Dec 19, 2024
1 parent e552f1f commit 656f626
Showing 28 changed files with 1,282 additions and 390 deletions.
9 changes: 9 additions & 0 deletions .gitignore
@@ -7,3 +7,12 @@ __pycache__
.github/workflows/.python-tests.yml.swp
docs/build/
docs/site/
data/validation_figures/tests/
data/cosmopower_models/*.dat
data/cosmopower_models/*.txt
data/cosmopower_models/*.npz
data/DESI_cov/

*.DS_Store
build/
data/NNmodels/testing_models/
20 changes: 18 additions & 2 deletions config_files/config_predict.yaml
@@ -3,8 +3,9 @@
## Emulator type: you can choose between a NN emulator or a GP emulator.
emulator_type: "NN"

## Emulator label: if you want to train a predefined model, you can use the emulator_label. Other possible labels are in the README.md file.
emulator_label: "Nyx_alphap_cov"
## Emulator label: if you want to train a predefined model, you can use the emulator_label. Other possible labels are in the docs.
emulator_label: "Cabayol23+"
training_set: null

## If the emulator needs to be trained without a specific simulation.
drop_sim: null
@@ -25,3 +25,18 @@ average_over_z: false
## Where to save the trained model from the project root directory
save_plot_path: "data/validation_figures/tests/test_p1d_err.png"
save_predictions_path: null

hyperparameters:
kmax_Mpc: 4
ndeg: 5
nepochs: 100
step_size: 75
drop_z: null
weighted_emulator: true
nhidden: 5
max_neurons: 50
seed: 32
lr0: 1e-3
batch_size: 100
weight_decay: 1e-4
z_max: 10
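
For reference, a quick way to inspect a config like this from Python is a plain YAML load. This snippet is only illustrative and is not part of the LaCE scripts (it assumes `pyyaml` is installed and is run from the project root):

```python
import yaml

# Illustrative only: read the prediction config and inspect a few fields.
with open("config_files/config_predict.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg["emulator_label"])               # e.g. "Cabayol23+"
print(cfg["hyperparameters"]["kmax_Mpc"])  # e.g. 4
```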
18 changes: 10 additions & 8 deletions config_files/config_train.yaml
@@ -4,33 +4,35 @@
emulator_type: "NN"

## Emulator label: if you want to train a predefined model, you can use the emulator_label. Other possible labels are in the README.md file.
emulator_label: "Cabayol23+"
emulator_label: null #"Cabayol23+"

## For the data, you can choose between loading an archive or a training set. You can only provide one of them.
### If you choose to load an archive, you can choose between Nyx or Gadget and a version.
archive:
file: Gadget # Can be either "Nyx" or "Gadget"
file: "Gadget" # Can be either "Nyx" or "Gadget"
version: "Cabayol23" #nyx version or Gadget postprocessing version

training_set: null #Predefined training sets. You can find the options in the README.md file.

drop_sim: null

## If no emulator_label is provided, you need to provide the hyperparameters for training a new model and the emulator parameters
emulator_params: ["Delta2_p", "n_p", "alpha_p", "sigT_Mpc", "gamma", "kF_Mpc"]
hyperparameters:
kmax_Mpc: 4
kmax_Mpc: 4.0
ndeg: 5
nepochs: 100
nepochs: 1
step_size: 75
drop_sim: null
drop_z: null
weighted_emulator: true
nhidden: 5
max_neurons: 50
seed: 32
lr0: 1e-3
lr0: 0.001
batch_size: 100
weight_decay: 1e-4
z_max: 10
weight_decay: 0.0001
z_max: 10.0

# Where to save the trained model from the project root directory
save_path: "data/NNmodels/testing_models/test.pt"
save_path: "data/NNmodels/testing_models/test.pt"
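
The comments above encode a constraint: either a predefined `emulator_label` is given, or `emulator_params` and `hyperparameters` must be provided. A small illustrative check of that constraint, not part of LaCE, shown only to make the rule explicit:

```python
import yaml

# Illustrative consistency check for the constraint described in the config comments.
with open("config_files/config_train.yaml") as f:
    cfg = yaml.safe_load(f)

if cfg.get("emulator_label") is None:
    missing = [key for key in ("emulator_params", "hyperparameters") if not cfg.get(key)]
    if missing:
        raise ValueError(f"Without emulator_label, you must also provide: {missing}")
```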
Binary file added data/cosmopower_models/Pk_cp_NN_nrun.pkl
Binary file not shown.
Binary file modified data/cosmopower_models/Pk_cp_NN_sumnu.pkl
Binary file not shown.
3 changes: 3 additions & 0 deletions docs/docs/developers/CreateNewEmulator.md
@@ -90,11 +90,14 @@ emulator = emulator_manager(emulator_label="New_Emulator"

In the first case, since you are specifying the `model path`, there is no naming convention for the model file. However, in the second case, the saved models must be stored in the following way:

- The folder must be `data/NNmodels/` from the root of the repository.
- For a specific emulator label, you need to create a new folder, e.g. `New_Emulator`.
- For the emulator using all training simulations, the model file is named `New_Emulator.pt`.
- For the emulator using the training set excluding a given simulation, the model file is named `New_Emulator_drop_sim_{simulation suite}_{simulation index}.pt`. For example, if you exclude the 10th simulation from the mpg training set, the model file is named `New_Emulator_drop_sim_mpg_10.pt`.

**For this reason, if the emulator label is provided, the model will be saved following this naming convention even if another model path is specified** (see the sketch below).
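
A minimal sketch of the naming convention described above, purely for illustration (the helper below is hypothetical and not part of LaCE):

```python
from pathlib import Path

# Hypothetical helper illustrating the naming convention described above.
def expected_model_path(emulator_label, drop_sim=None, root="data/NNmodels"):
    name = emulator_label if drop_sim is None else f"{emulator_label}_drop_sim_{drop_sim}"
    return Path(root) / emulator_label / f"{name}.pt"

print(expected_model_path("New_Emulator"))
# data/NNmodels/New_Emulator/New_Emulator.pt
print(expected_model_path("New_Emulator", drop_sim="mpg_10"))
# data/NNmodels/New_Emulator/New_Emulator_drop_sim_mpg_10.pt
```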

The emulator manager will automatically find the correct model file for the given emulator label. To set this up, you need to add the new emulator label to the `folder` dictionary in the `emulator_manager.py` file.
```python
folder = {
    # ... (remaining entries truncated in this diff)
}
```
@@ -1,4 +1,4 @@
# TRAINING OPTIONS
# UNDER DEVELOPMENT SOLUTIONS

There are several features that can be used to customize the training of the emulators. This tutorial will guide you through the process of training emulators with different options.

@@ -10,9 +10,9 @@ The emulator supports weighting the training simulations with a covariance matrix

To train an emulator with a covariance matrix, you need to provide a covariance matrix for the training simulations. Currently, the emulator only supports a diagonal covariance matrix. It is important that the covariance matrix is given in the __k__ binning of the training simulations.

The function '_load_DESIY1_err' in the `nn_emulator.py` file loads a covariance matrix. The covariance must be a json file with the relative error as a function of __z__ for each __k__ bin.
The function `_load_DESIY1_err` in the `nn_emulator.py` file loads a covariance matrix. The covariance must be a json file with the relative error as a function of __z__ for each __k__ bin.

From the relative error file in 'data/DESI_cov/rel_err_DESI_Y1.npy', we can generate the json file with the following steps:
From the relative error file in `data/DESI_cov/rel_err_DESI_Y1.npy`, we can generate the json file with the following steps:

First we load the data from the relative error file:

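The original steps are truncated in this diff view. A minimal sketch of what such a conversion might look like, assuming the `.npy` file holds a 2D array with one row per redshift bin and one column per __k__ bin (the array layout, redshift grid, and json key names below are assumptions, not the LaCE convention):

```python
import json
import numpy as np

# Relative P1D errors; layout assumed to be (n_z, n_k).
rel_err = np.load("data/DESI_cov/rel_err_DESI_Y1.npy")

# Placeholder redshift grid; replace with the actual redshifts of the measurement.
z_bins = [2.2 + 0.2 * i for i in range(rel_err.shape[0])]

# One json entry per redshift with its relative error per k bin.
payload = {f"{z:.1f}": err.tolist() for z, err in zip(z_bins, rel_err)}

with open("data/DESI_cov/rel_err_DESI_Y1.json", "w") as f:
    json.dump(payload, f, indent=2)
```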
8 changes: 4 additions & 4 deletions docs/docs/developers/documentation.md
@@ -17,9 +17,9 @@ The `gh-pages` branch is automatically updated when a PR is merged into `main`.
In order to write documentation, you can use the following structure:

- `docs/docs/developers`: Documentation for developers
- `docs/docs/`: Documentation for users
- `docs/docs/users`: Documentation for users

You can add new pages by adding a new `.md` file to the `docs/docs/` folder. Remember to add the new page to the `mkdocs.yml` file so that it is included in the documentation. The new page will automatically be added to the navigation menu.

To have a cleaner structure, add the new page to the corresponding `index.md` file.
To add a new page, create a new `.md` file in the `docs/docs/` folder.
To define where the page appears and the overall structure of the documentation, add it to `mkdocs.yml`. The new page will automatically be added to the navigation menu.
To have a cleaner structure, add the new page to the corresponding `index.md` file. The documentation is structured with an index file for each section.

2 changes: 1 addition & 1 deletion docs/docs/developers/index.md
@@ -5,7 +5,7 @@ Welcome to the LaCE developer documentation! This section contains information for developers
## Contents

- [Creating New Emulators](CreateNewEmulator.md): Learn how to create and add new emulator types to LaCE
- [Training Options](trainingOptions.md): Implemented solutions to improve the emulators performance
- [Training Options](UnderDevelopment.md): Implemented solutions to improve the emulators' performance
- [Code Testing](advancedTesting.md): Information to maintain and extend the automated testing
- [Documentation](documentation.md): How to write and maintain documentation

56 changes: 56 additions & 0 deletions docs/docs/users/Emulators_trainingSets.md
@@ -0,0 +1,56 @@
# PREDEFINED EMULATORS AND TRAINING SETS

## PREDEFINED EMULATORS
LaCE provides a set of predefined emulators that have been validated. These emulators are:

- Neural network emulators:
    - Gadget emulators:
        - Cabayol23: Neural network emulating the optimal P1D of Gadget simulations fitting coefficients to a 5th degree polynomial. It goes to scales of 4 Mpc^{-1} and z<=4.5.
        - Cabayol23+: Neural network emulating the optimal P1D of Gadget simulations fitting coefficients to a 5th degree polynomial. It goes to scales of 4 Mpc^{-1} and z<=4.5. Updated version compared to the Cabayol+23 paper.
        - Cabayol23_extended: Neural network emulating the optimal P1D of Gadget simulations fitting coefficients to a 7th degree polynomial. It goes to scales of 8 Mpc^{-1} and z<=4.5.
        - Cabayol23+_extended: Neural network emulating the optimal P1D of Gadget simulations fitting coefficients to a 7th degree polynomial. It goes to scales of 8 Mpc^{-1} and z<=4.5. Updated version compared to the Cabayol+23 paper.
    - Nyx emulators:
        - Nyx_v0: Neural network emulating the optimal P1D of Nyx simulations fitting coefficients to a 6th degree polynomial. It goes to scales of 4 Mpc^{-1} and z<=4.5.
        - Nyx_v0_extended: Neural network emulating the optimal P1D of Nyx simulations fitting coefficients to a 6th degree polynomial. It goes to scales of 8 Mpc^{-1} and z<=4.5.
        - Nyx_alphap: Neural network emulating the optimal P1D of Nyx simulations fitting coefficients to a 6th degree polynomial. It goes to scales of 4 Mpc^{-1} and z<=4.5.
        - Nyx_alphap_extended: Neural network emulating the optimal P1D of Nyx simulations fitting coefficients to a 6th degree polynomial. It goes to scales of 8 Mpc^{-1} and z<=4.5.
        - Nyx_alphap_cov: Version of the Nyx_alphap emulator currently under testing.

- Gaussian Process emulators:
    - Gadget emulators:
        - "Pedersen21": Gaussian process emulating the optimal P1D of Gadget simulations. Pedersen+21 paper.
        - "Pedersen23": Updated version of the Pedersen21 emulator. Pedersen+23 paper.
        - "Pedersen21_ext": Extended version of the Pedersen21 emulator.
        - "Pedersen21_ext8": Extended version of the Pedersen21 emulator up to k=8 Mpc^-1.
        - "Pedersen23_ext": Extended version of the Pedersen23 emulator.
        - "Pedersen23_ext8": Extended version of the Pedersen23 emulator up to k=8 Mpc^-1.

## PREDEFINED TRAINING SETS

Similarly, LaCE provides a set of predefined training sets that have been used to train the emulators. Each training set corresponds to a simulation suite, a post-processing, and whether or not mean-flux rescalings are included. The training sets are:

- "Pedersen21": Training set used in [Pedersen+21 paper](https://arxiv.org/abs/2103.05195). Gadget simulations without mean flux rescalings.
- "Cabayol23": Training set used in [Cabayol+23 paper](https://arxiv.org/abs/2303.05195). Gadget simulations with mean flux rescalings and measuring the P1D along the three principal axes of the simulation box.
- "Nyx_Oct2023": Training set using Nyx version from October 2023.
- "Nyx_Jul2024": Training set using Nyx version from July 2024.

## CONNECTION BETWEEN PREDEFINED EMULATORS AND TRAINING SETS
The following table shows the default training set for each predefined emulator.

| Emulator | Training Set | Simulation | Type | Description |
|----------|--------------|------------|------|-------------|
| Cabayol23 | Cabayol23 | Gadget | NN | Neural network emulator trained on Gadget simulations with mean flux rescaling |
| Cabayol23+ | Cabayol23 | Gadget | NN | Updated version of Cabayol23 emulator |
| Cabayol23_extended | Cabayol23 | Gadget | NN | Extended version of Cabayol23 emulator (k up to 8 Mpc^-1) |
| Cabayol23+_extended | Cabayol23 | Gadget | NN | Extended version of Cabayol23+ emulator (k up to 8 Mpc^-1) |
| Nyx_v0 | Nyx_Oct2023 | Nyx | NN | Neural network emulator trained on Nyx simulations |
| Nyx_v0_extended | Nyx_Oct2023 | Nyx | NN | Extended version of Nyx_v0 emulator (k up to 8 Mpc^-1) |
| Nyx_alphap | Nyx_Oct2023 | Nyx | NN | Neural network emulator trained on updated Nyx simulations |
| Nyx_alphap_extended | Nyx_Oct2023 | Nyx | NN | Extended version of Nyx_alphap emulator (k up to 8 Mpc^-1) |
| Nyx_alphap_cov | Nyx_Jul2024 | Nyx | NN | Testing version of Nyx_alphap emulator |
| Pedersen21 | Pedersen21 | Gadget | GP | GP emulator trained on Gadget simulations without mean flux rescaling |
| Pedersen23 | Pedersen21 | Gadget | GP | Updated version of Pedersen21 GP emulator |
| Pedersen21_ext | Pedersen21 | Gadget | GP | Extended version of Pedersen21 GP emulator |
| Pedersen21_ext8 | Pedersen21 | Gadget | GP | Extended version of Pedersen21 GP emulator (k up to 8 Mpc^-1) |
| Pedersen23_ext | Pedersen21 | Gadget | GP | Extended version of Pedersen23 GP emulator |
| Pedersen23_ext8 | Pedersen21 | Gadget | GP | Extended version of Pedersen23 GP emulator (k up to 8 Mpc^-1) |
27 changes: 27 additions & 0 deletions docs/docs/users/Simulations_list.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
# AVAILABLE SIMULATIONS

This section contains the list of simulations available in the archives.

## Gadget simulations
The Gadget simulations contain 30 training simulations, named "mpg_{x}", where x is an integer from 0 to 29.
Additionally, there are 7 test simulations:

- "mpg_central": The simulation parameters are at the center of the parameter space.
- "mpg_neutrinos": The simulation contains massive neutrinos.
- "mpg_running": The simulation has a non-zero running of the spectral index.
- "mpg_growth": The growth factor of the simulation is different from that of the training set.
- "mpg_reio": The reionization history is different from that of the training set.
- "mpg_seed": Identical to the central simulation with different initial conditions. Meant to test the impact of cosmic variance.
- "mpg_curved": The simulation has a different curvature power spectrum from that of the training set.

Information about the simulation parameters can be found in [Pedersen+21](https://arxiv.org/abs/2011.15127).

## Nyx simulations

The Nyx simulation suite contains 18 training simulations, named "nyx_{x}", where x is an integer from 0 to 17. Additionally, there are 2 test simulations:

- "nyx_central": The simulation parameters are at the center of the parameter space.
- "nyx_seed": Identical to the central simulation with different initial conditions. Meant to test the impact of cosmic variance.


Information about the simulation parameters can be found in [TBD](..).
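
For illustration, these labels are the ones passed when loading test data from an archive, as described in the [archive documentation](archive.md); a minimal sketch (the Gadget import path is assumed by analogy with the Nyx import shown there):

```python
# Sketch: load one of the test simulations listed above by its label.
from lace.archive.gadget_archive import GadgetArchive  # import path assumed

archive = GadgetArchive(postproc="Cabayol23")
test_data = archive.get_testing_data(sim_label="mpg_central")
```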
33 changes: 14 additions & 19 deletions docs/docs/users/archive.md
@@ -1,11 +1,12 @@
# ARCHIVE
# ARCHIVES

The LaCE emulators support two types of archives:

- Gadget archive: Contains the P1D of Gadget simulations described in [Pedersen+21](https://arxiv.org/abs/2011.15127).
- Nyx archive: Contains the P1D of Nyx simulations described in (In prep.)

## Loading a Gadget Archive
The Gadget archive contains 30 training simulations and seven test simulations. Each simulation contains 11 snapshots covering redshifts from 2 to 4.5 in steps of 0.25
The Gadget archive contains 30 training simulations and 7 test simulations. Each simulation contains 11 snapshots covering redshifts from 2 to 4.5 in steps of 0.25.

To load a Gadget archive, you can use the `GadgetArchive` class (the original import snippet is truncated in this diff; a sketch is given below):
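A sketch of the truncated import, assuming the module path mirrors the Nyx archive import shown later on this page:

```python
# Import path assumed by analogy with the Nyx archive import below.
from lace.archive.gadget_archive import GadgetArchive
```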
@@ -17,16 +18,23 @@ The P1D from the Gadget archive with the Pedersen+21 post-processing can be accessed as follows:
```python
archive = GadgetArchive(postproc='Pedersen21')
```
This post-processing measures the P1D along one of the three box axes and contains three mean-flux rescaling per snapshot.
This post-processing measures the P1D along one of the three box axes and does not contain mean-flux rescalings.

On the other hand, the P1D from the Gadget archive with the Cabayol+23 post-processing can be accessed as follows:
```python
archive = GadgetArchive(postproc='Cabayol23')
```
This post-processing measures the P1D along the three principal axes of the box and contains five mean-flux rescalings per snapshot.

## Loading a Nyx Archive
To load the Nyx archive, you can use the `NyxArchive` class:

```python
from lace.archive.nyx_archive import NyxArchive
```
Since the Nyx archive is not publicly available yet, you need to set the `NYX_PATH` environment variable to the path to the Nyx files on your local computer.
Since the Nyx archive is not publicly available yet, **you need to set the `NYX_PATH` environment variable to the path to the Nyx files** on your local computer (or the cluster where you are running the code).

There are two versions of the Nyx archive available: `Oct2023` and `Jul2024`. The first one contains 17 training simulations and 4 test simulations, and the second one contains 17 training simulations and 3 test simulations. Each simulation contains 14 snapshots covering redshifts from 2.2 to 4.8 in steps of 0.2, plus additional snapshots at higher redshifts for some of the simulations. In both cases, it is not recommended to use simulation number 14.
There are two versions of the Nyx archive available: Oct2023 and Jul2024. The first one contains 17 training simulations and 4 test simulations, and the second one contains 17 training simulations and 3 test simulations (the simulations are better described [here](./Simulations_list.md)). Each simulation contains 14 snapshots covering redshifts from 2.2 to 4.8 in steps of 0.2, plus additional snapshots at higher redshifts for some of the simulations. In both cases, it is not recommended to use simulation number 14.

The P1D from the Nyx archive with the Oct2023 version can be accessed as follows (the original snippet is truncated in this diff; a sketch is given below):
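A sketch of what the truncated snippet might look like, assuming the archive takes a version argument matching the names above (the keyword name is an assumption):

```python
from lace.archive.nyx_archive import NyxArchive

# Requires the NYX_PATH environment variable to point to the Nyx files.
# Version label as described above; the keyword name is an assumption.
archive = NyxArchive(nyx_version="Oct2023")
```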
@@ -50,17 +58,4 @@ For the test set, the equivalent function is:
```python
archive.get_testing_data(sim_label='mpg_central')
```
where you can replace `sim_label` by any of the test simulation labels available in the archive. This will only load the fiducial snapshots without mean flux rescalings.

## Key keywords in the archive
The archive contains many keywords that can be used to access specific data. Here is a non-exhaustive list of the most important ones:

- `sim_label`: The label of the simulation. It can be any of the test simulation labels available in the archive.
- `z`: The snapshot redshift.
- `ind_axis`: Indicates the axis along which the P1D is measured. It can be 0, 1, 2 or 'average'
- `ind_rescaling`: The index of mean-flux rescaling of the P1D.
- `val_scaling`: The value of mean-flux rescaling of the P1D.
- `cosmo_params`: A dictionary containing the cosmological parameters of the simulation.
- `p1d_Mpc`: The P1D in Mpc.
- `k_Mpc`: The wavevector in Mpc.

where you can replace `sim_label` by any of the test simulation labels available in the archive (see [here](./Simulations_list.md)). This will only load the fiducial snapshots without mean flux rescalings.
