Refactoring model creation code #1734
Conversation
Commits:
- …ather than reused, in the sumrule)
- …tion grid and the integration grid, and outputs normalized pdf
- …isplay shapes in model summary and plot
- …actor for clarity and consistency)
The final two comparisons of this branch vs master, on the feature scaling and flavor basis runcards, are done:
Thanks @APJansen. While the feature scaling seems fine, the flavor basis looks a bit odd (better agreement than expected based on random sampling). Are you sure you are comparing the two different branches? Under "Code versions" in the report the versions are shown to be "inconsistent", so perhaps something went wrong?
Hm... I'm not sure... have you seen this before? I did have some trouble running vp-comparefit, eventually fixed by renaming and re-uploading the master branch run.
"inconsistent" in this caes just means that not all replicas were produced with the same commit. The large agreement between two fits I'm basing on the distance plots. They are the distance between two central values normalised wrt the uncertainty in std of the central values, for more details see appendix B of https://arxiv.org/pdf/1410.8849.pdf. Of course, if two fits are done independently (i.e. different seeds, though in this case I assume even the different model should have effectively the same impact) these distances should be of order 1. While this is the case for feature scaling, for the flavor basis fit the agreement looks better than expected if the two fits consisted of independent samples. Perhaps it's my assumption that the different model is similar to different seeds that's wrong, but then we'd need to understand why we observe this agreement for flavor basis but not feature scaling. |
Hmm, I see it's also inconsistent in the previous report on the flavor basis, where I compared it to something else. Perhaps I was developing while the jobs were in the queue. I guess I'll have to rerun it.
If you are rerunning, make sure you use the latest commit from this branch. I think this is basically done now, so it would be great to link here the reports for what would basically be the last commit before merging.
Here's the new report: https://vp.nnpdf.science/1pUg_vIYTkGmuHVMVSjQkw==/
Perfect agreement also for the flavour basis :D Great! If @RoyStegeman is also happy with the tests I think this is basically done?
I'm not really... Somehow the flavor basis seems to be exactly the same replica-by-replica, but not feature scaling, which I don't understand. Either I'd expect the model to be exactly the same given the same seed before and after the changes in this PR, or I'd expect them to differ (which I would've thought more likely). Unless I'm missing something, these results hint at an inconsistency.
I did a quick sanity check, running both the flavor_basis and feature_scaling runcards that @RoyStegeman posted here locally on both branches, and just comparing the validation chi2 after 100 epochs. Indeed they are different for the feature_scaling runcard, already in the first digit, whereas for flavor_basis they are the same up to the 5th decimal. I don't understand why. I tried copying either the model architecture and seeds, or the basis, from flavor_basis into feature_scaling and running that on both branches, but they're still not identical. Conversely, starting with the flavor_basis runcard and copying the model architecture, seeds and basis from feature_scaling, the results still differ. I'll try some more comparisons.
I'm still confused. Doing both of (1) turning off feature scaling and (2) changing the basis to the flavor basis makes the difference between the two branches very small (~5 decimals in the validation chi2 after 100 epochs), but doing either separately doesn't.
I'm very confused. At some point you had a fit, this one: standard fit (master vs this branch): https://vp.nnpdf.science/vEkkEc1jQIiHACiGX5ilTA==/, in the evolution basis, and that's exactly the same. So in principle, doing 1 but not 2 should also make the difference very small. The three comparisons that we want are:
What are the comparisons we actually have?
Right, so we have:
So I'm also confused. I just checked the standard runcard, both with the latest commits and with the commits in that report, and the difference in the 100-epoch validation chi2 is not small in either case.
Well, rather than merely small, the reports make it look like the results are exactly equal. I don't understand how you can get the same results if the differences are large after 100 epochs.
I just did the same check, the standard runcard after 100 epochs, but on Snellius rather than on my mac, and now the results are identical.
And what if you do this check on Snellius with the feature scaling runcard?
Yes, that was it: feature scaling has a slight difference, but just turning that off and leaving everything else the same, the results become identical.
Do you know in which part of the code this difference originates? I thought everything would be deterministic, but maybe I'm forgetting something.
I'm not sure. To be honest, I thought that they were always slightly different, i.e. that the initialization wasn't the same, because I had always tested this locally. (No idea why that would matter either; different TensorFlow versions probably, but why...)
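As an aside on the determinism question (not something from this thread): TensorFlow does provide hooks to make runs bit-reproducible on a single machine. A minimal sketch, assuming TF >= 2.8:

```python
import tensorflow as tf

# Seed Python's `random` module, NumPy and TensorFlow in a single call.
tf.keras.utils.set_random_seed(42)

# Force deterministic GPU/CPU kernels (can slow training down noticeably).
tf.config.experimental.enable_op_determinism()
```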
I think I found the reason: I changed the scaler to include the scaling from [0, 1] to [-1, 1]; originally this was done with a Lambda layer inside the model. This causes a difference of ~10^-8 in the scaled x using this runcard. That difference is then amplified, I assume, by the training. So this explains why it only happens with feature scaling, and I think it's harmless.
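To illustrate the kind of mismatch being described (a hypothetical reconstruction, not the actual code in this PR): applying the [0, 1] -> [-1, 1] map in float64 inside the scaler, versus in a float32 Lambda layer inside the model, can disagree at roughly the 10^-8 level, since that is below float32 resolution:

```python
import numpy as np
import tensorflow as tf

x = np.random.default_rng(0).random(1000)  # inputs already scaled to [0, 1]

# Old approach (hypothetical): rescale inside the model, in float32.
lam = tf.keras.layers.Lambda(lambda t: 2.0 * t - 1.0)
inside_model = lam(tf.constant(x, dtype=tf.float32)).numpy()

# New approach (hypothetical): rescale in the scaler, in float64, cast afterwards.
inside_scaler = (2.0 * x - 1.0).astype(np.float32)

# Maximum absolute difference is typically ~1e-8, below float32 epsilon (~1.2e-7).
print(np.max(np.abs(inside_model - inside_scaler)))
```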
That indeed sounds like it's harmless. Do you understand what causes the difference of ~10^-8? I'd expect it wouldn't matter if we shift by a constant.
Do you mean a difference of ~10^-8 in the scaled x?
To be precise, a maximum absolute difference of ~10^-8.
Then that's perfectly fine.
Great, well done for finding the difference!
Having checked that the results of this branch and master are numerically the same, I'd say this can be merged. I believe black and isort have both been run on the changed files?
Great, thanks! Indeed I've run black and isort.
Aim
I am refactoring the code that creates the PDF model (and perhaps the full one as well). The main objective is to make the code and the resulting architecture more readable, without changing anything in the behavior.
If anything, I expect the changes to make it slightly faster and smaller in memory, but either way the effect will likely be negligible.
It is still a work in progress, but here is a summary of the changes I made so far, roughly from big to small:
- msr_impose: previously this was a function creating a function that takes a pdf function and returns a normalized pdf function. Now it is a TensorFlow model that takes as input the unnormalized pdf on the output grid and on the integration grid, as well as the integration grid itself, and outputs the normalized pdf. This makes the code and the resulting model plots much easier to read.

With these changes, it's possible to create much more understandable plots of the architecture using tf.keras.utils.plot_model. These are from the basic runcard:
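To make the described structure concrete, here is a minimal sketch (my own illustration, not the actual n3fit implementation; the shapes, names and trapezoid integration are invented for the example) of a normalization model in this style, plus the plot_model call used for the architecture plots:

```python
import tensorflow as tf

# Inputs: the unnormalized pdf on the output grid and on the integration grid,
# plus the integration grid itself (shapes are purely illustrative).
pdf_grid = tf.keras.Input(shape=(50, 8), name="pdf_on_grid")
pdf_integ = tf.keras.Input(shape=(200, 8), name="pdf_on_integration_grid")
x_integ = tf.keras.Input(shape=(200, 1), name="integration_grid")

# Crude trapezoid-rule integral of the pdf over the integration grid.
dx = x_integ[:, 1:, :] - x_integ[:, :-1, :]
avg = 0.5 * (pdf_integ[:, 1:, :] + pdf_integ[:, :-1, :])
integral = tf.reduce_sum(dx * avg, axis=1, keepdims=True)  # (batch, 1, 8)

# Divide by the integral so the normalization holds by construction.
normalized = tf.keras.layers.Lambda(lambda t: t[0] / t[1], name="impose_msr")(
    [pdf_grid, integral]
)

msr_model = tf.keras.Model(
    inputs=[pdf_grid, pdf_integ, x_integ], outputs=normalized, name="msr"
)

# Render the architecture; requires pydot and graphviz to be installed.
tf.keras.utils.plot_model(msr_model, to_file="msr.png", show_shapes=True)
```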