
Cholesky decomposition fails for ATLAS_2JET_7TEV_R06 in a closure test #2043

Closed
andreab1997 opened this issue Apr 10, 2024 · 8 comments · Fixed by #2045

Comments

@andreab1997 (Contributor) commented Apr 10, 2024

When running a level-2 closure test that contains the dataset ATLAS_2JET_7TEV_R06, the fit fails (during n3fit execution) with the error
numpy.linalg.LinAlgError: 65-th leading minor of the array is not positive definite
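For context, a minimal sketch (plain numpy, not the NNPDF code) of how a Cholesky decomposition fails when a symmetric covariance matrix is not positive definite; the "leading minor" wording above is how the underlying LAPACK-based routines report which principal submatrix first fails:

import numpy as np

# Toy symmetric matrix with a negative eigenvalue, mimicking a covariance
# matrix spoiled by inconsistent correlated systematics.
cov = np.array([[1.0, 2.0],
                [2.0, 1.0]])

try:
    np.linalg.cholesky(cov)
except np.linalg.LinAlgError as err:
    print("Cholesky failed:", err)

# The eigenvalues show why the decomposition cannot succeed:
print(np.linalg.eigvalsh(cov))  # -> [-1.  3.]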

Reproduce

A minimal runcard to reproduce this bug is the following:

#
# Configuration file for NNPDF++
#

############################################################
description: NNPDF4.0 NNLO Global, closure test

############################################################
# frac: training fraction
# ewk: apply ewk k-factors
# sys: systematics treatment (see systypes)
dataset_inputs:
- {dataset: ATLAS_2JET_7TEV_R06, frac: 0.75, cfac: []} 

############################################################
datacuts:  
  t0pdfset: 231223-ab-baseline-nnlo-global-NNLOcuts_iterated     # PDF set to generate t0 covmat
  q2min: 3.49                       # Q2 minimum
  w2min: 12.5
  use_cuts: internal
############################################################
theory:
  theoryid: 708         # database id

############################################################
trvlseed: 1727532335
nnseed: 1737785873
mcseed: 2123509817
genrep: true          # true = generate MC replicas, false = use real data

parameters: # This defines the parameter dictionary that is passed to the Model Trainer
  nodes_per_layer: [25, 20, 8]
  activation_per_layer: [tanh, tanh, linear]
  initializer: glorot_normal
  optimizer:
    clipnorm: 6.073e-6
    learning_rate: 2.621e-3
    optimizer_name: Nadam
  epochs: 17000
  positivity:
    initial: 184.8
    multiplier:
  integrability:
    initial: 10
    multiplier:
  stopping_patience: 0.1
  layer_type: dense
  dropout: 0.0
  threshold_chi2: 3.5

fitting:
  fitbasis: EVOL  # EVOL (7), EVOLQED (8), etc.
  basis:
  - {fl: sng, trainable: false, smallx: [1.091, 1.119], largex: [1.471, 3.021]}
  - {fl: g, trainable: false, smallx: [0.7795, 1.095], largex: [2.742, 5.547]}
  - {fl: v, trainable: false, smallx: [0.472, 0.7576], largex: [1.571, 3.559]}
  - {fl: v3, trainable: false, smallx: [0.07483, 0.4501], largex: [1.714, 3.467]}
  - {fl: v8, trainable: false, smallx: [0.5731, 0.779], largex: [1.555, 3.465]}
  - {fl: t3, trainable: false, smallx: [-0.5498, 1.0], largex: [1.778, 3.5]}
  - {fl: t8, trainable: false, smallx: [0.5469, 0.857], largex: [1.555, 3.391]}
  - {fl: t15, trainable: false, smallx: [1.081, 1.142], largex: [1.491, 3.092]}

############################################################
positivity:
  posdatasets:
  - {dataset: POSF2U, maxlambda: 1e6}        # Positivity Lagrange Multiplier
  - {dataset: POSF2DW, maxlambda: 1e6}
  - {dataset: POSF2S, maxlambda: 1e6}
  - {dataset: POSFLL_19PTS, maxlambda: 1e6}
  - {dataset: POSDYU, maxlambda: 1e10}
  - {dataset: POSDYD, maxlambda: 1e10}
  - {dataset: POSDYS, maxlambda: 1e10}
  - {dataset: POSF2C_17PTS, maxlambda: 1e6}
  - {dataset: POSXUQ, maxlambda: 1e6}        # Positivity of MSbar PDFs
  - {dataset: POSXUB, maxlambda: 1e6}
  - {dataset: POSXDQ, maxlambda: 1e6}
  - {dataset: POSXDB, maxlambda: 1e6}
  - {dataset: POSXSQ, maxlambda: 1e6}
  - {dataset: POSXSB, maxlambda: 1e6}
  - {dataset: POSXGL, maxlambda: 1e6}

############################################################
integrability:
  integdatasets:
  - {dataset: INTEGXT8, maxlambda: 1e2}
  - {dataset: INTEGXT3, maxlambda: 1e2}

############################################################

closuretest:
  filterseed: 3345348918 # Random seed to be used in filtering data partitions
  fakedata: true     # true = to use FAKEPDF to generate pseudo-data
  fakepdf: 231223-ab-baseline-nnlo-global-NNLOcuts_iterated      # Theory input for pseudo-data
  errorsize: 1.0    # uncertainties rescaling
  fakenoise: true    # true = to add random fluctuations to pseudo-data
  rancutprob: 1.0   # Fraction of data to be included in the fit
  rancutmethod: 0   # Method to select rancutprob data fraction
  rancuttrnval: false # 0(1) to output training(validation) chi2 in report
  printpdf4gen: false # To print info on PDFs during minimization

############################################################
debug: false
maxcores: 8

Things to note

  • The experimental covariance matrix generated during vp-setupfit to generate the level-1 data seems to be ok. In fact, vp-setupfit does not crash.
  • In order to reproduce this bug in master, another bug needs to be solved first (already fixed in Closure with same level1 #2007; see coredata.py).

I tried to debug this myself, but so far I have not been able to figure out the problem.

CC: @scarlehoff @RoyStegeman @comane

@scarlehoff (Member)

The experimental covariance matrix generated during vp-setupfit to generate the level-1 data seems to be ok

This didn't fail before, but it does now.
Do you have the old covariance matrix as well (so that you can check whether there are any changes)?
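If both matrices can be exported (the file names below are hypothetical), a quick numpy comparison along these lines would show whether and where they differ:

import numpy as np

# Hypothetical exports of the ATLAS_2JET_7TEV_R06 covariance matrix
# built with the 4.0.9 tag and with current master.
old_cov = np.load("covmat_409.npy")
new_cov = np.load("covmat_master.npy")

print("shapes:", old_cov.shape, new_cov.shape)
if old_cov.shape == new_cov.shape:
    print("max abs difference:", np.abs(old_cov - new_cov).max())
    print("equal within tolerance:", np.allclose(old_cov, new_cov))

# Positive definiteness check via the smallest eigenvalue of each matrix.
for label, cov in (("old", old_cov), ("new", new_cov)):
    print(label, "min eigenvalue:", np.linalg.eigvalsh(cov).min())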

@andreab1997 (Contributor, Author)

The experimental covariance matrix generated during vp-setupfit to generate the level-1 data seems to be ok

What do you mean by "before"? Before the new commondata?

This didn't fail before, but it does now. Do you have the old covariance matrix as well (so that you can check whether there are any changes)?

I do not have the old one, and I am not sure how I can reproduce it now.

@scarlehoff (Member)

What do you mean by "before"? Before the new commondata?

Yes.

I do not have the old one, and I am not sure how I can reproduce it now.

git checkout 4.0.9

In order to reproduce this bug in master, another bug needs to be solved

Just to make sure, did you try doing this in master with only that bug solved?

@andreab1997 (Contributor, Author)

Just to make sure, did you try doing this in master with only that bug solved?

Yes, I am debugging in master now.

@andreab1997 (Contributor, Author)

UPDATE: It seems that in the function dataset_inputs_covmat_from_systematics, during vp-setupfit I have
CommonData(setname='ATLAS_2JET_7TEV_R06_M12Y', ndata=90, commondataproc='DIJET', nkin=3, nsys=474, legacy=False, legacy_name='ATLAS_2JET_7TEV_R06', kin_variables=['ystar', 'm_jj', 'sqrts']), while during n3fit I have
[CommonData(setname='ATLAS_2JET_7TEV_R06_M12Y', ndata=90, commondataproc='DIJET', nkin=3, nsys=224, legacy=False, legacy_name='ATLAS_2JET_7TEV_R06', kin_variables=['ystar', 'm_jj', 'sqrts'])].

So nsys changes between vp-setupfit and n3fit, which I believe might be the problem.

@scarlehoff (Member) commented Apr 10, 2024

The difference should be the number of SKIP systematics. vp-setupfit is reading all of them. Then those are not written down (your changes to coredata.py ensure this is done consistently). After that, n3fit can only read the ones that were written down.

So that difference is correct, but if that is the problem it means SKIP is having an effect somewhere (and it shouldn't).
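Schematically (a toy sketch with made-up names, not the validphys implementation), the covariance matrix should be built only from the non-SKIP systematics, so dropping the SKIP columns beforehand must not change it:

import numpy as np
import pandas as pd

# Toy inputs: one column per systematic, one row per data point, and a map
# from each systematic to its treatment. All names and values are made up.
rng = np.random.default_rng(0)
ndata, nsys = 5, 8
sys_table = pd.DataFrame(rng.normal(size=(ndata, nsys)),
                         columns=[f"sys_{i}" for i in range(nsys)])
sys_types = {f"sys_{i}": ("SKIP" if i >= 5 else "ADD") for i in range(nsys)}

def covmat_from_systematics(table, types, stat_error):
    """Diagonal statistical part plus S S^T from the non-SKIP systematics."""
    keep = [name for name, treatment in types.items() if treatment != "SKIP"]
    s = table[keep].to_numpy()
    return np.diag(stat_error**2) + s @ s.T

stat = np.full(ndata, 0.1)
full = covmat_from_systematics(sys_table, sys_types, stat)

# Dropping the SKIP columns up front must give exactly the same matrix.
dropped = covmat_from_systematics(
    sys_table.drop(columns=["sys_5", "sys_6", "sys_7"]),
    {k: v for k, v in sys_types.items() if v != "SKIP"},
    stat,
)
np.testing.assert_allclose(full, dropped)
print("SKIP systematics have no effect on the covmat, as expected.")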

@andreab1997 (Contributor, Author)

The difference should be the number of SKIP systematics. vp-setupfit is reading all of them. Then those are not written down (your changes to coredata.py ensure this is done consistently). After that, n3fit can only read the ones that were written down.

So that difference is correct, but if that is the problem it means SKIP is having an effect somewhere (and it shouldn't).

Ok, but if this is the case, maybe it is linked to my solution of the SKIP problem (the one I wrote about at the beginning of this issue)?

@scarlehoff (Member)

No, that should be correct. The SKIP systematics should not have an effect anywhere (so their definition should also not be written down).
You should be seeing the same mismatch in (old) master.
