Tevatron inclusive DY #2185

Open · wants to merge 16 commits into master

Conversation

@jacoterh (Collaborator) commented Oct 21, 2024

This PR implements filter functions for the Tevatron inclusive DY datasets:

  • CDF_Z0_1P96TEV_ZRAP: agreement with the legacy implementation
  • D0_WPWM_1P96TEV: agreement with legacy, except for possible rounding errors in the old implementation of the kinematics; see specifically the last two bins here
  • D0_Z0_1P96TEV: agreement with legacy, with one difference: the systematic uncertainty in the 3rd bin is already symmetric on HepData, while the legacy implementation seems to have symmetrised the plus and minus uncertainties in the past (here). This also affects the central value.

General comment: small overall differences are due to formatting, for example 1.2999999999999998 instead of 1.3. Is there any function that was used in the past to make the float formatting uniform?
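(For illustration, a minimal sketch of how such representations arise when floats are dumped with PyYAML; the numbers below are generic examples, not taken from these datasets.)

import yaml

# Binary floats cannot represent most decimal fractions exactly, so arithmetic
# on the raw values can yield long representations such as 1.2999999999999998.
value = 0.1 + 0.2
print(value)             # 0.30000000000000004
print(yaml.dump(value))  # the full-precision representation ends up in the yaml file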

@Radonirinaunimi (Member)

Thanks for this @jacoterh!

> Small overall differences are due to formatting, for example 1.2999999999999998 instead of 1.3. Is there any function that was used in the past to make the float formatting uniform?

Now, you can use this function:

def prettify_float(dumper, value):
    """
    Override the default yaml representer:
    https://github.com/yaml/pyyaml/blob/48838a3c768e3d1bcab44197d800145cfd0719d6/lib/yaml/representer.py#L189

    This function is used to prettify the float representation in the yaml file.
    If the float has more than 8 digits, it will be represented in scientific
    notation with 8 digits.

    Note:
    -----
    When importing yaml in a module,
        yaml.add_representer(float, prettify_float)
    must be called to use this function.
    """
    ret = dumper.represent_float(value)
    if len(ret.value) > 8:
        ret_str = f"{value:.8e}"
        ret = dumper.represent_scalar('tag:yaml.org,2002:float', ret_str)
    return ret

and call it as follows in the filter:

yaml.add_representer(float, prettify_float)
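
For illustration, a minimal sketch of how the two pieces fit together when writing a data file (the file name and values are placeholders, not taken from this PR):

import yaml

# Register the custom representer once in the filter module; every float dumped
# afterwards through yaml.dump goes through prettify_float.
yaml.add_representer(float, prettify_float)

data = {"data_central": [0.1 + 0.2, 1.3, 2.5]}
with open("data.yaml", "w") as f:
    # 0.1 + 0.2 is written as 3.00000000e-01 instead of 0.30000000000000004,
    # while short representations such as 1.3 are left untouched.
    yaml.dump(data, f)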

Also, same as in the other PRs, could you check that the datasets are loaded properly and that the (t0) covmats are the same?

from validphys.api import API
import numpy as np

# new_implementation / old_implementation: names of the new and the legacy
# dataset implementations to compare.
inp1 = {
    "dataset_input": {"dataset": f"{new_implementation}"},
    "theoryid": 40_000_000,
    "use_cuts": "internal",
    "t0pdfset": "NNPDF40_nnlo_as_01180",
    "use_t0": True,
}
inp2 = {
    "dataset_input": {"dataset": f"{old_implementation}", "variant": "legacy"},
    "theoryid": 40_000_000,
    "use_cuts": "internal",
    "t0pdfset": "NNPDF40_nnlo_as_01180",
    "use_t0": True,
}

covmat1 = API.covmat_from_systematics(**inp1)
covmat2 = API.covmat_from_systematics(**inp2)

t0_covmat1 = API.t0_covmat_from_systematics(**inp1)
t0_covmat2 = API.t0_covmat_from_systematics(**inp2)

np.all(np.isclose(covmat1, covmat2))
np.all(np.isclose(t0_covmat1, t0_covmat2))
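
Both checks should return True if the new implementation reproduces the legacy (t0) covariance matrices.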

url: ''
version: -1
url: https://www.hepdata.net/record/ins856131?version=1&table=Table%202
version: 1
implemented_observables:
- observable_name: ZRAP
  observable:
    description: Drell-Yan Rapidity Distribution
    label: CDF $Z$ rapidity (new)
    units: ''
  process_type: EWK_RAP

Hi @jacoterh

Remember to set process_type: to one of the processes that you will find in this file: https://github.com/NNPDF/nnpdf/blob/master/validphys2/src/validphys/process_options.py (at the bottom; here either DY_Z_Y or DY_W_ETA),

and also set kinematic_override: identity.

This is to avoid going through some old code that we are trying to remove.

When you do that, you will see that that file also defines which kinematic variables are allowed; sometimes it is necessary to change from Q -> Q^2 or pT -> pT^2 (the kinematic info is the same in the end, but we are trying to standardize it a bit more).
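
For illustration only, a hypothetical sketch of the kind of change meant by Q -> Q^2 (the actual allowed variable names are the ones listed in process_options.py; the names below are made up):

# Hypothetical example: if the process definition expects a squared invariant
# mass rather than M, the filter stores the square of the raw HepData value.
raw_bin = {"y": 0.25, "M": 91.1876, "sqrts": 1960.0}
std_bin = {"y": raw_bin["y"], "M2": raw_bin["M"] ** 2, "sqrts": raw_bin["sqrts"]}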

@jacoterh (author)

Ok, done!

@jacoterh (author)

Thanks @Radonirinaunimi! The numbers are now formatted correctly. I have also performed the check you proposed using the snippet you provided, see for instance here. If you run this script, you'll find that the 3rd bin in D0_Z0_1P96TEV differs from the legacy implementation for the reason I mentioned at the beginning of this PR.

Some questions:

  1. Why do the tests suddenly fail? Running pytest tests/test_datafiles.py locally works fine.
  2. I explicitly checked whether the comparison against the legacy implementation failed when I randomly changed some numbers. It did for all datasets, except for the 1st bin in D0_WPWM_1P96TEV: the test does not seem to care about whatever I write here. This doesn't happen for the D0_Z0_1P96TEV and CDF_Z0_1P96TEV datasets.
  3. In the metadata, which HepData table should I link as the url in case multiple tables entered the implementation?

@scarlehoff (Member)

The current failure is because the key is plotting::kinematics_override (the probability that I get a variable name, or a word in general, right on the first try without looking it up is close to 0).

The previous failure was probably related to some change in digits, judging by the logs.

@jacoterh (author) commented Nov 8, 2024

validphys reports:

import numpy as np

from validphys.api import API

observables = ["CDF_Z0_1P96TEV_ZRAP", "D0_WPWM_1P96TEV_ASY", "D0_Z0_1P96TEV_ZRAP"]

for obs in observables:

    new_implementation = obs
    old_implementation = obs

    inp1 = {
        "dataset_input": {"dataset": f"{new_implementation}"},
        "theoryid": 200,
        "use_cuts": "internal",
        "t0pdfset": "NNPDF40_nnlo_as_01180",
        "use_t0": True,
    }
    inp2 = {
        "dataset_input": {"dataset": f"{old_implementation}", "variant": "legacy"},
        "theoryid": 200,
        "use_cuts": "internal",
        "t0pdfset": "NNPDF40_nnlo_as_01180",
        "use_t0": True,
    }

    covmat1 = API.covmat_from_systematics(**inp1)
    covmat2 = API.covmat_from_systematics(**inp2)

    t0_covmat1 = API.t0_covmat_from_systematics(**inp1)
    t0_covmat2 = API.t0_covmat_from_systematics(**inp2)

    print(f"Comparison for {new_implementation}")
    print(np.all(np.isclose(covmat1, covmat2)))
    print(np.all(np.isclose(t0_covmat1, t0_covmat2)))

>>> Comparison for CDF_Z0_1P96TEV_ZRAP
True
True
Comparison for D0_WPWM_1P96TEV_ASY
True
True
Comparison for D0_Z0_1P96TEV_ZRAP
False
False

The last disagreement, in D0_Z0_1P96TEV_ZRAP, can be explained as follows. The systematic uncertainty in the 3rd bin is already symmetric on HepData, while the legacy implementation seems to have symmetrised the plus and minus uncertainties in the past (here). This also affects the central value.
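
For context, a minimal sketch of the usual symmetrisation convention that would produce both effects (my own illustration; the exact prescription used by the legacy code is not reproduced here):

def symmetrise(central, sigma_plus, sigma_minus):
    # Symmetric error = average of the two magnitudes; the central value is
    # shifted by half of their difference (one common convention).
    sym_error = (sigma_plus + sigma_minus) / 2.0
    shift = (sigma_plus - sigma_minus) / 2.0
    return central + shift, sym_error

# Made-up numbers: a +5% / -3% uncertainty around 1.0
new_central, sym_error = symmetrise(1.0, 0.05, 0.03)
print(new_central, sym_error)  # 1.01 0.04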

@scarlehoff (Member)

Thanks Jaco. One last thing: could you also check that the experimental covmat for the 3 datasets together doesn't change in the off-diagonal entries?

(you can also e.g. just check that the experimental chi2 for all 3 considered together is unchanged)

(this will be important for the singletop data as well, to check that possible experimental correlations across datasets are also captured)

@jacoterh (author) commented Nov 11, 2024

Sure, I made a plot of the ratio against the legacy implementation. There are no cross-correlations between datasets here, though (which I find odd, as we have two datasets from D0), so the covmat is block diagonal, meaning that the datasets can be cross-checked separately. But I will also do this check for single top, where we do have cross-correlations, thanks!
[Plot: ratio_exp_covmat_tevatron_inc_dy]

For reference, the ratio was obtained with

from validphys.api import API

dsinps = [
    {'dataset': 'CDF_Z0_1P96TEV_ZRAP'},
    {'dataset': 'D0_WPWM_1P96TEV_ASY'},
    {'dataset': 'D0_Z0_1P96TEV_ZRAP'},
]

dsinps_legacy = [
    {'dataset': 'CDF_Z0_1P96TEV_ZRAP', 'variant': 'legacy'},
    {'dataset': 'D0_WPWM_1P96TEV_ASY', 'variant': 'legacy'},
    {'dataset': 'D0_Z0_1P96TEV_ZRAP', 'variant': 'legacy'},
]

inp = dict(dataset_inputs=dsinps, theoryid=200, use_cuts="internal")
inp_legacy = dict(dataset_inputs=dsinps_legacy, theoryid=200, use_cuts="internal")
cov = API.dataset_inputs_covmat_from_systematics(**inp)
cov_legacy = API.dataset_inputs_covmat_from_systematics(**inp_legacy)
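
The ratio plot itself can then be produced along these lines (the matplotlib part and the zero-handling are my additions, not the original script):

import numpy as np
import matplotlib.pyplot as plt

cov = np.asarray(cov)
cov_legacy = np.asarray(cov_legacy)

# Element-wise ratio; off-diagonal blocks are zero in both matrices (no
# cross-correlations between the three datasets), so those entries are set
# to 1. A mismatch in the zero pattern would have to be checked separately.
ratio = np.divide(cov, cov_legacy, out=np.ones_like(cov), where=cov_legacy != 0)

plt.imshow(ratio)
plt.colorbar(label="new / legacy")
plt.savefig("ratio_exp_covmat_tevatron_inc_dy.png")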

@scarlehoff (Member) left a comment

Hi @jacoterh

Thanks for this. All seems fine; the only thing: could you bump the version and change the version_comment in the metadata.yml files?

RE the failing test, you might need to regenerate pseudodata_test_fit_n3fit_240916. Since it is only the pseudodata that is relevant, you can just do:

vp-fitrename pseudodata_test_fit_n3fit_240916 pseudodata_test_fit_n3fit_241112
cp pseudodata_test_fit_n3fit_241112/filter.yaml  pseudodata_test_fit_n3fit_241112.yaml

And then just regenerate the data for the 10 replicas, no need to wait until the fit is done.

@scarlehoff added the "Done" label (PRs that are done but waiting on something else to merge/approve) on Nov 12, 2024
@scarlehoff marked this pull request as ready for review on November 13, 2024