
Introduction of an inconsistency within a Closure Test #1682

Open
wants to merge 108 commits into base: master
Conversation

@comane (Member) commented Feb 28, 2023

The purpose of this PR is to study the effects of introducing data inconsistencies within a Closure Test.

@comane comane marked this pull request as draft February 28, 2023 16:47
@andreab1997 (Contributor)

Thank you for this! I will review it soon.

validphys2/src/validphys/filters.py (outdated, resolved)
validphys2/src/validphys/coredata.py (outdated, resolved)
Comment on lines 234 to 235
lvl1_inconsistent_fit=False,
lvl2_inconsistent_fit=False
@comane (Member, Author) commented Mar 21, 2023

@giovannidecrescenzo I think these names are a bit misleading; I would rather call them type1_inconsistency and type2_inconsistency.
Then, at some point, if we see that they are equivalent we will keep only one.

Contributor

I am actually not sure: type1_inconsistency and type2_inconsistency are only clear to us, while I believe the names @giovannidecrescenzo suggested are at least a bit clearer to others.

@andreab1997 andreab1997 marked this pull request as ready for review May 22, 2023 09:19
@comane comane requested a review from scarlehoff May 23, 2023 09:54
@andreab1997 (Contributor) commented May 23, 2023

@comane I don't know if you addressed the points in my review. If you did, can you please paste the relevant commits as a comment in the relevant sections? (I know it is a pain but it makes reviewing a bit easier :) )

EDIT: same for @giovannidecrescenzo of course

@@ -10,6 +10,7 @@

from reportengine import collect
from reportengine.table import table
from reportengine.figure import figure, figuregen
Contributor
Can we move these things to closure_plots.py?


log = logging.getLogger(__name__)

l = Loader()
Contributor
The loader should not be used in the various modules. Use the existing mechanisms, e.g. in config.py, to get data.

plt_file = ds.commondata.plotfiles[1]
with open(plt_file, "r") as file:
    card = yaml.safe_load(file)
prcs = card["nnpdf31_process"]
Contributor
There are easier, correct ways to get the process; see what is done in config.py.

@scarlehoff (Member) left a comment

I understand that y'all decided to open a new PR with a cherry-picking of the changes from this one, but I wanted to leave a few comments that I'd very much appreciate you keeping in mind for the final version.

@@ -0,0 +1,155 @@
import dataclasses
Member
Please add a docstring to this module and move it into the closuretest submodule.

@@ -287,7 +287,10 @@ def level0_commondata_wc(data, fakepdf):


def make_level1_data(
data, level0_commondata_wc, filterseed, experiments_index, sep_mult
data, level0_commondata_wc, filterseed, experiments_index,
sep_mult, MULT = False, ADD = False, CORR = False, UNCORR = False, SPECIAL=False,
Member
The way I read this function, the data should already be processed before calling it, so I would remove these arguments and just process the data before calling make_level1_data, since they are only active when type1_inconsistency is set to True.

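The refactoring suggested above could look something like this minimal sketch. Note that `process_inconsistencies`, the `CommonData` stand-in, and the rescaling logic are all hypothetical illustrations, not the actual validphys API:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CommonData:
    # Minimal stand-in for a validphys commondata object (illustrative only)
    setname: str
    sys_error: float


def process_inconsistencies(level0, inconsistent_datasets, rescaling_factor):
    """Rescale the systematics of the flagged datasets up front, so that
    make_level1_data can keep its original, narrow signature."""
    return [
        replace(cd, sys_error=cd.sys_error * rescaling_factor)
        if cd.setname in inconsistent_datasets
        else cd
        for cd in level0
    ]


# The call site then stays simple (make_level1_data unchanged):
# level0 = process_inconsistencies(level0, inconsistent_datasets, factor)
# closure_data = make_level1_data(data, level0, filterseed, experiments_index, sep_mult)
```

With this shape the boolean flags (MULT, ADD, CORR, ...) live entirely in the preprocessing step rather than threading through the pseudodata generation.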

closure_data = make_level1_data(data, closure_data, filterseed, experiments_index, sep_mult)
#======= Level 1 closure test =======#
closure_data = make_level1_data(
Member
Process the data and call make_level1_data with the data already processed. Passing through this many arguments is asking for trouble down the line (believe me!)


# for a consistent fit use commondata with modified systematics and also change central value
else:
level1_commondata_dict = {c.setname: c for c in level0_commondata_wc}
Member
Basically, the only (new) information that needs to be passed to this function is whether an original level 0 commondata exists.

So, in my view, the data should be prepared before calling make_level1_pseudodata and the only argument that needs to be added is original_level0_commondata_wc=None.

Then you can do e.g.

if original_level_0_commondata_wc is None:
    level1_commondata_dict = {c.setname: c for c in level0_commondata_wc}
else:
    level1_commondata_dict = {c.setname: c for c in original_level0_commondata_wc}

inconsistent_datasets=[], sys_rescaling_factor_1=1, sys_rescaling_factor_2=1,
type1_inconsistency=False, type2_inconsistency=False, reference_fit=True


Member

Same as with make_level1_data and friends: the data should have been processed before entering this function, especially given how far removed the arguments are from where they are used!

My personal recommendation would be to create another function of the type:

def filter_closure_data_by_experiment_with_inconsistencies(**bla):
    <preprocessing of the data>
    return filter_closure_data_by_experiment(data_preprocessed, original_data=data)

Note that I'm changing the input data and creating a new key for original data.
It would work just the same (maybe it would be even better) to leave the original arguments as they are and add a single argument inconsistent_data=<data already processed>.
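To make the wrapper pattern concrete, here is a runnable toy sketch. The function names mirror the reviewer's pseudocode, but the bodies are trivial stand-ins (a list of floats in place of real commondata, a rescaling in place of the real preprocessing), not the validphys implementation:

```python
def filter_closure_data_by_experiment(data, original_data=None):
    # Stand-in for the existing provider: when no original (pre-inconsistency)
    # data is supplied, the input data doubles as the reference.
    reference = data if original_data is None else original_data
    return {"fit_data": data, "reference_data": reference}


def filter_closure_data_by_experiment_with_inconsistencies(data, rescaling_factor):
    # Preprocess first (here a trivial rescaling as a placeholder for the real
    # inconsistency transformation), then delegate to the unchanged provider,
    # passing the untouched data along as the original.
    data_preprocessed = [d * rescaling_factor for d in data]
    return filter_closure_data_by_experiment(data_preprocessed, original_data=data)
```

The consistent code path never sees the extra arguments: only the `*_with_inconsistencies` wrapper knows about the preprocessing.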
