Polarised Jet commondata implementation #2035

Merged: 82 commits, Jul 16, 2024

Commits
9d52347
PHENIX implementation, eta still needs to be done
toonhasenack Apr 4, 2024
90f5342
pre-commit change
toonhasenack Apr 4, 2024
577512c
implemented changes for PHENIX
toonhasenack Apr 12, 2024
eb86dbd
Merge branch 'master' into Jet_commondata
giacomomagni Apr 15, 2024
49a3755
STAR2012 and STAR2015 addition
toonhasenack Apr 15, 2024
aace47c
Name change of STARyear to STAR_year convention
toonhasenack Apr 17, 2024
994638a
propagate previous namechange
toonhasenack Apr 17, 2024
12d2ac9
implemented correlations for STAR_2015
toonhasenack Apr 17, 2024
ebcc5ca
init 510 correlated data
giacomomagni Apr 22, 2024
9c76548
move symm error to utils
giacomomagni Apr 22, 2024
516b85f
Merge branch 'master' into Jet_commondata
giacomomagni Apr 22, 2024
a5888e1
correct label
giacomomagni Apr 22, 2024
34e2c8b
correct label
giacomomagni Apr 22, 2024
79bb21f
fix art systematics
giacomomagni Apr 22, 2024
250ea79
fix metadata
giacomomagni Apr 22, 2024
fb54e16
make test passing
giacomomagni Apr 22, 2024
9d3f3f4
Giacomo's suggestions with 1 to go
toonhasenack Apr 23, 2024
cee14dd
init star 2015 parser
giacomomagni Apr 23, 2024
106d475
implement 2015 dijet and jet and fix tests
giacomomagni Apr 23, 2024
42d4135
STAR_2009_2JET implementation with only correlations to go
toonhasenack Apr 24, 2024
73d93f7
fix 2009 dijet and implement 2012
giacomomagni Apr 24, 2024
875eede
small correction in metadata
toonhasenack Apr 24, 2024
c6cc214
Merge branch 'Jet_commondata' of https://github.com/NNPDF/nnpdf into …
toonhasenack Apr 24, 2024
dfa92ba
full implementation of STAR_2009 including correlations
toonhasenack Apr 29, 2024
6cc4020
change back filter import
toonhasenack Apr 29, 2024
8687f8a
Giacomo's suggestions
toonhasenack May 1, 2024
7241f99
minor fixes on syntax
giacomomagni May 6, 2024
65a2728
fix kinematic label
giacomomagni May 6, 2024
6b15c86
fixed STAR2015 polarised error
toonhasenack May 7, 2024
8218129
apply same fix also to dijet
giacomomagni May 8, 2024
637e7d8
STAR_2012 correct correlations
toonhasenack May 14, 2024
72e4969
correct wrong year label
giacomomagni May 14, 2024
45a5c8d
Merge branch 'Jet_commondata' of https://github.com/NNPDF/nnpdf into …
giacomomagni May 14, 2024
1b9a9cf
fix metadata
giacomomagni May 14, 2024
de2a065
init restore older filters
giacomomagni May 15, 2024
b262ebb
resotre previous atlas utils
giacomomagni May 15, 2024
77ca9bb
resotre previous atlas utils
giacomomagni May 15, 2024
0ab33c1
more cleaning
giacomomagni May 15, 2024
ede56cf
Adding pol unc to 2013 data
giacomomagni May 16, 2024
030f63e
various fixes in 2009 1 JET data
giacomomagni May 16, 2024
0840309
make notation more uniform
giacomomagni May 16, 2024
63b1760
Add beam pol error to phenix
giacomomagni May 16, 2024
4f5460e
remove minus sign from pol unc
giacomomagni May 16, 2024
a565ab8
adding pol unc to 2005 and 2006 data
giacomomagni May 16, 2024
fd9c6ed
uncorrelate PHENIX from STAR_2005
toonhasenack May 16, 2024
f696309
revert correlations in phenix and star 2005
giacomomagni May 17, 2024
879fd26
init collapsing 2012 dijets
giacomomagni May 17, 2024
1e39f34
init collapsing 2013 dijets
giacomomagni May 17, 2024
46ecd2e
init collapsing 2015 dijets
giacomomagni May 17, 2024
e5ed528
addind missing refs
giacomomagni May 17, 2024
31ab9c2
some fixes
giacomomagni May 17, 2024
9347f45
Adding lumi shift to 2013 data
giacomomagni May 17, 2024
fe6afde
collapse 2009 dijets
giacomomagni May 17, 2024
023a95f
fix lumi unc and stat in 2009 dijets
giacomomagni May 17, 2024
e3805a7
move filters closer to raw data
giacomomagni May 17, 2024
bbd5112
correct unc order in 2015 data
giacomomagni May 23, 2024
6e43675
Merge branch 'master' into Jet_commondata
giacomomagni Jun 14, 2024
4fc04c4
update 1jet fktables names
giacomomagni Jun 14, 2024
49e15e7
split 1jet 2009 and 2015 into 2 observables
giacomomagni Jun 18, 2024
d7f9507
fix fktables names
giacomomagni Jun 18, 2024
aa2c2a7
fix phenix name
giacomomagni Jun 18, 2024
b5ff1d5
correct PHENIX source table
giacomomagni Jun 19, 2024
0c91013
add review comments
giacomomagni Jun 19, 2024
e983e99
Merge branch 'master' into Jet_commondata
Radonirinaunimi Jun 25, 2024
c3877df
account for pol/unpol distinction in dijets
toonhasenack Jun 28, 2024
e0cbe64
Merge branch 'master' into Jet_commondata
giacomomagni Jun 28, 2024
80b9fbb
510-->200
toonhasenack Jun 28, 2024
012347f
Merge branch 'Jet_commondata' of https://github.com/NNPDF/nnpdf into …
toonhasenack Jun 28, 2024
a516d7a
implement E. Aschenauer suggestion
giacomomagni Jul 3, 2024
a2941ae
implement 2004 1JET data
giacomomagni Jul 3, 2024
681416b
Merge branch 'master' into Jet_commondata
giacomomagni Jul 3, 2024
3059e94
Merge branch 'master' into Jet_commondata
giacomomagni Jul 5, 2024
cd57101
Merge branch 'master' into Jet_commondata
giacomomagni Jul 5, 2024
5348827
Merge branch 'master' into Jet_commondata
giacomomagni Jul 9, 2024
9b90f15
init dijet eta implementation
giacomomagni Jul 12, 2024
45b9f90
add eta variables to star data
giacomomagni Jul 12, 2024
6cfdbf3
varius typos fixes
giacomomagni Jul 12, 2024
03bdb01
some fixes to metadata
giacomomagni Jul 14, 2024
c2d9401
remove sqrts from dijets
giacomomagni Jul 15, 2024
2c1feb4
minor fixes
giacomomagni Jul 15, 2024
a7d8d54
minor fix on plot labels
giacomomagni Jul 16, 2024
9925e4d
another attempt on plot labels, now working
giacomomagni Jul 16, 2024
90 changes: 90 additions & 0 deletions nnpdf_data/nnpdf_data/filter_utils/correlations.py
@@ -0,0 +1,90 @@
import numpy as np
from numpy.linalg import eig


def upper_triangular_to_symmetric(ut, dim):
    """Build a symmetric matrix from the upper diagonal"""
    corr = np.zeros((dim, dim))
    last = dim
    first = 0
    for i in range(dim):
        corr[i, i:] = ut[first:last]
        last += dim - i - 1
        first += dim - i
    return corr
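
A minimal usage sketch for the helper above, with a hypothetical 3x3 input. Note that, as written, the function fills only the diagonal and upper triangle; a caller that needs the fully symmetric matrix can mirror the result:

ut = [1.0, 0.3, 0.1, 1.0, 0.2, 1.0]  # flattened upper triangle, row by row
corr = upper_triangular_to_symmetric(ut, 3)
full = corr + corr.T - np.diag(np.diag(corr))  # mirror, subtracting the double-counted diagonal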


def compute_covmat(corrmat: np.ndarray, unc: np.ndarray, ndata: int) -> list:
    """Compute the covariance matrix with the artificial stat uncertainties."""
    # multiply by stat err
    cov_mat = np.einsum("i,ij,j->ij", unc, corrmat, unc)
    return covmat_to_artunc(ndata, cov_mat.flatten().tolist())
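
For reference, the einsum above builds the covariance C_ij = unc_i * corr_ij * unc_j before decomposing it into artificial uncertainties; a two-point sketch with made-up numbers:

corr = np.array([[1.0, 0.5], [0.5, 1.0]])  # correlation matrix
unc = np.array([0.1, 0.2])                 # statistical uncertainties
cov = np.einsum("i,ij,j->ij", unc, corr, unc)
# cov = [[0.010, 0.010],
#        [0.010, 0.040]]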


def covmat_to_artunc(ndata, covmat_list, no_of_norm_mat=0):
    r"""Convert the covariance matrix to a matrix of
    artificial uncertainties.

    NOTE: This function has been taken from validphys.newcommondata_utils.
    If those utils get merged in the future, we can replace this.

    Parameters
    ----------
    ndata : integer
        Number of data points
    covmat_list : list
        A one dimensional list which contains the elements of
        the covariance matrix row by row. Since experimental
        datasets provide these matrices in a list form, this
        simplifies the implementation for the user.
    no_of_norm_mat : int
        Normalized covariance matrices may have an eigenvalue
        of 0 due to the last data point not being linearly
        independent. To allow for this, the user should input
        the number of normalized matrices that are being treated
        in an instance. For example, if a single covariance matrix
        of a normalized distribution is being processed, the input
        would be 1. If a covariance matrix pertains to
        3 normalized datasets (i.e. cross covmat for 3
        distributions), the input would be 3. The default value is
        0 for when the covariance matrix pertains to an absolute
        distribution.

    Returns
    -------
    artunc : list
        A two dimensional matrix (given as a list of lists)
        which contains artificial uncertainties to be added
        to the commondata. i^th row (or list) contains the
        artificial uncertainties of the i^th data point.

    """
    epsilon = -0.0000000001
    neg_eval_count = 0
    psd_check = True
    covmat = np.zeros((ndata, ndata))
    artunc = np.zeros((ndata, ndata))
    for i in range(len(covmat_list)):
        a = i // ndata
        b = i % ndata
        covmat[a][b] = covmat_list[i]
    eigval, eigvec = eig(covmat)
    for j in range(len(eigval)):
        if eigval[j] < epsilon:
            psd_check = False
        elif eigval[j] > epsilon and eigval[j] <= 0:
            neg_eval_count = neg_eval_count + 1
            if neg_eval_count == (no_of_norm_mat + 1):
                psd_check = False
        elif eigval[j] > 0:
            continue
    if psd_check == False:
        raise ValueError("The covariance matrix is not positive-semidefinite")
    else:
        for i in range(ndata):
            for j in range(ndata):
                if eigval[j] < 0:
                    continue
                else:
                    artunc[i][j] = eigvec[i][j] * np.sqrt(eigval[j])
    return artunc.tolist()
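
A quick self-check of the decomposition implemented above: a positive-semidefinite C factorizes as C = V diag(lambda) V^T, so the artificial uncertainties A_ij = V_ij * sqrt(lambda_j) satisfy A A^T = C. A minimal sketch, assuming the import path added by this PR:

import numpy as np
from nnpdf_data.filter_utils.correlations import covmat_to_artunc

cov = np.array([[0.01, 0.01], [0.01, 0.04]])  # small positive-definite covariance matrix
art = np.array(covmat_to_artunc(2, cov.flatten().tolist()))
assert np.allclose(art @ art.T, cov)  # the artificial systematics reproduce C
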
27 changes: 27 additions & 0 deletions nnpdf_data/nnpdf_data/filter_utils/uncertainties.py
@@ -0,0 +1,27 @@

import numpy as np

def symmetrize_errors(delta_plus, delta_minus):
    r"""Compute the symmetrized uncertainty and the shift in data point.

    Parameters
    ----------
    delta_plus : float
        The top/plus uncertainty with sign
    delta_minus : float
        The bottom/minus uncertainty with sign

    Returns
    -------
    se_delta : float
        The value to be added to the data point
    se_sigma : float
        The symmetrized uncertainty to be used in commondata

    """
    semi_diff = (delta_plus + delta_minus) / 2
    average = (delta_plus - delta_minus) / 2
    se_delta = semi_diff
    se_sigma = np.sqrt(average * average + 2 * semi_diff * semi_diff)
    return se_delta, se_sigma
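A worked example of the symmetrization, with hypothetical asymmetric errors +0.3 / -0.1 (signs included, as the docstring specifies):

se_delta, se_sigma = symmetrize_errors(0.3, -0.1)
# semi_diff = (0.3 + (-0.1)) / 2 = 0.1  -> se_delta, the shift applied to the data point
# average   = (0.3 - (-0.1)) / 2 = 0.2
# se_sigma  = sqrt(0.2**2 + 2 * 0.1**2) = sqrt(0.06) ≈ 0.245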

@@ -9,8 +9,7 @@
HERE = pathlib.Path(__file__).parent
sys.path = [str(HERE.parent / "HERMES_NC_7GEV_EP")] + sys.path

from filter import compute_covmat

from nnpdf_data.filter_utils.correlations import compute_covmat

def read_data(fnames):
    df = pd.DataFrame()
@@ -81,11 +80,9 @@ def write_data(df):
    # Extract the correlation matrix and compute artificial systematics
    ndata_points = len(data_central)
    corrmatrix = read_corrmatrix(nb_datapoints=ndata_points)
    # Compute the covariance matrix
    compute_covmat(corrmatrix, df, ndata_points)

    # Compute the covariance matrix
    art_sys = compute_covmat(corrmatrix, df, ndata_points)
    art_sys = compute_covmat(corrmatrix, df['stat'], ndata_points)

    error = []
    for i in range(ndata_points):
84 changes: 2 additions & 82 deletions nnpdf_data/nnpdf_data/new_commondata/HERMES_NC_7GEV_EP/filter.py
@@ -2,10 +2,10 @@
import pathlib

import numpy as np
from numpy.linalg import eig
import pandas as pd
import yaml

from nnpdf_data.filter_utils.correlations import compute_covmat

def read_data(fnames):
    df = pd.DataFrame()
@@ -49,84 +49,6 @@ def read_corrmatrix(nb_datapoints: int = 15) -> np.ndarray:

    return df_corrs.value.values.reshape((nb_datapoints, nb_datapoints))


def covmat_to_artunc(ndata, covmat_list, no_of_norm_mat=0):
    r"""Convert the covariance matrix to a matrix of
    artificial uncertainties.

    NOTE: This function has been taken from validphys.newcommondata_utils.
    If those utils get merged in the future, we can replace this.

    Parameters
    ----------
    ndata : integer
        Number of data points
    covmat_list : list
        A one dimensional list which contains the elements of
        the covariance matrix row by row. Since experimental
        datasets provide these matrices in a list form, this
        simplifies the implementation for the user.
    no_of_norm_mat : int
        Normalized covariance matrices may have an eigenvalue
        of 0 due to the last data point not being linearly
        independent. To allow for this, the user should input
        the number of normalized matrices that are being treated
        in an instance. For example, if a single covariance matrix
        of a normalized distribution is being processed, the input
        would be 1. If a covariance matrix pertains to
        3 normalized datasets (i.e. cross covmat for 3
        distributions), the input would be 3. The default value is
        0 for when the covariance matrix pertains to an absolute
        distribution.

    Returns
    -------
    artunc : list
        A two dimensional matrix (given as a list of lists)
        which contains artificial uncertainties to be added
        to the commondata. i^th row (or list) contains the
        artificial uncertainties of the i^th data point.

    """
    epsilon = -0.0000000001
    neg_eval_count = 0
    psd_check = True
    covmat = np.zeros((ndata, ndata))
    artunc = np.zeros((ndata, ndata))
    for i in range(len(covmat_list)):
        a = i // ndata
        b = i % ndata
        covmat[a][b] = covmat_list[i]
    eigval, eigvec = eig(covmat)
    for j in range(len(eigval)):
        if eigval[j] < epsilon:
            psd_check = False
        elif eigval[j] > epsilon and eigval[j] <= 0:
            neg_eval_count = neg_eval_count + 1
            if neg_eval_count == (no_of_norm_mat + 1):
                psd_check = False
        elif eigval[j] > 0:
            continue
    if psd_check == False:
        raise ValueError('The covariance matrix is not positive-semidefinite')
    else:
        for i in range(ndata):
            for j in range(ndata):
                if eigval[j] < 0:
                    continue
                else:
                    artunc[i][j] = eigvec[i][j] * np.sqrt(eigval[j])
    return artunc.tolist()


def compute_covmat(corrmat: np.ndarray, df: pd.DataFrame, ndata: int) -> list:
    """Compute the covariance matrix with the artificial stat uncertainties."""
    # multiply by stat err
    stat = df["stat"]
    cov_mat = np.einsum("i,ij,j->ij", stat, corrmat, stat)
    return covmat_to_artunc(ndata, cov_mat.flatten().tolist())


def write_data(df):
    data_central = []
    for i in range(len(df["G"])):
@@ -153,11 +75,9 @@ def write_data(df):
    # Extract the correlation matrix and compute artificial systematics
    ndata_points = len(data_central)
    corrmatrix = read_corrmatrix(nb_datapoints=ndata_points)
    # Compute the covariance matrix
    compute_covmat(corrmatrix, df, ndata_points)

    # Compute the covariance matrix
    art_sys = compute_covmat(corrmatrix, df, ndata_points)
    art_sys = compute_covmat(corrmatrix, df['stat'], ndata_points)

    error = []
    for i in range(ndata_points):
@@ -0,0 +1,7 @@
data_central:
- -0.0014
- -0.0005
- 0.0058
- 0.0034
- 0.0077
- -0.0181
101 changes: 101 additions & 0 deletions nnpdf_data/nnpdf_data/new_commondata/PHENIX_1JET_200GEV/filter.py
@@ -0,0 +1,101 @@
import pandas as pd
import yaml

POL_UNC = 0.094


def read_data():
    df = pd.DataFrame()

    with open("rawdata/Table4.yaml", "r") as file:
        data = yaml.safe_load(file)

    pTbsub = data["independent_variables"][0]["values"]
    pTsub = data["dependent_variables"][0]["values"]
    ALLsub = data["dependent_variables"][1]["values"]

    for i in range(len(ALLsub)):
        df = pd.concat(
            [
                df,
                pd.DataFrame(
                    {
                        "pT": [pTsub[i]["value"]],
                        "pTmin": [pTbsub[i]["low"]],
                        "pTmax": [pTbsub[i]["high"]],
                        "eta": [0.0],
                        "eta_min": [-0.35],
                        "eta_max": [0.35],
                        "sqrts": [200],
                        "ALL": [ALLsub[i]["value"]],
                        "stat": [ALLsub[i]["errors"][0]["symerror"]],
                    }
                ),
            ],
            ignore_index=True,
        )

    df["pol"] = POL_UNC * abs(df["ALL"])
    return df
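
The parser above assumes the standard HEPData table layout; an illustrative fragment (values made up) of what rawdata/Table4.yaml is expected to contain, expressed as the Python object yaml.safe_load would return:

example = {
    "independent_variables": [{"values": [{"low": 2.0, "high": 3.0}]}],  # pT bin edges
    "dependent_variables": [
        {"values": [{"value": 2.5}]},  # pT midpoints
        {"values": [{"value": 0.003, "errors": [{"symerror": 0.0011}]}]},  # A_LL with stat error
    ],
}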


def write_data(df):
    data_central = []
    for i in range(len(df["ALL"])):
        data_central.append(float(df.loc[i, "ALL"]))

    data_central_yaml = {"data_central": data_central}
    with open("data.yaml", "w") as file:
        yaml.dump(data_central_yaml, file, sort_keys=False)

    # Write kin file
    kin = []
    for i in range(len(df["ALL"])):
        kin_value = {
            "pT": {
                "min": float(df.loc[i, "pTmin"]),
                "mid": float(df.loc[i, "pT"]),
                "max": float(df.loc[i, "pTmax"]),
            },
            "sqrts": {"min": None, "mid": float(df.loc[i, "sqrts"]), "max": None},
            "eta": {
                "min": float(df.loc[i, "eta_min"]),
                "mid": float(df.loc[i, "eta"]),
                "max": float(df.loc[i, "eta_max"]),
            },
        }
        kin.append(kin_value)

    kinematics_yaml = {"bins": kin}

    with open("kinematics.yaml", "w") as file:
        yaml.dump(kinematics_yaml, file, sort_keys=False)

    # Write unc file
    error = []
    for i in range(len(df)):
        e = {"stat": float(df.loc[i, "stat"]), "pol": float(df.loc[i, "pol"])}
        error.append(e)

    error_definition = {
        "stat": {
            "description": "statistical uncertainty",
            "treatment": "ADD",
            "type": "UNCORR",
        },
        "pol": {
            "description": "beam polarization uncertainty",
            "treatment": "MULT",
            "type": "RHIC2005POL",
        },
    }

    uncertainties_yaml = {"definitions": error_definition, "bins": error}

    with open("uncertainties.yaml", "w") as file:
        yaml.dump(uncertainties_yaml, file, sort_keys=False)


if __name__ == "__main__":
    df = read_data()
    write_data(df)
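
As a closing note on the uncertainty convention: POL_UNC = 0.094 is the relative beam-polarization uncertainty, applied bin by bin to |A_LL| and declared as a MULT, RHIC2005POL-type error in the definitions above. With the first A_LL value from the data.yaml shown earlier:

pol = 0.094 * abs(-0.0014)  # = 1.316e-04, written to uncertainties.yaml as the "pol" entry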