
Unweighting implementation #2083

Draft · wants to merge 8 commits into base: master

Conversation

toonhasenack
Collaborator

Calculation of the unweighting weights in the vp framework. The module can be called with
vp-unweight "path to chi2".csv "number of datapoints"
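For reference, a minimal sketch of the unweighting-weight computation this module implements. The function names here are illustrative, not the PR's actual API; the formula is the standard NNPDF reweighting one, $w_k \propto (\chi^2_k)^{(N-1)/2} e^{-\chi^2_k/2}$, with the effective number of replicas given by the Shannon entropy of the weights:

```python
import numpy as np
from scipy.special import xlogy

def reweight(chi2, n_data):
    # Standard NNPDF reweighting weights, w_k ∝ chi2_k**((N-1)/2) * exp(-chi2_k/2),
    # evaluated in log space for numerical stability.
    log_w = 0.5 * (n_data - 1) * np.log(chi2) - 0.5 * chi2
    log_w -= log_w.max()
    w = np.exp(log_w)
    return w * len(w) / w.sum()  # normalize so that sum(w) == N_rep

def effective_replicas(weights):
    # Shannon-entropy effective number of replicas, N_eff = exp(entropy);
    # equals N_rep for uniform weights and decreases as weights concentrate.
    n_rep = len(weights)
    return float(np.exp(-np.sum(xlogy(weights, weights / n_rep)) / n_rep))
```

Unweighting then amounts to resampling replicas with probabilities proportional to these weights; the sketch above only covers the weight and entropy computation.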

@RoyStegeman
Member

Why is this a CLI command and not just a vp action?

@toonhasenack
Collaborator Author

@RoyStegeman, I followed the suggestion of @scarlehoff to do it this way.

pyproject.toml (Outdated)
@@ -106,7 +107,7 @@ qed = ["fiatlux"]
 nolha = ["pdfflow", "lhapdf-management"]

 [tool.poetry-dynamic-versioning]
-enable = true
+enable = false
@scarlehoff (Member), May 18, 2024

Why did you disable versioning?

(It needs to be re-enabled, but I'd like to know the exact problem that led to disabling it.)

Collaborator Author

I think this happened because I built NNPDF locally. I reverted it.

Member

But in principle you should be able to install with pip install -e . (or without -e) and it should not modify this file.

Collaborator Author

Strange. It did happen in my case...

Member

Is your environment managed by poetry?

from typing import Tuple

import pandas as pd
from scipy.special import xlogy
from tqdm import tqdm
Member

I'm not too fond of tqdm because it usually makes logs and debugging harder to read.
But if you are set on using it, you must add it to the dependencies as well.
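For concreteness, adding it would look something like the following in pyproject.toml; the version constraint here is an assumption for illustration, not taken from the repository:

```toml
[tool.poetry.dependencies]
# illustrative constraint only; the actual pin is up to the maintainers
tqdm = "^4.66"
```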

parser = argparse.ArgumentParser()
parser.add_argument("chi2_name", help="Filename of the chi2 dataset (.csv)")
parser.add_argument("N", type=int, help="Number of experimental datapoints that the chi2 is based on")
args = parser.parse_args()
chi2 = pd.read_csv(args.chi2_name).values
Member

How is the csv generated? This should be part of the script.

Ideally the script should have two steps

  1. Check whether the csv exists and, if it doesn't, generate it

  2. Do the actual computation

In the validphys language these would be two actions, where the second depends on the first. We can make them into proper validphys actions at the end; it can remain a script for the time being.

For the first action you can use a lot of vp stuff (e.g., functions to compute the chi2, or load pdfs).
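The two-step structure suggested above can be sketched as follows; `generate_chi2_csv` is a hypothetical callback standing in for the vp-based chi2 computation, not an existing function:

```python
from pathlib import Path

import pandas as pd

def load_or_compute_chi2(csv_path, generate_chi2_csv):
    """Step 1: create the per-replica chi2 csv if it is missing;
    step 2: load it for the actual unweighting computation."""
    path = Path(csv_path)
    if not path.exists():
        # e.g. a validphys-based action that writes one chi2 value per replica
        generate_chi2_csv(path)
    return pd.read_csv(path).values
```

This keeps the generation and the computation decoupled, so the first step can later be replaced by a proper validphys action without touching the second.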

Collaborator Author

I could not find a better solution for this. The per-replica chi2 is computed in the nnpdf/external/ poljet code, and if I moved the chi2 calculation into validphys, I'm afraid I would have to migrate more functionality as well. I don't think the result would be more elegant, however.

Member

Well, you are already using validphys in that code.

So a first stage can be a simple port. Later on, instead of doing self.l.check_commondata(data_name) or API.dataset_inputs_covmat_from_systematics(**inp), you would have a function that takes results as input; according to the definition of dataset_inputs in the runcard / input, you would get a results object already populated with the dataset, covariance matrix, statistical / systematic errors, etc.


return Nps, entropies, Nopt

def main(chi2, N, store=True):
Member

Please also add some docstrings here explaining what it does.

"""
Initialize the Unweight class.

Args:
Member

We are trying to follow the numpy conventions https://numpydoc.readthedocs.io/en/latest/format.html#parameters at least for new code (and especially for parameters, returns, etc.). It helps consistency.
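As a concrete illustration, the Args: block above would become something like this in numpydoc style; the function name and parameter descriptions are assumptions based on the snippet under review, not the PR's actual code:

```python
def unweight(chi2, N):
    """Compute unweighting weights from per-replica chi2 values.

    Parameters
    ----------
    chi2 : numpy.ndarray
        The chi2 of each Monte Carlo replica.
    N : int
        Number of experimental datapoints the chi2 is based on.

    Returns
    -------
    numpy.ndarray
        The normalized weight of each replica.
    """
```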

@Radonirinaunimi
Member

@toonhasenack Just to echo what has been said above about the re/un-weighting being self-consistent: everything except the generation of the theory predictions should be here, i.e., from the computation of the $\chi^2$ to the final dumping of the PDFs. Ideally, the theory predictions should also be stored in ../NNPDF/theories.

Also agreed on this being vp actions instead of scripts.

If you want, I could take over this PR from this point.

@Radonirinaunimi marked this pull request as draft on May 24, 2024 at 08:26