Cache objects and performance plots
The tools present in this folder allow the user to produce matching efficiency, turn-on curves, and scaling plots for the various L1 objects under test.
**Note:** The repository includes a symlink to a location on eos where usually all needed caching files already exist.
In order to run the next steps, the object `TTree`s from the L1NTuples need to be cached as `awkward` arrays saved into `.parquet` files. This is done by running:
cache_objects cfg_caching/<VERSION>/caching.yaml
This step needs to be run only once per configuration (unless the input L1 ntuples change). The `.parquet` files generated by the code can be used for all subsequent steps of the workflow, without having to open the `.root` files and load the objects every time the framework is run.
The `.yaml` configuration files specify which samples are to be loaded. For each sample one can specify which objects are to be loaded from the gen-level and L1-object `TTree`s. For each object, one needs to specify which branches are to be loaded:
<VERSION>:
  GluGluToGG:
    ntuple_path: <PATH_TO_L1_NTUPLES>
    trees_branches:
      genTree/L1GenTree:
        <GEN_PART>: <GEN_BRANCHES>
      l1PhaseIITree/L1PhaseIITree:
        <L1_OBJ>: <L1_BRANCHES>
For `<GEN_PART>` one can specify:
* `part_<PARTICLE>`, where `<PARTICLE>` can be any of the strings defined in `get_pdg_id`, for gen-level leptons.
* `genMetTrue` for gen-level MET.
* `jets` for gen-level jets.
For `<GEN_BRANCHES>` one can specify:
* `[Id, Stat, Pt, Eta, Phi]` when any `part_<PARTICLE>` is used. These refer to the gen-level branches to be loaded and used internally by `cache_objects.py`. This list reflects the fact that in the `.root` ntuples (saved in `<PATH_TO_L1_NTUPLES>`) the gen-level information for these leptons is saved in `genTree/L1GenTree` in the branches `partId, partStat, partPt, partEta, partPhi`.
* `"all"` in all the other cases (MET and jets). When the `"all"` keyword is used, the framework will load all the branches in `genTree/L1GenTree` that start with `<GEN_PART>` (e.g. for `jet` it will load `jetPt`, `jetEta`, `jetPhi`, `jetM`).
**Note:** In principle you can use the `"all"` keyword for all the objects. However, if there are empty branches for a given object, the code might crash.
For `<L1_OBJ>` one can specify all the objects that are documented for each release in this Google Doc.
As `<L1_BRANCHES>` one specifies the list of branches to be loaded for a given `<L1_OBJ>`. Similarly to `<GEN_PART>`, one can use the `"all"` keyword to load all the branches in the `TTree` that start with `<L1_OBJ>`.
For reference, the branches used for each object in the legacy `C++` version of the tools can be found here.
The `plotter.py` script can be used to produce matching efficiencies, turn-on curves, and L1 scaling plots. It can be run with
object_performance configs/<VERSION>/object_performance/<PLOT_CONFIG>.yaml
where `<PLOT_CONFIG>.yaml` is a config file that is usually stored in the path given above. The config file specifies:
* Cuts (e.g. eta, pT, isolation, and quality criteria) on the gen-level.
* The list of L1 objects to be included in the same plot for a given gen-level object.
* Plotting features such as axis labels and the binning of the histograms used for the computation of the efficiencies.
The outputs will be written to the `outputs/<VERSION>/object_performance` directory, where `<VERSION>` is the version of the ntuples used for the plots as specified in the `.yaml` config file (more details on the config file are given below).
The plots are saved in three subfolders:
* `distributions`: plots of the distributions (histograms) used to compute the efficiencies. For each efficiency curve plotted, these plots show the distributions used as numerator and denominator in the computation of the efficiency.
* `turnons`: plots of the matching efficiencies and L1 turn-on efficiency curves.
* `scalings`: plots of the scalings, i.e. the position of the turn-on point (often the 95% location) as a function of different threshold cuts on the L1 objects.
The general structure of the config files used for the plotting script is the following:
<CONFIG_TITLE>:
  sample: <SAMPLE>
  version: <VERSION>
  match_test_to_ref: True/False
  reference_object:
    object: "<GEN_LEVEL_OBJECT>"
    x_arg: "<OBSERVABLE>"
    label: "<GEN_LABEL>"
    cuts:
      event:
        - "{dr_0.3} < 0.15"
        - "abs({eta}) > 1.5"
      object:
        - "abs({eta}) < 2.4"
  test_objects:
    <TEST_OBJ>:<OBJ_ID>:<ETA_REGION>
    ...
    <TEST_OBJ>:<OBJ_ID>:<ETA_REGION>
  xlabel: "Gen. $p_T$ (GeV)"
  ylabel: "Matching Efficiency (Endcap)"
  binning:
    min: 0
    max: 150
    step: 3
where:
* `<CONFIG_TITLE>`: descriptive name of the plot (e.g. `ElectronsMatchingBarrel`). This string is used as a prefix for the output filenames.
* `<SAMPLE>`: identifier of the sample from which the objects should be taken. This string should match the one used in the cache step, e.g. `DYLL_M50`, `TT`, `Hgg`, `VBFHToTauTau`, `GluGluToGG`, `GluGluToHHTo2B2Tau`.
* `<VERSION>`: identifier of the version of the ntuples to be used. This field should also match the one used in the config file for the cache step.
Hence, having specified `<SAMPLE>` and `<VERSION>`, the framework will look for gen-level and L1 objects (hereafter referred to as reference and test objects, respectively) in:
cache/<VERSION>/<VERSION>_<SAMPLE>_<OBJECT>.parquet
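As a small illustration of the naming scheme above, the sketch below builds the expected cache path from its three components. The helper `cache_path`, the version string `V29`, and the object name `part_mu` are hypothetical placeholders chosen for the example and are not taken from the framework.

```python
from pathlib import Path


def cache_path(version, sample, obj, base="cache"):
    """Build cache/<VERSION>/<VERSION>_<SAMPLE>_<OBJECT>.parquet."""
    return Path(base) / version / f"{version}_{sample}_{obj}.parquet"


# Hypothetical version and object name, for illustration only.
path = cache_path("V29", "DYLL_M50", "part_mu")
print(path.as_posix())
```

If the file at this path does not exist, the most likely cause is a mismatch between the `sample` or `version` fields of the plot config and those used in the caching config.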
The next entries of the `.yaml` config file are:
* `reference_object`: this field specifies all the properties of the gen-level (reference) objects. Only one reference object per `<CONFIG_TITLE>` can be specified. For a `reference_object` one can specify:
  * `object`: the reference object to be used for the plots. The same list of particle names that can be used for `<GEN_PART>` in the cache step applies.
  * `x_arg`: the observable to be used for the computation of the efficiencies and in the plots. Typically `"Pt"` or `"Eta"`.
  * `label`: the label to be used for the legend of the plots.
  * `cuts`: divided between `object` and `event`. The former refers to cuts applied to all the reference objects on an event-by-event basis, while the latter defines cuts to be applied to the particles in each event. In the example above, all the events with `|eta|>2.4` are discarded. For the remaining events, we keep only particles that have gen-level isolation (`dr_0.3`) < 0.15 and `|eta|>1.5`. In order to specify a cut one needs to put in curly brackets the lowercase name of the observable as it can be found in the `awkward` arrays used as inputs. For example: `abs({eta})`, `{pt}`, `{dr_0.3}`, etc.
  **Note:** gen-level isolation between leptons and all the final state particles is computed here for different values of deltaR and stored in arrays called `dr_deltaR` (available for `deltaR=[0.1, 0.15, 0.2, 0.3, 1.5]`). In the config files one should always use `{dr_0.3}` (unless agreed upon differently with the L1-PhaseIIMenu group).
* `test_objects`: several objects can be specified under this field. These define all the L1 objects (test objects) for which efficiencies and scalings will be computed. Each test object is specified by referencing `<TEST_OBJ>:<OBJ_ID>:<ETA_REGION>`. The object properties (meaning cuts, eta regions, etc.) are defined in `configs/<VERSION>/objects/`.
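The `<TEST_OBJ>:<OBJ_ID>:<ETA_REGION>` reference can be split into its three parts as in the sketch below. The names `ObjectKey` and `parse_test_object` are hypothetical and are not part of the framework; the example key uses the `L1gmtTkMuon` object with its `default` ID and `barrel` region.

```python
from typing import NamedTuple


class ObjectKey(NamedTuple):
    """The three components of a test-object reference."""
    name: str        # <TEST_OBJ>
    obj_id: str      # <OBJ_ID>
    eta_region: str  # <ETA_REGION>


def parse_test_object(key):
    """Split "<TEST_OBJ>:<OBJ_ID>:<ETA_REGION>" into its parts."""
    parts = key.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"Expected <TEST_OBJ>:<OBJ_ID>:<ETA_REGION>, got {key!r}")
    return ObjectKey(*parts)


key = parse_test_object("L1gmtTkMuon:default:barrel")
print(key.name, key.obj_id, key.eta_region)
```

Rejecting keys with missing or empty parts keeps an incomplete reference from silently selecting the wrong ID or eta region.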
Additional fields to be specified in the `.yaml` config file are:
* `xlabel` and `ylabel`: the x- and y-axis labels of the plots.
* `binning`: with `min`, `max`, and `step` as fields. These define the range (`min` and `max`) to be used for the plots (or, equivalently, for the efficiency computation) and the bin width (`step`). Hence the `binning` field defines evenly spaced bin edges, `step` apart, within the interval bounded by `min` and `max`.
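A minimal sketch of how the `binning` field could translate into bin edges is given below. It assumes that `step` is the spacing between consecutive edges, that `max - min` is a multiple of `step`, and that the upper boundary is included as the last edge; the framework may treat the upper boundary differently. The function name `bin_edges` is hypothetical.

```python
def bin_edges(binning):
    """Return evenly spaced bin edges from a {"min", "max", "step"} mapping."""
    lo, hi, step = binning["min"], binning["max"], binning["step"]
    # Number of bins, assuming (max - min) is a multiple of step.
    n_bins = int(round((hi - lo) / step))
    return [lo + i * step for i in range(n_bins + 1)]


# Values taken from the example config above.
edges = bin_edges({"min": 0, "max": 150, "step": 3})
print(len(edges) - 1, "bins from", edges[0], "to", edges[-1])
```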
The structure defined above is the general structure of the config files for the plotting step and can be used to produce matching efficiency plots (i.e. plots in which the efficiency is defined as the ratio of the number of reference objects matched to a test object to the number of all reference objects) as a function of transverse momentum and pseudorapidity.
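The matching efficiency defined above can be sketched in a few lines of plain Python. The function names and the toy inputs are hypothetical; the framework itself operates on `awkward` arrays, so this is only an illustration of the ratio being computed per bin.

```python
from bisect import bisect_right


def histogram(values, edges):
    """Count values per bin; bins are [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v < edges[-1]:
            counts[bisect_right(edges, v) - 1] += 1
    return counts


def matching_efficiency(ref_values, is_matched, edges):
    """Per-bin ratio of matched reference objects to all reference objects."""
    total = histogram(ref_values, edges)
    matched = histogram([v for v, m in zip(ref_values, is_matched) if m], edges)
    # Empty bins have no defined efficiency and are returned as None.
    return [m / t if t > 0 else None for m, t in zip(matched, total)]


# Toy gen-level pT values and matching flags, for illustration only.
edges = [0, 10, 20, 30]
ref_pt = [5, 7, 12, 15, 18, 25]
matched = [True, False, True, True, False, True]
eff = matching_efficiency(ref_pt, matched, edges)
```

The two histograms built here correspond to the numerator and denominator distributions that the framework saves in the `distributions` subfolder.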
Working examples for the main objects (electrons, muons, jets, and taus) can be found in `cfg_plots` under `<OBJECT>_matching.yaml` and `<OBJECT>_matching_eta.yaml` for the matching efficiency as a function of transverse momentum and pseudorapidity, respectively.
Additional fields can be included in the config file according to the specific use-case, as detailed below.
In addition to the fields mentioned above, one can add the `thresholds` field in the `.yaml` config file for the plots:
thresholds: <LIST_OF_THRESHOLDS>
to produce L1 turn-on curves for different cut values on the observable specified in `suffix` for `test_objects`.
For example, if `<LIST_OF_THRESHOLDS>` is `[10, 20]` and `suffix` is `pt`, turn-on curves for test objects with a cut at `10` and `20` GeV on `pt` will be produced.
The L1 turn-on efficiency curves are defined as a ratio of two distributions of the reference object `suffix` observable. The numerator contains the reference objects matched to an L1 object that passes the cut on the L1 `suffix` observable (at each of the `<LIST_OF_THRESHOLDS>` values); the denominator is the same matched distribution without any cut on the L1 `suffix` observable.
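The turn-on definition above can be sketched as follows for a list of already matched (reference, L1) pairs. The function names and the toy values are hypothetical, and the use of `>=` for the threshold cut is an assumption; the framework may use a strict inequality.

```python
def count_in_bins(values, edges):
    """Count values per bin; bins are [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts


def turn_on_curve(matched_pairs, threshold, edges):
    """Per-bin turn-on efficiency for one L1 threshold.

    matched_pairs: (reference value, matched L1 value) tuples.
    """
    # Denominator: all matched reference objects, no cut on the L1 value.
    denominator = count_in_bins([ref for ref, _ in matched_pairs], edges)
    # Numerator: matched reference objects whose L1 value passes the cut.
    numerator = count_in_bins(
        [ref for ref, l1 in matched_pairs if l1 >= threshold], edges
    )
    return [n / d if d else None for n, d in zip(numerator, denominator)]


# Toy (gen pT, L1 pT) pairs in GeV, for illustration only.
pairs = [(8, 6), (12, 11), (14, 9), (22, 21), (27, 19), (28, 25)]
edges = [0, 10, 20, 30]
curves = {thr: turn_on_curve(pairs, thr, edges) for thr in [10, 20]}
```

Each entry of `curves` corresponds to one curve in a turn-on plot, one per value in `thresholds`.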
For reference, to compare with the `C++` version, the distributions used for the computation of the efficiencies are defined in this (and similar) scripts under `--numerator` and `--denumerator`. The corresponding objects are defined in this config file.
If the `scalings` field is specified for a given `<TEST_OBJ>`, then scaling plots will be produced.
A scaling plot defines the offline-to-online relation, i.e. it specifies which threshold on the L1 object corresponds to a certain location (often 90% or 95%) on the turn-on efficiency curve.
The syntax for the `scalings` field is the following:
scalings:
  method: <METHOD>
  threshold: <THRESHOLD>
where:
* `<METHOD>`: can be either `"naive"` or `"errf"`. The former corresponds to an interpolation of the turn-on efficiency curve, while the latter corresponds to a fit of the turn-on efficiency curve with an error-function-like curve. The result of the interpolation or of the fit is then used to retrieve the `<THRESHOLD>` location on the turn-on curve.
* `<THRESHOLD>`: defines the position to be found on the turn-on efficiency curve (e.g. `0.95` for 95%).
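To illustrate the idea behind the interpolation-based method, the sketch below finds the point at which a turn-on curve first crosses a given efficiency by linear interpolation between neighbouring points. This is an assumption about what an interpolation could look like, not the framework's `"naive"` implementation; the function name and the toy curve are hypothetical.

```python
def turn_on_point(x, eff, threshold):
    """Return the x position where eff first rises through threshold.

    Uses linear interpolation between adjacent points and returns None
    if the curve never crosses the threshold from below.
    """
    for i in range(1, len(x)):
        lo, hi = eff[i - 1], eff[i]
        if lo < threshold <= hi:
            frac = (threshold - lo) / (hi - lo)
            return x[i - 1] + frac * (x[i] - x[i - 1])
    return None


# Toy turn-on curve: bin centres in GeV and efficiencies.
x = [10, 20, 30, 40]
eff = [0.10, 0.50, 0.90, 1.00]
point = turn_on_point(x, eff, 0.95)
```

Repeating this for the turn-on curve of each L1 threshold gives the set of (L1 threshold, turn-on point) pairs shown in a scaling plot.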
The scalings are built, for a given reference object, using the threshold values defined in `scaling_thresholds`.
Objects are defined centrally and can be used by referencing them in the configs of both the `object_performance` and the `rate` parts of the tools. This prevents the object definitions from diverging between parts of the code and defines a clear interface for the scalings, which are computed per `eta_region` as defined in the object.
The objects are defined in `yaml` files like everything else in these menu tools. Here is an example of an object config:
L1gmtTkMuon:
  label: "GMT TkMuon"
  match_dR: 0.1
  eta_ranges:
    inclusive: [0, 7]
    barrel: [0, 0.83]
    overlap: [0.83, 1.24]
    endcap: [1.24, 2.4]
  ids:
    default:
      label: "GMT TkMuon, Loose ID"
      cuts:
        inclusive:
          - "{hwQual} >= 3"
    VLoose: # x.numberOfMatches() > 0
      label: "GMT TkMuon, VLoose ID"
      cuts:
        inclusive:
          - "{hwQual} >= 1"
    Loose: # x.numberOfMatches() >1
      label: "GMT TkMuon, Loose ID"
      cuts:
        inclusive:
          - "{hwQual} >= 3"
    Medium: # x.stubs().size()>1
      label: "GMT TkMuon, Medium ID"
      cuts:
        inclusive:
          - "{hwQual} >= 7"
    Tight: # x.numberOfMatches()>2
      label: "GMT TkMuon, Tight ID"
      cuts:
        inclusive:
          - "{hwQual} >= 15"
* `x_arg`: the observable to be used for the computation of the efficiencies and in the plots. It has to be the same used for the reference object.
**Note:** For `trackerHT`, `phase1PuppiHT`, `trackerMHT`, and `trackerMET` objects the `<OBSERVABLE>` in `suffix` needs to be left empty (i.e. `""`) because for these objects only the transverse momentum is stored in the input ntuples (in a `TBranch` with the same name of the object).
* `label`: label for the test object (used in the legend of the plots).
* `match_dR`: a number here defines the deltaR cut to be used in the matching between reference (gen-level) and test (L1) objects. If this field is not included in `<TEST_OBJ>`, no deltaR matching is performed.
* `eta_ranges`: eta ranges for which different ids/cuts are to be defined
* `ids`: IDs with different cut criteria etc.
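The role of `match_dR` can be illustrated with a short deltaR matching sketch. The functions below are hypothetical and pick the closest test object within the cut; this selection rule is an assumption, and the framework may resolve multiple candidates differently.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance, with the phi difference wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def match(ref, tests, match_dr):
    """Return the index of the closest test object with deltaR < match_dr.

    ref:   (eta, phi) of the reference object
    tests: list of (eta, phi) of the test objects
    Returns None if no test object lies within the cut.
    """
    best, best_dr = None, match_dr
    for i, (eta, phi) in enumerate(tests):
        dr = delta_r(ref[0], ref[1], eta, phi)
        if dr < best_dr:
            best, best_dr = i, dr
    return best


# Toy example using match_dR = 0.1 as in the L1gmtTkMuon config above.
gen_muon = (0.5, 3.1)
l1_muons = [(1.5, 0.0), (0.5, -3.1)]
index = match(gen_muon, l1_muons, 0.1)
```

Wrapping the phi difference matters for objects on either side of the phi = ±pi boundary, as in the toy example, where the second L1 object is close to the reference object despite the large raw difference in phi.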
The cuts are then defined in the individual IDs. The name of each ID is the outermost key (`default`, `VLoose`, ... in the example above). For each ID, a label for plotting and a set of cuts can be defined. The `cuts` are key-value pairs, where each key has to correspond to one of the regions defined in `eta_ranges`, and all cuts listed below it are applied to that region only. As for the reference object, the lowercase name of the observable used for the cut needs to be placed within curly brackets (e.g. `{eta}`, `{pt}`, etc.).
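The curly-bracket syntax of the cut strings can be illustrated with a small evaluator for a single object. This is a sketch under stated assumptions and not the framework's implementation, which works on `awkward` arrays; the function `passes` and the example muon are hypothetical, and `eval` is used only for brevity with the available builtins restricted to `abs`.

```python
import re

# Matches a placeholder such as {eta}, {pt} or {dr_0.3}.
_PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def passes(cut, obj):
    """Evaluate a cut string such as "abs({eta}) < 2.4" for one object.

    obj maps lowercase observable names to numbers.
    """
    # Replace each {name} with a lookup into the object dictionary.
    expr = _PLACEHOLDER.sub(lambda m: f"obj[{m.group(1)!r}]", cut)
    # Only abs() is made available to the expression.
    return bool(eval(expr, {"__builtins__": {}, "abs": abs}, {"obj": obj}))


# Hypothetical L1 muon, for illustration only.
muon = {"eta": -1.1, "pt": 24.0, "hwQual": 7, "dr_0.3": 0.05}
cuts = ["{hwQual} >= 3", "abs({eta}) < 2.4", "{dr_0.3} < 0.15"]
selected = all(passes(c, muon) for c in cuts)
```

An observable name that is not present in the object raises a `KeyError`, which makes a misspelled name in a cut string visible immediately.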
Which cuts and observables to use can be determined from the [objects Google Doc](https://docs.google.com/spreadsheets/d/1u3IjbePHyQnABg1nel06ITG1kO1bG0k5yw0zc_KqqHM/edit#gid=1105636672), the [PhaseII-Menu TWiki](https://twiki.cern.ch/twiki/bin/view/CMS/PhaseIIL1TriggerMenuTools), or through private communication with the L1 PhaseII-Menu conveners.