Cache objects and performance plots

Daniel Hundhausen edited this page Jun 6, 2024 · 6 revisions

Object Performance Tools

The tools present in this folder allow the user to produce matching efficiency, turn-on curves, and scaling plots for the various L1 objects under test.

Caching the NTuple trees

Note: The repository includes a symlink to a location on eos where usually all needed caching files already exist.

In order to run the next steps, the object TTrees from the L1NTuples need to be cached as awkward arrays saved into .parquet files. This is done by running:

cache_objects cfg_caching/<VERSION>/caching.yaml 

This step needs to be run only once per configuration (unless changes in the input L1 ntuples occur) and the .parquet files generated by the code can be used for all the subsequent steps of the workflow, without having to open the .root files and load the objects every time the framework is run.

Structure of the config files: cache step

The .yaml configuration files specify which samples are to be loaded. For each sample one can specify which objects are to be loaded from the gen-level and L1-object TTrees. For each object, one needs to specify which branches are to be loaded:

<VERSION>:
  GluGluToGG:
    ntuple_path: <PATH_TO_L1_NTUPLES>
    trees_branches:
      genTree/L1GenTree:
        <GEN_PART>: <GEN_BRANCHES>
      l1PhaseIITree/L1PhaseIITree:
        <L1_OBJ>: <L1_BRANCHES>

For <GEN_PART> one can specify:

  • part_<PARTICLE>, where <PARTICLE> can be any of the strings defined in get_pdg_id for gen-level leptons.

  • genMetTrue for gen-level MET.

  • jets for gen-level jets.

For <GEN_BRANCHES> one can specify:

  • [Id, Stat, Pt, Eta, Phi] when any of part_<PARTICLE> is used. These refer to the gen-level branches to be loaded and used internally by cache_objects.py. This list reflects the fact that in the .root ntuples (saved in <PATH_TO_L1_NTUPLES>) the gen-level information for these leptons is saved in genTree/L1GenTree in the branches partId, partStat, partPt, partEta, and partPhi.

  • "all" in all the other cases (MET and jets). When the "all" keyword is used, the framework will load all the branches in genTree/L1GenTree that start with <GEN_PART> (e.g. for jet it will load jetPt, jetEta, jetPhi, jetM). Note: In principle you can use the "all" for all the objects. However, if there are empty branches for a given object, the code might crash.

For <L1_OBJ> one can specify all the objects that are documented for each release in this Google Doc. As <L1_BRANCHES> one specifies the list of branches to be loaded for a given <L1_OBJ>. Similarly to the <GEN_PART>, one can use the "all" keyword to load all the branches in the TTree that start with <L1_OBJ>.
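
The branch selection described above can be sketched as follows. This is only an illustration, assuming the branch names of a tree are available as a plain list; `resolve_branches` and the `gen_tree` list are hypothetical and not part of the framework.

```python
def resolve_branches(prefix, requested, available):
    """Return the full branch names to load for one object.

    prefix    -- object prefix as it appears in the ntuple, e.g. "part" or "jet"
    requested -- the string "all" or a list of suffixes such as ["Id", "Pt"]
    available -- branch names present in the tree (assumed input)
    """
    if requested == "all":
        # Keep every branch whose name starts with the object prefix.
        return [b for b in available if b.startswith(prefix)]
    # Otherwise build the names explicitly and check that they exist.
    names = [prefix + suffix for suffix in requested]
    missing = [n for n in names if n not in available]
    if missing:
        raise KeyError(f"branches not found: {missing}")
    return names


# Hypothetical content of genTree/L1GenTree, for illustration only.
gen_tree = ["partId", "partStat", "partPt", "partEta", "partPhi",
            "jetPt", "jetEta", "jetPhi", "jetM"]

print(resolve_branches("part", ["Id", "Stat", "Pt", "Eta", "Phi"], gen_tree))
print(resolve_branches("jet", "all", gen_tree))
```

Listing the branches explicitly fails early on a missing name, whereas "all" silently picks up whatever shares the prefix.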

For reference, the branches used for each object in the legacy C++ version of the tools can be found here.

Efficiency and Scalings

The plotter.py script can be used to produce matching efficiencies, turn-on curves, and L1 scaling plots. It can be run with

  object_performance configs/<VERSION>/object_performance/<PLOT_CONFIG>.yaml 

where <PLOT_CONFIG>.yaml is a config file that is usually stored in the path given above. The config file specifies:

  • Cuts (e.g. eta, pT, isolation, and quality criteria) on the gen-level.

  • List of L1 objects to be included in the same plot for a given gen-level object.

  • Plotting features such as axes labels and binning of the histograms used for the computation of the efficiencies.

The outputs will be written to the outputs/<VERSION>/object_performance directory, where <VERSION> is the version of the ntuples used for the plots as specified in the .yaml config file (more details on the config file are given below). The plots are saved in three subfolders:

  • distributions: plots of the distributions (histograms) used to compute the efficiencies. For each efficiency curve plotted, these plots depict the distributions used as numerator and denominator in the computation of the efficiencies.

  • turnons: plots of the matching efficiencies and L1 turn-on efficiency curves.

  • scalings: plots of the scalings, i.e. position of the turn-on point (often the 95% location) as a function of different threshold cuts on L1 objects.

Structure of the config files: plotting

The general structure of the config files used for the plotting script is the following:

  <CONFIG_TITLE>:
    sample: <SAMPLE>
    version: <VERSION>
    match_test_to_ref: True/False
    reference_object:
      object: "<GEN_LEVEL_OBJECT>"
      x_arg: "<OBSERVABLE>"
      label: "<GEN_LABEL>"
      cuts:
        event:
          - "{dr_0.3} < 0.15"
          - "abs({eta}) > 1.5"
        object:
          - "abs({eta}) < 2.4"
    test_objects:
      <TEST_OBJ>:<OBJ_ID>:<ETA_REGION>
      ...
      <TEST_OBJ>:<OBJ_ID>:<ETA_REGION>
    xlabel: "Gen. $p_T$ (GeV)"
    ylabel: "Matching Efficiency (Endcap)"
    binning:
      min: 0
      max: 150
      step: 3

where:

  • <CONFIG_TITLE>: descriptive name of the plot (e.g. ElectronsMatchingBarrel). This string is used as a prefix for the output filenames.

  • <SAMPLE>: identifier of the sample from which the objects should be taken. This string should reflect the one used in the cache step, e.g. DYLL_M50, TT, Hgg, VBFHToTauTau, GluGluToGG, GluGluToHHTo2B2Tau.

  • <VERSION>: identifier of the version of the ntuples to be used. This field should also reflect the one used in the config file for the cache step.

Hence, having specified <SAMPLE> and <VERSION>, the framework will look for gen-level and L1 objects (hereafter referred to as reference and test objects, respectively) in:

  cache/<VERSION>/<VERSION>_<SAMPLE>_<OBJECT>.parquet
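
As a small illustration of this naming scheme, the path can be assembled as below. `cache_path` is a hypothetical helper, and the version, sample, and object strings in the example are placeholders only.

```python
from pathlib import Path


def cache_path(version, sample, obj, cache_dir="cache"):
    """Build cache/<VERSION>/<VERSION>_<SAMPLE>_<OBJECT>.parquet."""
    return Path(cache_dir) / version / f"{version}_{sample}_{obj}.parquet"


# "V29" and "L1gmtTkMuon" are placeholder values chosen for the example.
print(cache_path("V29", "DYLL_M50", "L1gmtTkMuon").as_posix())
```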

The next entries of the .yaml config file are:

  • reference_object: this field specifies all the properties of gen-level (reference) objects. Only one reference object per <CONFIG_TITLE> can be specified. For a reference_object one can specify:

    • object: the reference object to be used for the plots. The same list of particle names that can be used for <GEN_PART> in the cache step.

    • x_arg: the observable to be used for the computation of the efficiencies and in the plots. Typically "Pt" or "Eta".

    • label: the label to be used for the legend of the plots.

    • cuts: divided between object and event. The former refers to cuts applied to all the reference objects on an event-by-event basis, while the latter defines cuts to be applied to particles in each event. In the example above, all the events with |eta|>2.4 are discarded. For the remaining events, we keep only particles that have gen-level isolation (dr_0.3) < 0.15 and |eta|>1.5. To specify a cut, put the lowercase name of the observable, as found in the awkward arrays used as inputs, in curly brackets. For example: abs({eta}), {pt}, {dr_0.3}, etc. Note: gen-level isolation between leptons and all the final state particles is computed here for different values of deltaR and stored in arrays called dr_deltaR (available for deltaR=[0.1, 0.15, 0.2, 0.3, 1.5]). In the config files one should always use {dr_0.3} (unless agreed upon differently with the L1-PhaseIIMenu group).

  • test_objects: several objects can be specified under this field. These define all the L1 objects (test objects) for which efficiencies and scalings will be computed. Each test object is specified by referencing <TEST_OBJ>:<OBJ_ID>:<ETA_REGION>. The object properties (meaning cuts, eta regions, etc.) are defined in configs/<VERSION>/objects/.
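
To make the curly-bracket cut syntax concrete, here is a minimal sketch of how such a cut string could be evaluated for a single object. It assumes the object is a plain dictionary of values; the framework itself works on awkward arrays, and `passes_cut` and `passes_all` are hypothetical helpers, not its implementation.

```python
import re

# Matches one {observable} placeholder; names may contain dots (dr_0.3).
PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def passes_cut(cut, values):
    """Evaluate one cut string such as "abs({eta}) < 2.4" for one object."""
    # Replace each {name} by a lookup into the `values` dictionary.
    expr = PLACEHOLDER.sub(lambda m: f"values[{m.group(1)!r}]", cut)
    # Only `abs` and `values` are visible to the evaluated expression.
    return bool(eval(expr, {"__builtins__": {}, "abs": abs, "values": values}))


def passes_all(cuts, values):
    """Return True if the object satisfies every cut in the list."""
    return all(passes_cut(cut, values) for cut in cuts)


# Hypothetical gen-level particle, for illustration only.
particle = {"pt": 32.0, "eta": -1.8, "dr_0.3": 0.05}
cuts = ["{dr_0.3} < 0.15", "abs({eta}) > 1.5", "abs({eta}) < 2.4"]
print(passes_all(cuts, particle))
```

The sketch relies on `eval`, which is acceptable for trusted config files only; cut strings from untrusted sources should not be evaluated this way.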

Additional fields to be specified in the .yaml config file are:

  • xlabel and ylabel: self-explanatory. x and y-label of the plots.

  • binning: with min, max, and step as fields. min and max define the range used for the plots (or, equivalently, for the efficiency computation), and step defines the spacing of the bins within that range. Hence the binning field defines evenly spaced values (in steps of step) within the interval bounded by min and max.
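
The binning field can be illustrated with the short sketch below. It assumes that step is the spacing between consecutive values and that max is excluded, as numpy.arange would do; both points are assumptions about the framework, and `bin_edges` is a hypothetical helper.

```python
import math


def bin_edges(binning):
    """Evenly spaced values from min (included) to max (excluded)."""
    lo, hi, step = binning["min"], binning["max"], binning["step"]
    if step <= 0:
        raise ValueError("step must be positive")
    # Same number of values that numpy.arange(lo, hi, step) would return.
    n_values = max(0, math.ceil((hi - lo) / step))
    return [lo + i * step for i in range(n_values)]


edges = bin_edges({"min": 0, "max": 150, "step": 3})
print(len(edges), edges[0], edges[-1])  # 50 0 147
```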

The structure defined above is the general structure of the config files for the plotting step and can be used to produce matching efficiency plots (i.e. plots in which the efficiency is defined as the ratio of the reference objects with a match to the test object to all the reference objects) as a function of transverse momentum and pseudorapidity. Working examples for the main objects (electrons, muons, jets, and taus) can be found in cfg_plots under <OBJECT>_matching.yaml and <OBJECT>_matching_eta.yaml for the matching efficiency as a function of transverse momentum and pseudorapidity, respectively.
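
The definition of the matching efficiency given above can be written out as a per-bin ratio. The following is a minimal sketch on plain Python lists, not the framework's implementation; `matching_efficiency` and the example values are hypothetical.

```python
def matching_efficiency(ref_values, matched, edges):
    """Per-bin ratio of matched reference objects to all reference objects."""
    n_bins = len(edges) - 1
    total = [0] * n_bins
    passed = [0] * n_bins
    for x, is_matched in zip(ref_values, matched):
        for i in range(n_bins):
            if edges[i] <= x < edges[i + 1]:
                total[i] += 1            # denominator: every reference object
                passed[i] += is_matched  # numerator: only the matched ones
                break
    # Empty bins have no defined efficiency.
    return [p / t if t else None for p, t in zip(passed, total)]


# Hypothetical gen-level pT values and match flags.
gen_pt = [5.0, 12.0, 14.0, 25.0, 27.0, 28.0]
has_match = [False, True, False, True, True, True]
print(matching_efficiency(gen_pt, has_match, [0, 10, 20, 30]))
# → [0.0, 0.5, 1.0]
```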

Additional fields can be included in the config file according to the specific use-case, as detailed below.

Turn-on curves

In addition to the fields mentioned above, one can add the thresholds field in the .yaml config file for the plots:

  thresholds: <LIST_OF_THRESHOLDS>

to produce L1 turn-on curves for different cut values on the observable specified in suffix for test_objects. For example, if <LIST_OF_THRESHOLDS> is [10, 20] and suffix is pt, turn-on curves for test objects with a cut at 10 and 20 GeV on pt will be produced. The L1 turn-on efficiency curve for a given threshold is defined as a ratio: the numerator is the distribution of the reference-object suffix observable for reference objects matched to an L1 object that passes the threshold cut on its suffix observable, and the denominator is the same matched distribution without any cut on the L1 suffix observable.
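
The turn-on definition can be sketched in the same spirit. The example assumes that the inputs are already matched pairs of reference and L1 values and that the threshold cut is inclusive (>=); both are assumptions, and `turn_on` is a hypothetical helper.

```python
def turn_on(pairs, threshold, edges):
    """Turn-on efficiency per bin of the reference observable.

    pairs -- (reference value, matched L1 value) for matched objects only
    """
    n_bins = len(edges) - 1
    den = [0] * n_bins
    num = [0] * n_bins
    for ref, l1 in pairs:
        for i in range(n_bins):
            if edges[i] <= ref < edges[i + 1]:
                den[i] += 1                # matched, no cut on the L1 value
                num[i] += l1 >= threshold  # matched and above the threshold
                break
    return [n / d if d else None for n, d in zip(num, den)]


# Hypothetical (gen pT, L1 pT) pairs in GeV.
pairs = [(8, 6), (12, 9), (15, 11), (22, 19), (25, 21), (28, 30)]
edges = [0, 10, 20, 30]
for threshold in [10, 20]:
    print(threshold, turn_on(pairs, threshold, edges))
```

Raising the threshold moves the rise of the curve to higher reference values, which is what the scalings described below quantify.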

For reference, to compare with the C++ version, the distributions used for the computation of the efficiencies are defined in this (and similar) scripts under --numerator and --denumerator. The corresponding objects are defined in this config file.

Scalings

If the scalings field is specified for a given <TEST_OBJ>, then scaling plots will be produced. A scaling plot defines the offline-to-online relation: for each threshold on the L1 object, it gives the value at which the turn-on efficiency curve reaches a certain location (often 90% or 95%).

The syntax for the scalings field is the following:

  scalings:
    method: <METHOD>
    threshold: <THRESHOLD>

where:

  • <METHOD>: can either be "naive" or "errf". The former corresponds to an interpolation of the turn-on efficiency curve, while the latter corresponds to a fit of the turn-on efficiency curve with an error-function-like curve. The result of the interpolation or of the fit is then used to retrieve the <THRESHOLD> location on the turn-on curve.

  • <THRESHOLD> defines the position to be found on the turn-on efficiency curve (e.g. 0.95 for 95%).
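
For the "naive" method, one possible reading is a linear interpolation between the two points of the turn-on curve that bracket <THRESHOLD>. The sketch below follows that reading; the interpolation scheme actually used by the framework may differ, and `naive_scaling_point` is a hypothetical helper. The "errf" fit is not shown.

```python
def naive_scaling_point(x, eff, threshold):
    """Linearly interpolate the first crossing of `threshold`.

    x   -- bin centres of the turn-on curve, in increasing order
    eff -- efficiency at each bin centre
    Returns None if the curve never crosses the threshold from below.
    """
    for i in range(1, len(x)):
        lo, hi = eff[i - 1], eff[i]
        if lo < threshold <= hi:
            frac = (threshold - lo) / (hi - lo)
            return x[i - 1] + frac * (x[i] - x[i - 1])
    return None


# Hypothetical turn-on curve.
pt = [10.0, 20.0, 30.0, 40.0]
eff = [0.10, 0.60, 0.90, 1.00]
print(naive_scaling_point(pt, eff, 0.95))
```

For this toy curve the 95% point lies halfway between 30 and 40, up to floating-point rounding.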

The scalings are built, for a given reference object, using the threshold values defined in scaling_thresholds.

Object definitions

Objects are defined centrally and can be used by referencing them in the configs of both the object_performance and the rate parts of the tools. This prevents the definitions of objects from diverging between parts of the code and defines a clear interface for the scalings, which are computed per eta_region as defined in the object.

Configuration

The objects are defined in yaml files like everything else in these menu tools. Here is an example of an object config:

L1gmtTkMuon:
  label: "GMT TkMuon"
  match_dR: 0.1 
  eta_ranges:
    inclusive: [0, 7]
    barrel: [0, 0.83]
    overlap: [0.83, 1.24]
    endcap: [1.24, 2.4]
  ids:
    default:
      label: "GMT TkMuon, Loose ID"
      cuts:
        inclusive:
          - "{hwQual} >= 3"
    VLoose: # x.numberOfMatches() > 0
      label: "GMT TkMuon, VLoose ID" 
      cuts:
        inclusive:
          - "{hwQual} >= 1"
    Loose: # x.numberOfMatches() >1
      label: "GMT TkMuon, Loose ID" 
      cuts:
        inclusive:
          - "{hwQual} >= 3"
    Medium: # x.stubs().size()>1
      label: "GMT TkMuon, Medium ID" 
      cuts:
        inclusive:
          - "{hwQual} >= 7"
    Tight: # x.numberOfMatches()>2
      label: "GMT TkMuon, Tight ID" 
      cuts:
        inclusive:
          - "{hwQual} >= 15"

The following fields can be specified:

  • x_arg: the observable to be used for the computation of the efficiencies and in the plots. It has to be the same as the one used for the reference object. Note: For trackerHT, phase1PuppiHT, trackerMHT, and trackerMET objects the <OBSERVABLE> in suffix needs to be left empty (i.e. "") because for these objects only the transverse momentum is stored in the input ntuples (in a TBranch with the same name as the object).

  • label: label for the test object (used in the legend of the plots).

  • match_dR: a number here defines the deltaR cut to be used in the matching between reference (gen-level) and test (L1) objects. If this field is not included in <TEST_OBJ>, no deltaR matching is performed.

  • eta_ranges: eta ranges for which different ids/cuts are to be defined.

  • ids: the IDs, each with its own label and cut criteria.
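
The role of match_dR can be illustrated with a short deltaR computation. This sketch assumes that a reference object counts as matched when at least one test object lies within match_dR of it (strict inequality). How the framework chooses among several candidates is not covered here, and `delta_r` and `has_match` are hypothetical helpers.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance, with the phi difference wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def has_match(ref, test_objects, match_dr):
    """True if any test object lies within match_dr of the reference."""
    return any(
        delta_r(ref["eta"], ref["phi"], t["eta"], t["phi"]) < match_dr
        for t in test_objects
    )


# Hypothetical objects; the first L1 candidate sits across the phi boundary.
gen_muon = {"eta": 1.00, "phi": 3.10}
l1_muons = [{"eta": 1.02, "phi": -3.12}, {"eta": 1.50, "phi": 3.10}]
print(has_match(gen_muon, l1_muons, match_dr=0.1))
```

Wrapping the phi difference matters for objects near phi = ±pi, which would otherwise appear almost 2pi apart.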

The cuts are then defined in the individual IDs. The name of each ID is the outermost key (default, VLoose, ... in the example above). A label for plotting and cuts can be defined. The cuts are key-value pairs, where the keys have to correspond to one of the regions defined in eta_ranges, and all cuts defined below a key are applied to that region only. Similarly to the reference object, the lowercase name of the observable used for the cut needs to be placed within curly brackets (e.g. {eta}, {pt}, etc.).
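
As an illustration of how a test object reference of the form <TEST_OBJ>:<OBJ_ID>:<ETA_REGION> could be resolved against such a definition, consider the sketch below. The dictionary mirrors part of the example config above; `resolve_test_object` is a hypothetical helper and does not reproduce the framework's own lookup.

```python
# Part of the example object config, written as a Python dictionary.
OBJECT_CFG = {
    "L1gmtTkMuon": {
        "match_dR": 0.1,
        "eta_ranges": {
            "inclusive": [0, 7],
            "barrel": [0, 0.83],
            "overlap": [0.83, 1.24],
            "endcap": [1.24, 2.4],
        },
        "ids": {
            "Medium": {
                "label": "GMT TkMuon, Medium ID",
                "cuts": {"inclusive": ["{hwQual} >= 7"]},
            },
        },
    }
}


def resolve_test_object(reference, objects=OBJECT_CFG):
    """Split "<TEST_OBJ>:<OBJ_ID>:<ETA_REGION>" and look up its settings."""
    name, obj_id, region = reference.split(":")
    cfg = objects[name]
    id_cfg = cfg["ids"][obj_id]
    # Every key under `cuts` must be one of the defined eta ranges.
    unknown = set(id_cfg["cuts"]) - set(cfg["eta_ranges"])
    if unknown:
        raise ValueError(f"cuts defined for unknown regions: {sorted(unknown)}")
    return {
        "label": id_cfg["label"],
        "eta_range": cfg["eta_ranges"][region],
        "cuts": id_cfg["cuts"],
    }


print(resolve_test_object("L1gmtTkMuon:Medium:barrel"))
```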

Which cuts and observables need to be used can be determined from the [objects Google Doc](https://docs.google.com/spreadsheets/d/1u3IjbePHyQnABg1nel06ITG1kO1bG0k5yw0zc_KqqHM/edit#gid=1105636672), the [PhaseII-Menu TWiki](https://twiki.cern.ch/twiki/bin/view/CMS/PhaseIIL1TriggerMenuTools), or through private communication with the L1 PhaseII-Menu conveners.