
Validation of 8GeV Signal #1215

Merged: 10 commits from signal-valid into trunk on Oct 11, 2023
Conversation

tomeichlersmith
Member

I am updating ldmx-sw; here are the details.

What are the issues that this addresses?

  • This applies some tweaks to the dark brem configuration in Biasing/python/target.py so that the generation is efficient and appropriate for 8GeV.
  • Introduces a signal validation sample which uses dark-brem-lib-gen to generate a reference library during the validation sample init procedure.
  • Update the dark brem DQM analyzer to add the extracted information to the event bus and fill histograms for the Validation module.
  • Add a dark_brem module to Validation so that the plots below can be remade in the future with ease.
    • @EinarElen I moved the tick_labels to be the bin centers rather than the bin edges. I applied this update to the hcal and photonuclear plots by removing the trailing empty tick label '' from all the tick label lists in those modules. Please confirm this is correct, or let me know how to fix it if not. (A minimal sketch of the bin-center convention follows this list.)
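For reference, here is a minimal matplotlib sketch of the bin-center convention (not the actual Validation module code; the category names and counts below are made up):

import numpy as np
import matplotlib.pyplot as plt

tick_labels = ['W', 'C', 'H', 'other']   # hypothetical category names
counts = [120, 15, 5, 2]                 # hypothetical bin contents
edges = np.arange(len(tick_labels) + 1)
centers = 0.5 * (edges[:-1] + edges[1:])

fig, ax = plt.subplots()
ax.hist(centers, bins=edges, weights=counts)
# one label per bin: len(centers) == len(tick_labels), so no trailing '' entry is needed
ax.set_xticks(centers)
ax.set_xticklabels(tick_labels)
fig.savefig('tick_label_example.pdf')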

Check List

  • I successfully compiled ldmx-sw with my developments
  • I ran my developments and the results below show that they are successful.
  • N/A: I attached any sub-module related changes to this PR.

Running

Rough outline; I have not confirmed this works out of the box.

# got dark brem libraries from SLAC
# extract them into small CSV libraries for quicker re-running (not necessary)
for dblib in dark-brem-libraries/*; do
  tar -C dark-brem-libraries -xzf ${dblib}
  ldmx g4db-extract-library ${dblib}
  gzip ${dblib}.csv
done
# run the simulation
for dblib in dark-brem-libraries/*.csv.gz; do
  ldmx fire config.py ${dblib} --out-dir valid
done
# make the plot
python3 -m Validation valid --systems dark_brem.kinematics --label '$m_A$ [GeV]' --param mA

The config.py I used is the same as the one in the new signal validation sample, but with some tweaks to have the dark brem library be passed on the command line.

diff config.py .github/validation_samples/signal/config.py

6,23d5
< import os
< import argparse
< 
< parser = argparse.ArgumentParser()
< parser.add_argument('dblib', help='Dark Brem event library to use as reference. The run and dark photon mass are taken from the basename of this library as well.')
< parser.add_argument('--out-dir',help='directory in which to put output file')
< args = parser.parse_args()
< 
< def basename_no_extension(path):
<     # have to apply splitext twice since we have the chance of two extensions '.csv.gz'
<     return os.path.splitext(os.path.splitext(os.path.basename(path))[0])[0]
< 
< os.makedirs(args.out_dir, exist_ok=True)
< 
< parameter_list = basename_no_extension(args.dblib).split('_')
< print(parameter_list)
< mA = float(parameter_list[parameter_list.index('mA')+1])
< 
25c7,15
< mySim = target.dark_brem(mA*1000, args.dblib, 'ldmx-det-v14-8gev')
---
> mySim = target.dark_brem(
>     #A' mass in MeV - set in init.sh to same value in GeV
>     10.0,
>     # library path is uniquely determined by arguments given to `dbgen run` in init.sh
>     #   easiest way to find this path out is by running `. init.sh` locally to see what
>     #   is produced
>     'electron_tungsten_MaxE_8.0_MinE_4.0_RelEStep_0.1_UndecayedAP_mA_0.01_run_1',
>     'ldmx-det-v14-8gev'
> )
35,36c25,26
< p.maxEvents = 10000
< p.run = int(parameter_list[parameter_list.index('run')+1])
---
> p.maxEvents = int(os.environ['LDMX_NUM_EVENTS'])
> p.run = int(os.environ['LDMX_RUN_NUMBER'])
38,39c28,29
< p.histogramFile = f'{args.out_dir}/type_histos_beam_8gev_mA_{mA}_run_{p.run}.root'
< p.outputFiles = [ f'{args.out_dir}/type_events_beam_8gev_mA_{mA}_run_{p.run}.root' ]
---
> p.histogramFile = f'hist.root'
> p.outputFiles = [f'events.root']

Plots

Generally, we see the expected behavior. The A' takes the majority of the energy, the incident particle is close to the beam energy (most of the time), and the dark brem occurs in the tungsten target (most of the time). For the heaviest mass point, the dark brem starts to happen more frequently in the trigger pads, but that is not too worrisome. This may motivate including carbon and hydrogen in the dark brem reference library, but it is not necessary since the recoil electron loses so much energy either way in this high-mass scenario.

You may notice that there is a bin for the dark photon energy above the beam energy. This bin is non-empty because we are trying to replicate the dark brem process from MadGraph without the nucleus. G4DarkBreM chooses to put all of this messiness into the invisible dark photon so that the recoil electron's distributions follow the true process as closely as possible. In the high-mass scenario, this means the dark photon's energy can easily exceed the beam energy, since it also (in some sense) includes the energy that would be given to the nucleus. The two electrons involved (incident and recoil) also have this bin, but (if their energies are calculated correctly¹) they stay within physical bounds and faithfully model the beam electron prior to the dark brem and the recoiling electron after it, respectively.

(Apologies for the legend labels being out of order; they are merely ordered by how they get listed within Python, and I didn't feel like implementing a sorting mechanism within Validation. A rough sketch of what such a mechanism could look like is below.)
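A rough sketch of one such sorting mechanism, using only plain matplotlib calls (the sorted_legend helper and the assumption that each label contains a number are mine, not part of the Validation module):

import re
import matplotlib.pyplot as plt

def sorted_legend(ax):
    """Redraw the legend with entries ordered by the first number found in each label."""
    handles, labels = ax.get_legend_handles_labels()
    if not handles:
        return
    def numeric_key(pair):
        match = re.search(r'[-+]?\d*\.?\d+', pair[1])
        return float(match.group()) if match else float('inf')
    handles, labels = zip(*sorted(zip(handles, labels), key=numeric_key))
    ax.legend(handles, labels)

# e.g. call sorted_legend(ax) right before saving the figure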

[Plots: db_kinematics_aprime_energy, db_kinematics_aprime_pt, db_kinematics_dark_brem_element, db_kinematics_dark_brem_material, db_kinematics_dark_brem_z, db_kinematics_incident_energy, db_kinematics_incident_pt, db_kinematics_recoil_energy, db_kinematics_recoil_pt]

Footnotes

  1. The calculation for the incident energy needs to use the recoil electron's and dark photon's three-momenta to calculate the incident electron's three-momentum, and then calculate its energy using the electron mass. This is necessary since G4DarkBreM chooses to conserve three-momentum when ignoring the nucleus, which therefore "shoves" a lot of "error" into the dark photon's total energy. This calculation is done correctly in the DQM processor in this PR.
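For reference, a minimal numpy sketch of that calculation (illustrative variable names and a GeV unit assumption on my part, not the actual DQM processor code):

import numpy as np

M_ELECTRON = 0.000511  # GeV

def incident_energy(recoil_p3, aprime_p3, m_e=M_ELECTRON):
    """recoil_p3 and aprime_p3 are three-momenta in GeV."""
    # conserve three-momentum between the recoil electron and the dark photon,
    # then rebuild the incident energy from the electron mass
    incident_p3 = np.asarray(recoil_p3) + np.asarray(aprime_p3)
    return float(np.sqrt(incident_p3.dot(incident_p3) + m_e**2))

# e.g. a recoil electron and A' mostly along z should give roughly the beam energy
print(incident_energy([0.01, 0.0, 1.5], [0.02, -0.01, 6.4]))  # ~7.9 GeV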

Commit messages

  • replace old ntuplizer with one that puts data in the event bus: better energy calculation in the old one, but the new one adds the info to the event bus and extracts vertex material information for us
  • use mA=10MeV and use the template simulation from the Biasing.target module
  • use the init.sh script infrastructure to generate the reference library before attempting the simulation; not updating labels for material/elements, but that's how it goes
  • move tick labels to bin centers so the number of labels is equal to the number of bins (and not the number of bin edges); update the hcal and photonuclear modules to conform to this change, removing the trailing empty label present in those tick label lists
  • this does not affect the distribution shapes greatly since the weights are extremely close to exactly 1/B within each sample (a small numerical illustration follows below)
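As a small, self-contained illustration of that last point (made-up numbers, not any of the actual samples): if every event carries a weight essentially equal to 1/B, the weighted and unweighted histograms differ only by an overall scale and not in shape.

import numpy as np

rng = np.random.default_rng(1)
B = 1e5                                        # hypothetical biasing factor
values = rng.exponential(2.0, size=10_000)     # stand-in for any kinematic variable
weights = np.full_like(values, 1.0 / B)        # all weights ~ exactly 1/B

unweighted, edges = np.histogram(values, bins=20)
weighted, _ = np.histogram(values, bins=edges, weights=weights)

# same shape, only scaled by the constant 1/B
assert np.allclose(weighted, unweighted / B)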
@tomeichlersmith
Member Author

The PR Validations are passing except for the new signal validation sample which is failing at the comparison step since there is no gold to compare against.

@tomeichlersmith
Member Author

At the last SW dev meeting, I was requested to investigate the cross section calculation at the new beam energy as well. The good news is that I've already extensively investigated the xsec calculation. The bad news is that it is really complicated!

I have a standing issue in SimCore related to this where I'm putting plots and the code/data I've used to make them. Specifically, the tarball linked there has the data files holding the total xsec as estimated by MadGraph/MadEvent, which I am using as the source of "truth".

#1314

TL;DR: The ratio of the cross section used within our simulation to the cross section estimated by MG/ME is roughly constant over the energy range we care about. The constancy of this ratio gets worse as the dark photon mass grows, but I don't think it has a large enough effect to matter at this point.

Long Answer

In the linked SimCore issue, I compare a "normalized" cross section calculated by the different cross section methods available in G4DB to a "normalized" cross section estimated by taking the mean of several MG/ME runs. The normalization is done to remove overall-scale effects that can be easily handled when calculating the total expected rate at analysis time. Currently, we use the G4DB default xsec method, which uses "Full" for dark photon masses below 25MeV and "Improved" for masses above. Looking at the plots and focusing on our energy range of interest in the insets, these xsec calculations are roughly flat when compared to MG/ME (emphasis on roughly).

While we could spend time optimizing this xsec calculation to more closely align with the MG/ME shape, I don't think that has enough benefit. If we want the xsec calculation to match MG/ME, we should simply import the MG/ME xsec and interpolate it within G4DB rather than trying to tweak/hack a WW approximation. This would require more development, and that hurdle, combined with the fact that the xsec is already fairly close, leads me to conclude we are good (for now). Perhaps in the future it will be worth updating G4DB, or G4DB will be superseded by another generator like Pythia.
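Schematically, the comparison reduces to something like the sketch below. The file names, column layout, and the choice to normalize at 8 GeV are my assumptions for illustration; they are not the actual files or tooling from the linked SimCore issue.

import numpy as np

def normalized(energy, xsec, e_ref=8.0):
    """Divide a xsec curve by its value at a reference energy (linear interpolation)."""
    return xsec / np.interp(e_ref, energy, xsec)

# assumed two-column CSVs: incident electron energy [GeV], total xsec [arbitrary units]
e_g4db, xs_g4db = np.loadtxt('g4db_xsec_mA_0.1.csv', delimiter=',', unpack=True)
e_mgme, xs_mgme = np.loadtxt('mgme_xsec_mA_0.1.csv', delimiter=',', unpack=True)

# ratio of the two normalized curves, evaluated on the G4DB energy grid
ratio = normalized(e_g4db, xs_g4db) / np.interp(
    e_g4db, e_mgme, normalized(e_mgme, xs_mgme))
# a "roughly constant" ratio over the 4-8 GeV range is the flatness discussed above
print(ratio.min(), ratio.max())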

@bryngemark
Contributor

This looks good so far, and I'm particularly happy to see a suite of validation DQM histograms + the corresponding plotting being set up.

Just waiting to see the 4 GeV comparison, and then I think this is good to go. The next step after merging would be to share your config with the LDCS repo, where I will template-ify it and start production.

@EinarElen
Contributor

Looks good to me; I just had a minor comment suggestion and (as usual) a request for a corresponding header file for the source file. I (and, I think, a lot of people) look in the header directory to find out what's available in a project, so if there is only a source file you'll miss that the translation unit exists.

Histogram bin update looks good, was a bit hacky before :)

@tomeichlersmith
Member Author

@EinarElen I moved the declarations into their own header. I cannot see another "minor comment suggestion"; did you "finish your review" so that it gets posted for me?

@tomeichlersmith
Member Author

The 4GeV versions of the same histograms (with only 1k events sampled and plotted as densities instead of overall counts) have similar shapes and a similar dependence on the A' mass. This is as expected, because a 4 -> 8GeV change is not too drastic in the context of this model.

WARNING: the colors are different in this set of plots compared to the 8GeV plots above. This is because I just used my Validation module to re-plot the new directory of files and did not bother trying to do an overlay or side-by-side comparison with matplotlib. I could, but I don't think it is worth it given how close these distributions are.
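For what it's worth, such an overlay would only take a few lines of matplotlib; the sketch below uses placeholder arrays (random numbers, not the actual samples) just to show the density-normalized comparison between a 1k-event and a 10k-event sample.

import numpy as np
import matplotlib.pyplot as plt

# placeholders for the A' energy values from each sample
aprime_energy_4gev = np.random.default_rng(0).uniform(0.0, 4.0, 1_000)
aprime_energy_8gev = np.random.default_rng(1).uniform(0.0, 8.0, 10_000)

fig, ax = plt.subplots()
bins = np.linspace(0.0, 9.0, 46)
# density=True makes the two samples comparable despite very different statistics
ax.hist(aprime_energy_4gev, bins=bins, density=True, histtype='step', label='4 GeV beam')
ax.hist(aprime_energy_8gev, bins=bins, density=True, histtype='step', label='8 GeV beam')
ax.set_xlabel("A' Energy [GeV]")
ax.set_ylabel('Event Density')
ax.legend()
fig.savefig('aprime_energy_overlay.pdf')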

[Plots (4GeV versions): db_kinematics_aprime_energy, db_kinematics_aprime_pt, db_kinematics_dark_brem_element, db_kinematics_dark_brem_material, db_kinematics_dark_brem_z, db_kinematics_incident_energy, db_kinematics_incident_pt, db_kinematics_recoil_energy, db_kinematics_recoil_pt]

explain in a doxygen comment why this function exists, basically copying the rationale written for the produce function as well.
@tomeichlersmith tomeichlersmith merged commit 7047cba into trunk Oct 11, 2023
1 check passed
@tomeichlersmith tomeichlersmith deleted the signal-valid branch October 11, 2023 13:31