QADB

CLAS12 Quality Assurance Database

Provides storage of and access to the QA monitoring results for the CLAS12 experiment at Jefferson Lab

Table of Contents

  1. How to Use the QADB in Your Analysis
  2. QA Information
  3. How to Access the QADB
  4. How to Access the Faraday Cup Charge
  5. QADB Maintenance
  6. QA Ground Rules
  7. Contributions

How to Use the QADB in Your Analysis

The QADB is used to filter data based on Quality Assurance (QA) observations. The database stores information about the "defects" of each run: each run is subdivided into "QA bins", and for each bin, a set of "defect bits" may or may not be assigned. See the table of available data sets for which data are included in the QADB.

The user must decide which defect bits should be filtered out of their analysis. See the table of defect bits and decide which bits to use in the filter.

Important

Special care must be taken for the Misc defect bit, which is assigned for runs (or parts of runs) that have abnormal conditions, whether found on the timelines or documented in the log book:

  • Each QA bin that has the Misc defect bit set includes a comment in the QADB, explaining why the bit was set
  • The analyzer must decide whether or not data with the Misc defect bit should be excluded from their analysis
  • To help with this decision-making, use the qadb-info misc command, or use the Misc summary tables found in each dataset's directory, which provide the comment(s) for each run

The QADB is available on ifarm as the qadb module:

module avail qadb
# then 'module load' the one you want

Alternatively, you may download and use this repository locally:

git clone --recurse-submodules https://github.com/JeffersonLab/clas12-qadb.git
source clas12-qadb/environ.sh  # or environ.csh, if using csh

QA Information

Information from qadb-info

The program qadb-info may be used to get information about the QADB, including:

  • available data sets
  • defect bits
  • FC charge, filtered by QA defects chosen by the user
  • QADB queries by run number, event number, and/or QA bin number

For usage guidance, just run:

qadb-info

Tip

If qadb-info is not found, either:

  • type its full path, since it is at ./bin/qadb-info
  • add bin/ to your $PATH, which you can do with
source environ.sh   # for bash, zsh
source environ.csh  # for csh, tcsh

Caution

Do not call qadb-info in an analysis event loop, since it will run too slowly. Instead, use the provided software or operate on the QADB files directly.

Available Data Sets

The following tables describe the available data sets in the latest version of the QADB. The columns are:

  • Pass: the Pass number of the data set (higher is newer)
  • Data Set Name: a unique name for the data-taking period; click it to see the corresponding QA timelines
    • Typically [RUN_GROUP]_[RUN_PERIOD]
    • [RUN_PERIOD] follows the convention [SEASON(sp/su/fa/wi)]_[YEAR], and sometimes includes an additional keyword
  • Run range: the run numbers in this data set
  • Status:
    • 🟢 Up-to-Date: this is the most recent Pass of these data, and the QADB has been updated for it
    • ⚠️ Deprecated: a newer Pass exists for these data, but the QADB for this version is still preserved
    • TO DO: the Pass for these data exists, but the QADB has not yet been updated for it
  • Notes:
    • Data: the input data used for the QA; this is the top level directory, where trains (skim files) and full DSTs are stored
    • Analyzed Files: the specific files (e.g. train) used for the QA
    • Issues: links to any known issues with this QADB

Caution

The QADB for older data sets may have some issues, and may even violate the QA ground rules. It is HIGHLY recommended to also check the known important issues to see if any issues impact your analysis.

Run Group A

| Pass | Data Set Name and Timelines Link | Run Range | Status | Notes |
|---|---|---|---|---|
| 2 | rga_fa18_inbending | 5032 - 5419 | 🟢 | Data: /cache/clas12/rg-a/production/recon/fall2018/torus-1/pass2/main; Analyzed Files: nSidis train; Issues: None |
| 2 | rga_fa18_outbending | 5422 - 5666 | 🟢 | Data: /cache/clas12/rg-a/production/recon/fall2018/torus+1/pass2; Analyzed Files: nSidis train; Issues: None |
| 2 | rga_sp19 | 6616 - 6783 | 🟢 | Data: /cache/clas12/rg-a/production/recon/spring2019/torus-1/pass2/dst; Analyzed Files: nSidis train; Issues: None |
| 1 | rga_sp18 |  |  |  |
| 1 | rga_fa18_inbending | 5032 - 5419 | ⚠️ | Data: /cache/clas12/rg-a/production/recon/fall2018/torus-1/pass1; Analyzed Files: full DST files; Issues: ‼️ #9, #48, #12 |
| 1 | rga_fa18_outbending | 5422 - 5666 | ⚠️ | Data: /cache/clas12/rg-a/production/recon/fall2018/torus+1/pass1; Analyzed Files: full DST files; Issues: ‼️ #9, #48 |
| 1 | rga_sp19 | 6616 - 6783 | ⚠️ | Data: /cache/clas12/rg-a/production/recon/spring2019/torus-1/pass1; Analyzed Files: full DST files; Issues: ‼️ #9, #48 |

Run Group B

| Pass | Data Set Name and Timelines Link | Run Range | Status | Notes |
|---|---|---|---|---|
| 2 | rgb_sp19 | 6156 - 6603 | 🟢 | Data: /cache/clas12/rg-b/production/recon/spring2019/torus-1/pass2/v0/dst; Analyzed Files: sidisdvcs train; Issues: None |
| 2 | rgb_fa19 | 11093 - 11300 | 🟢 | Data: /cache/clas12/rg-b/production/recon/fall2019/torus{+,-}1/pass2/v1/dst; Analyzed Files: sidisdvcs train; Issues: None |
| 2 | rgb_wi20 | 11323 - 11571 | 🟢 | Data: /cache/clas12/rg-b/production/recon/spring2020/torus-1/pass2/v1/dst; Analyzed Files: sidisdvcs train; Issues: None |
| 1 | rgb_sp19 | 6156 - 6603 | ⚠️ | Data: /cache/clas12/rg-b/production/recon/spring2019/torus-1/pass1/v0/dst; Analyzed Files: full DST files; Issues: ‼️ #9, #48 |
| 1 | rgb_fa19 | 11093 - 11300 | ⚠️ | Data: /cache/clas12/rg-b/production/recon/fall2019/torus{+,-}1/pass1/v1/dst; Analyzed Files: full DST files; Issues: ‼️ #9, #48 |
| 1 | rgb_wi20 | 11323 - 11571 | ⚠️ | Data: /cache/clas12/rg-b/production/recon/spring2020/torus-1/pass1/v1/dst; Analyzed Files: full DST files; Issues: ‼️ #9, #48 |

Run Group C

| Pass | Data Set Name and Timelines Link | Run Range | Status | Notes |
|---|---|---|---|---|
| 1 | rgc_su22 | 16042 - 16786 | 🟢 | Data: /cache/clas12/rg-c/production/summer22/pass1; Analyzed Files: sidisdvcs train; Issues: None |
| 1 | rgc_fa22 | 16843 - 17408 |  | Data: /cache/clas12/rg-c/production/fall22/pass1; Analyzed Files: sidisdvcs train |
| 1 | rgc_sp23 | 17482 - 17811 |  | Data: /cache/clas12/rg-c/production/spring23/pass1; Analyzed Files: sidisdvcs train |

Run Group F

| Pass | Data Set Name and Timelines Link | Run Range | Status | Notes |
|---|---|---|---|---|
| 1 | rgf_sp20_torusM1 | 12210 - 12329 |  | Data: /cache/clas12/rg-f/production/recon/spring2020/torus-1_solenoid-0.8/pass1v0/dst/recon |
| 1 | rgf_su20_torusPh | 12389 - 12434 |  | Data: /cache/clas12/rg-f/production/recon/summer2020/torus+0.5_solenoid-0.745/pass1v0/dst/recon |
| 1 | rgf_su20_torusMh | 12436 - 12443 |  | Data: /cache/clas12/rg-f/production/recon/summer2020/torus-0.5_solenoid-0.745/pass1v0/dst/recon |
| 1 | rgf_su20_torusM1 | 12447 - 12951 |  | Data: /cache/clas12/rg-f/production/recon/summer2020/torus-1_solenoid-0.745/pass1v0/dst/recon |

Run Group K

| Pass | Data Set Name and Timelines Link | Run Range | Status | Notes |
|---|---|---|---|---|
| 2 | rgk_fa18_7.5GeV | 5674 - 5870 |  | Data: /cache/clas12/rg-k/production/recon/fall2018/torus+1/7546MeV/pass2/v0/dst |
| 2 | rgk_fa18_6.5GeV | 5875 - 6000 |  | Data: /cache/clas12/rg-k/production/recon/fall2018/torus+1/6535MeV/pass2/v0/dst |
| 1 | rgk_fa18_7.5GeV | 5674 - 5870 | 🟢 | Data: /cache/clas12/rg-k/production/recon/fall2018/torus+1/7546MeV/pass1/v0/dst/recon; Analyzed Files: full DST files; Issues: ‼️ #9, #48 |
| 1 | rgk_fa18_6.5GeV | 5875 - 6000 | 🟢 | Data: /cache/clas12/rg-k/production/recon/fall2018/torus+1/6535MeV/pass1/v0/dst/recon; Analyzed Files: full DST files; Issues: ‼️ #9, #48 |

Run Group M

| Pass | Data Set Name and Timelines Link | Run Range | Status | Notes |
|---|---|---|---|---|
| 1 | rgm_fa21 | 15019 - 15884 | 🟢 | Data: /cache/clas12/rg-m/production/pass1/allData_forTimelines/; Analyzed Files: full DST files; Issues: ‼️ #9, #48 |

Defect Bit Definitions

  • QA information is stored for each QA bin, in the form of defect bits
    • the user needs only the run number and event number to query the QADB
  • A QA bin is:
    • the set of events between a fixed number of scaler readouts (roughly a time bin, although there are fluctuations in a bin's duration)
    • for older QADBs, Run Groups A, B, K, and M of Pass 1 data, the QA bins were DST 5-files
  • A defect bit is:
    • a bit (of a binary number) that is 1 if the QA bin exhibits the corresponding defect or 0 if not
    • each defect bit corresponds to a different defect, as shown in the table below
    • many defects check the value of N/F, defined as the trigger electron yield N, normalized by the DAQ-gated Faraday Cup charge F
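
As a rough illustration of the bit arithmetic described above (this is not the QADB API), the following C++ sketch tests whether a given defect bit is set in a bin's decimal defect value, and whether a bin survives a list of rejected bits. The helper names HasDefectBit and PassesFilter are hypothetical and exist only for this example; the bit numbers are those of the table below.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical helper (not part of the QADB classes): true if `bit` is set
// in the decimal defect value `defect` of a QA bin.
inline bool HasDefectBit(std::uint32_t defect, unsigned bit) {
  assert(bit < 32); // the defect bits in the table are numbered 0 to 19
  return ((defect >> bit) & 1u) != 0;
}

// Hypothetical helper: true if none of the bits listed in `rejected` are set,
// i.e., the QA bin would survive a filter that rejects those defects.
inline bool PassesFilter(std::uint32_t defect,
                         std::initializer_list<unsigned> rejected) {
  for (unsigned bit : rejected)
    if (HasDefectBit(defect, bit)) return false;
  return true;
}
```

For instance, a defect value of 11 (0b1011) has bits 0, 1, and 3 set, so a filter that rejects bit 0 (TotalOutlier) would reject that bin, while a filter that rejects only bits 2, 4, and 5 would keep it.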

Table of Defect Bits

| Bit | Name | Description | Additional Notes |
|---|---|---|---|
| 0 | TotalOutlier | Outlier FD electron N/F, but not TerminalOutlier or MarginalOutlier |  |
| 1 | TerminalOutlier | Outlier FD electron N/F of first or last QA bin of run |  |
| 2 | MarginalOutlier | Marginal FD electron outlier N/F, within one standard deviation of cut line |  |
| 3 | SectorLoss (see note 1) | FD electron N/F is an outlier and is diminished for several consecutive QA bins | For older datasets (RG-A,B,K,M pass 1), this bit replaced the assignment of TotalOutlier, TerminalOutlier, and MarginalOutlier; newer datasets only add the SectorLoss bit and do not remove the outlier bits. |
| 4 | LowLiveTime | Live time < 0.9 | The assignment of this bit may be correlated with a low fraction of events with a defined (nonzero) helicity. |
| 5 | Misc | Miscellaneous defect, documented as comment | This bit is often assigned to all QA bins within a run, but in some cases, may only be assigned to the relevant QA bins. The analyzer must decide whether data assigned with the Misc bit should be excluded from their analysis; the comment is provided for this purpose. Analyzers are also encouraged to check the Hall B log book for further details. Note that special runs, such as empty target or low luminosity runs, also typically have this bit set; for such runs, the other defect bits, namely the outlier bits, may be meaningless. |
| 6 | TotalOutlierFT | Outlier FT electron N/F, but not TerminalOutlierFT or MarginalOutlierFT | cf. TotalOutlier. |
| 7 | TerminalOutlierFT | Outlier FT electron N/F of first or last QA bin of run | cf. TerminalOutlier. |
| 8 | MarginalOutlierFT | Marginal FT electron outlier N/F, within one standard deviation of cut line | cf. MarginalOutlier. |
| 9 | LossFT (see note 1) | FT electron N/F is an outlier and is diminished for several consecutive QA bins | cf. SectorLoss. |
| 10 | BSAWrong | Beam Spin Asymmetry is the wrong sign | This bit is assigned per run. The asymmetry is significant, but its sign is opposite to what is expected; analyzers must therefore flip the helicity sign. |
| 11 | BSAUnknown | Beam Spin Asymmetry is unknown, likely because of low statistics | This bit is assigned per run. There are not enough data to determine if the helicity sign is correct for this run. |
| 12 | TSAWrong | Target Spin Asymmetry is the wrong sign | Not yet used. |
| 13 | TSAUnknown | Target Spin Asymmetry is unknown, likely because of low statistics | Not yet used. |
| 14 | DSAWrong | Double Spin Asymmetry is the wrong sign | Not yet used. |
| 15 | DSAUnknown | Double Spin Asymmetry is unknown, likely because of low statistics | Not yet used. |
| 16 | ChargeHigh | FC Charge is abnormally high | NOTE: the assignment criteria of this bit are still under study. |
| 17 | ChargeNegative | FC Charge is negative | The FC charge is calculated from the charge readout at QA bin boundaries. Normally the later charge readout is higher than the earlier; this bit is assigned when the opposite happens. |
| 18 | ChargeUnknown | FC Charge is unknown; the first and last time bins always have this defect | QA bin boundaries are at scaler charge readouts. The first QA bin, before any readout, has no initial charge; the last QA bin, after all scaler readouts, has no final charge. Therefore, the first and last QA bins have an unknown, but likely very small charge accumulation. |
| 19 | PossiblyNoBeam | Both N and F are low, indicating the beam was possibly off | NOTE: the assignment criteria of this bit are still under study. |

Notes:
  1. this bit may not be reliably defined in later datasets; use the other outlier bits instead

How to Access the QADB

You may access the QADB in many ways:

Text Access

  • human-readable tables are stored in qadb/*/qaTree.json.table; see the section QADB Files and Tables, Table Files below for details on how to read these files
  • QADB JSON files are stored in qadb/*/qaTree.json

Software Access

Classes in both C++ and Groovy are provided, for access to the QADB within analysis code. In either case, you need environment variables; if you are using an ifarm build, they have already been set for you, otherwise:

source environ.sh   # for bash, zsh
source environ.csh  # for csh, tcsh

Then:

Important

C++ access needs rapidjson, provided as a submodule of this repository in srcC/rapidjson. If this directory is empty, you can clone the submodule by running

git submodule update --init --recursive

Example Code

The following C++ code demonstrates general QADB usage. The usage is very similar in Groovy.

Before Processing Events: Set up the QADB criteria

// instantiate QADB
QADB qa("latest"); // use "latest" for the latest cook, "pass1" for pass 1, etc.

// decide which defects you want to check for; an event will not pass the QA
// cut if the associated QA bin has any of the specified defects
qa.CheckForDefect("TotalOutlier");
qa.CheckForDefect("TerminalOutlier");
qa.CheckForDefect("MarginalOutlier");
qa.CheckForDefect("SectorLoss");
qa.CheckForDefect("Misc");

// list the runs for which data with ONLY the 'Misc' defect bit should be allowed
std::vector<int> allow_these_misc_assignments = {
  5875,    // N/F low, gradually decreasing with file number
  // 5877,    // N/F is high for the whole run
  // 5878,    // N/F is high for the whole run
  5884,    // Ended run: mvt1/mvt2 crashed.
  5885,    // slightly low value of N/F
};
/* TIP: you can generate this list and comments using `qadb-info`,
   e.g., for RG-K datasets:

   # get the list of RG-K datasets
   >>> qadb-info print --list --run-group k --latest

       rgk_fa18_6.5GeV  ->  refers to pass1/rgk_fa18_6.5GeV
       rgk_fa18_7.5GeV  ->  refers to pass1/rgk_fa18_7.5GeV

   # get the list of RG-K runs with 'Misc' defect bit, with QADB comments
   >>> qadb-info misc --datasets rgk_fa18_6.5GeV,rgk_fa18_7.5GeV --code '//'

       misc_qa_runs = [
         5875,    // N/F low, gradually decreasing with file number
         5877,    // N/F is high for the whole run
         ...
       ]
 */

// tell `qa` to allow these data if ONLY the 'Misc' defect bit is assigned
for(auto run : allow_these_misc_assignments)
  qa.AllowMiscBit(run);

For Each Event: Check if the event's QADB bin passes your criteria

// get event-level info
auto runnum   = /* get the run number */;
auto evnum    = /* get the event number */;
auto helicity = /* get the beam helicity */;

// correct the helicity sign
helicity *= qa.CorrectHelicitySign(runnum, evnum);

// apply QA cuts
if(qa.Pass(runnum, evnum)) {

  // accumulate FC charge (it will only accumulate once per QA bin you analyzed)
  qa.AccumulateCharge();

  /* continue your analysis here */
}

After Processing Events

// the total FC charge, filtered by the QA
auto total_charge = qa.GetAccumulatedCharge();

Caution

The above example code is not tested, and might be broken! You may need to refer to the other examples in srcC/ and src/.

QADB Files and Tables

The QADB files are organized by dataset: one subdirectory of qadb/ per dataset. Each directory contains:

  • Summary tables regarding the Misc defect bit assignment are stored in miscTable.md; use these to help decide which runs' Misc bits you want to omit from your analysis
  • A human-readable table of the full QADB is stored in qaTree.json.table, a "Table File"; see below for how to interpret this file
  • The QADB itself is stored in JSON files, meant for programmatic access

The dataset directories are organized by cook number (pass):

  • within qadb/, the pass*/ directories are for each cook (pass1, pass2, etc.)
    • within each pass*/ directory are subdirectories for each dataset
  • the latest/ directory contains symbolic links to the latest cook of each data set with a QADB
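
The directory layout above can be sketched as a small path builder. This is only an illustration under the layout described here: the function name QaTreePath and its repo_root parameter (the location of your clone or ifarm installation) are hypothetical and are not part of the QADB software.

```cpp
#include <string>

// Hypothetical helper: build the path of a dataset's qaTree.json.
//   repo_root: top-level directory of the clas12-qadb checkout (assumption)
//   cook:      a directory under qadb/, e.g. "latest", "pass1", or "pass2"
//   dataset:   a dataset name, e.g. "rgk_fa18_6.5GeV"
inline std::string QaTreePath(const std::string& repo_root,
                              const std::string& cook,
                              const std::string& dataset) {
  return repo_root + "/qadb/" + cook + "/" + dataset + "/qaTree.json";
}
```

Using "latest" as the cook follows the symbolic links, so the same call keeps pointing at the newest cook of a dataset after the QADB is updated.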

Table Files

Human-readable format of QA result, stored in qaTree.json.table

  • each run begins with the keyword RUN:; lines below are for each of that run's QA bins and their QA results, with the following syntax:
    • run_number bin_number defect_bits :: comment
      • defect bits have the following form: bit_number-defect_name[list_of_sectors], and [all] means that all 6 sectors have this defect
      • comments are usually associated with Misc defects, but not always
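
A Table File line with the syntax above could be split roughly as follows. This is a sketch under assumptions, not a parser shipped with the QADB: it assumes fields are separated by whitespace, that each defect token (such as bit_number-defect_name[list_of_sectors]) contains no spaces, and that `::` marks the start of the comment. The TableLine struct and ParseTableLine function are hypothetical names.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for one parsed line of qaTree.json.table.
struct TableLine {
  int runnum = 0;
  int binnum = 0;
  std::vector<std::string> defects; // raw tokens, e.g. "5-Misc[all]"
  std::string comment;              // text after "::", may be empty
};

// Parse "run_number bin_number defect_bits :: comment".
// Returns false for lines that do not start with two integers,
// which includes the "RUN:" header lines.
inline bool ParseTableLine(const std::string& line, TableLine& out) {
  const std::string sep = "::";
  const auto pos = line.find(sep);
  // substr(0, npos) yields the whole line when there is no comment
  std::istringstream fields(line.substr(0, pos));
  if (!(fields >> out.runnum >> out.binnum)) return false;
  out.defects.clear();
  for (std::string token; fields >> token;) out.defects.push_back(token);
  out.comment.clear();
  if (pos != std::string::npos) {
    const std::string rest = line.substr(pos + sep.size());
    const auto first = rest.find_first_not_of(' ');
    if (first != std::string::npos) out.comment = rest.substr(first);
  }
  return true;
}
```

The defect tokens are kept as raw strings here; splitting them further into bit number, name, and sector list would depend on details of the file that are not spelled out in this section.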

JSON files

qaTree.json

  • The QADB itself is stored as JSON files in qaTree.json
  • the format is a tree:
qaTree.json ─┬─ run number 1
             ├─ run number 2 ─┬─ bin number 1
             │                ├─ bin number 2
             │                ├─ bin number 3 ─┬─ evnumMin
             │                │                ├─ evnumMax
             │                │                ├─ sectorDefects
             │                │                ├─ defect
             │                │                └─ comment
             │                ├─ bin number 4
             │                └─ bin number 5
             ├─ run number 3
             └─ run number 4
  • for each bin, the following variables are defined:
    • evnumMin and evnumMax represent the range of event numbers associated with this bin; use this to map a particular event number to a bin number
    • sectorDefects is a map with sector number keys paired with lists of associated defect bits
    • defect is a decimal representation of the OR of each sector's defect bits, for example, 11=0b1011 means that the OR of the defect bit lists is [0,1,3]
    • comment stores an optional comment regarding the QA result
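
To make the per-bin variables concrete, here is a minimal C++ sketch of the two operations they support: mapping an event number to a bin through evnumMin and evnumMax, and expanding the decimal defect value into a list of bit numbers. It works on a hypothetical in-memory copy of one run's bins (the QaBin struct, FindBin, and DefectBits are illustrative names, not the QADB classes), leaves JSON parsing out, and assumes the event-number range of a bin is inclusive at both ends.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical in-memory copy of one QA bin from qaTree.json;
// only the fields used below are included.
struct QaBin {
  long evnumMin;
  long evnumMax;
  int defect; // decimal representation of the OR of the sectors' defect bits
};

// Return the bin number whose event-number range contains `evnum`,
// or -1 if none does. Assumes [evnumMin, evnumMax] is inclusive.
inline int FindBin(const std::map<int, QaBin>& bins, long evnum) {
  for (const auto& entry : bins)
    if (evnum >= entry.second.evnumMin && evnum <= entry.second.evnumMax)
      return entry.first;
  return -1;
}

// Expand a decimal defect value into the list of bit numbers that are set.
inline std::vector<int> DefectBits(int defect) {
  assert(defect >= 0); // defect values are non-negative bit masks
  std::vector<int> bits;
  for (int bit = 0; defect > 0; ++bit, defect >>= 1)
    if (defect & 1) bits.push_back(bit);
  return bits;
}
```

With the example from the text, DefectBits(11) gives the bit list [0, 1, 3].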

chargeTree.json

  • the charge is also stored in JSON files in chargeTree.json, with a similar format:
chargeTree.json ─┬─ run number 1
                 ├─ run number 2 ─┬─ bin number 1
                 │                ├─ bin number 2
                 │                ├─ bin number 3 ─┬─ fcChargeMin
                 │                │                ├─ fcChargeMax
                 │                │                ├─ ufcChargeMin
                 │                │                ├─ ufcChargeMax
                 │                │                └─ nElec ─┬─ sector 1
                 │                │                          ├─ sector 2
                 │                │                          ├─ sector 3
                 │                │                          ├─ sector 4
                 │                │                          ├─ sector 5
                 │                │                          └─ sector 6
                 │                ├─ bin number 4
                 │                └─ bin number 5
                 ├─ run number 3
                 └─ run number 4
  • for each bin, the following variables are defined:
    • fcChargeMin and fcChargeMax represent the minimum and maximum DAQ-gated Faraday cup charge, in nC
    • ufcChargeMin and ufcChargeMax represent the minimum and maximum FC charge, but not gated by the DAQ
    • the difference between the maximum and minimum charge is the accumulated charge in that bin
    • nElec lists the number of electrons from each sector
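
The charge bookkeeping above amounts to a difference per bin and a sum over the bins you keep. The following C++ sketch shows that arithmetic on a hypothetical in-memory copy of the bins (the ChargeBin struct, BinCharge, and TotalCharge are illustrative names; reading chargeTree.json and choosing which bins pass your QA criteria are left out).

```cpp
#include <vector>

// Hypothetical in-memory copy of one bin from chargeTree.json;
// only the DAQ-gated Faraday cup charge is included.
struct ChargeBin {
  double fcChargeMin; // minimum DAQ-gated FC charge in the bin, in nC
  double fcChargeMax; // maximum DAQ-gated FC charge in the bin, in nC
};

// Accumulated DAQ-gated charge in one bin: maximum minus minimum.
inline double BinCharge(const ChargeBin& bin) {
  return bin.fcChargeMax - bin.fcChargeMin;
}

// Sum the accumulated charge over the bins selected for the analysis.
inline double TotalCharge(const std::vector<ChargeBin>& bins) {
  double total = 0.0;
  for (const auto& bin : bins) total += BinCharge(bin);
  return total;
}
```

The same difference applied to ufcChargeMin and ufcChargeMax would give the charge that is not gated by the DAQ.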

How to Access the Faraday Cup Charge

The charge is stored in the QADB for each QA bin, so that it is possible to determine the amount of accumulated charge for data that satisfy your specified QA criteria. To calculate the charge, you'll need to add up the charge from each bin that you include in your analysis. To help, you can either:

  • use the command qadb-info charge; use its options to specify:
    • the dataset and/or list of runs
    • which defect bits you want to allow or reject
    • of the runs which only have the Misc bit, choose those that you want to allow or reject
    • the output format
  • use the software: see chargeSum.groovy or chargeSum.cpp for a usage example in an analysis event loop; basically:
    • call QADB::AccumulateCharge() within your event loop, after your QA cuts are satisfied; the QADB instance will keep track of the accumulated charge you analyzed (accumulation performed per QA bin)
    • at the end of your event loop, the total accumulated charge you analyzed is given by QADB::GetAccumulatedCharge()

Caution

For Pass 1 QA results for Run Groups A, B, K, and M, we find some evidence that the charge from bin to bin may slightly overlap, or that there may be gaps in the accumulated charge between bins; the former leads to a slight over-counting and the latter leads to a slight under-counting.

  • this issue is why we transitioned from using DST files as QA bins to using nth scaler readouts as bin boundaries
  • corrections of this issue to these older QADBs will not be applied

QADB Maintenance

Documentation for QADB maintenance and revision

Adding to or revising the QADB

  • the QADB files are produced by clas12-timeline
  • if you have produced QA results for a new data set, and would like to add them to the QADB, or if you would like to update results for an existing dataset, use the following procedure:
    • mkdir qadb/pass${pass}/${dataset}/, then copy the final qaTree.json and chargeTree.json to that directory
    • add/update a symlink to this dataset in qadb/latest, if this is a new Pass
    • run util/makeTables.sh (no longer needed: a pre-commit hook will take care of this)
    • update customized QA criteria sets, such as OkForAsymmetry (no longer needed: this function is no longer maintained)
    • update the above table of data sets
    • submit a pull request

Adding new defect bits

  • defect bits must be added in the following places:
    • Groovy:
      • src/clasqa/Tools.groovy (copy from clasqa repository version)
      • src/clasqa/QADB.groovy
      • src/examples/dumpQADB.groovy (optional)
    • C++:
      • srcC/include/QADB.h
      • srcC/examples/dumpQADB.cpp (optional)
    • Documentation:
      • qadb/defect_definitions.json, then use util/makeDefectMarkdown.rb to generate Markdown table for README.md

QA Ground Rules

Important

The following rules are enforced for the QA procedure and the resulting QADB:

  1. The QA procedure runs on the data as they are and does not fix any of their problems.
  2. The QADB only provides defect identification and does not provide analysis-specific decisions.
  3. At least two people independently perform the "manual QA" part of the QA procedure, and the results are cross checked and merged.

Contributions

All contributions are welcome, whether to the code, examples, documentation, or the QADB itself. You are welcome to open an issue and/or a pull request. If the maintainer(s) do not respond in a reasonable time, send them an email.
