From 0f39aefb351a9c36394396ad76d5940991355a7f Mon Sep 17 00:00:00 2001
From: Artur Gottmann
Date: Fri, 24 Jan 2025 10:24:17 +0100
Subject: [PATCH] Extending sample manager description

---
 docs/sphinx_source/kingmaker.rst | 31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/docs/sphinx_source/kingmaker.rst b/docs/sphinx_source/kingmaker.rst
index 885b271a..9bc4d054 100644
--- a/docs/sphinx_source/kingmaker.rst
+++ b/docs/sphinx_source/kingmaker.rst
@@ -30,6 +30,34 @@ Samples can be managed manually or using the ``sample_manager``, which can be st
 
 This starts a CLI, which can be used to add more samples to the database, update samples or quickly generate a sample list for producing ntuples.
 
+Information on CMS datasets
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To search for CMS datasets, we first need some information on what these datasets look like. We refer to the CMS dataset names as ``DAS nicks``, since we search for them using the
+Data Aggregation Service (DAS) of CMS. Datasets can be searched for at https://cmsweb.cern.ch/das/, or alternatively via ``dasgoclient`` (https://github.com/dmwm/dasgoclient) in a CMSSW
+command-line environment. Our ``sample_manager`` integrates the corresponding software components and wraps them in a questionnaire-style dialogue.
+
+According to https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookLocatingDataSamples, the naming convention of CMS datasets is as follows:
+
+.. code-block:: bash
+
+   # Convention:
+   /PrimaryDataset/ProcessedDataset/DataTier
+   # Examples:
+   ## MC Simulation:
+   /DYJetsToLL_M-50_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v1/NANOAODSIM
+   ## Data:
+   /Tau/Run2016B-ver2_HIPM_UL2016_MiniAODv2-v1/MINIAOD
+   ## User-produced Dataset:
+   /Tau/aakhmets-data_2016ULpreVFP_tau_Tau_Run2016B-ver2_HIPM_1736940678-00000000000000000000000000000000/USER
+
+- ``PrimaryDataset`` usually represents the superset of data recorded by the experiment in the case of data, and the simulated process in the case of MC simulation. For user-produced datasets this can in general be anything; however, users are responsible for choosing meaningful names.
+- ``ProcessedDataset`` provides details on the production or processing campaign of the dataset, including conditions (the so-called ``GlobalTag``), version, etc. Again, user datasets can contain anything here, but users are encouraged to choose something meaningful.
+- ``DataTier`` represents the data format of the dataset. A list of some of the more popular formats is given here: https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookDataFormats#EvenT. We are mostly interested in NANOAOD(SIM) and MINIAOD(SIM), which are tailored for analyses. The ``USER`` data tier represents anything that a user can produce.
+
+All centrally produced CMS datasets are stored under the ``prod/global`` DAS instance, while there is a dedicated DAS instance for user datasets, ``prod/phys03``.
+See https://cmsweb.cern.ch/das/services for more details.
+
 Addition of new Samples
 ~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -43,6 +71,7 @@ When adding a new sample, follow the instructions of the ``sample_manager``. In
    Database loaded
    The database contains 581 samples, split over 4 era(s) and 22 sampletype(s)
    ? What do you want to do? Add a new sample
+   ? Select the DAS instance for the search prod/global
    ? Enter a DAS nick to add /DYJetsToLL_M-50_*/RunIISummer20UL16NanoAOD*v9-106X*/NANOAODSIM
    Multiple results found
    ? Which dataset do you want to add ? (Use arrow keys to move, <space> to select, <a> to toggle, <i> to invert)
@@ -293,4 +322,4 @@ The ``problematic_eras`` option is used to define eras, where only one file per
 .. warning::
    For friend trees, multiprocessing is not possible, since the resulting friend tree must have the same order as the input tree. Therefore, the ``htcondor_request_cpus`` option has to be set to 1, which will disable multiprocessing.
 
-For a more complete description of the different options, please refer to the overcomplete configuration in the law repository (https://github.com/riga/law/blob/master/law.cfg.example).
\ No newline at end of file
+For a more complete description of the different options, please refer to the overcomplete configuration in the law repository (https://github.com/riga/law/blob/master/law.cfg.example).