From 0f39aefb351a9c36394396ad76d5940991355a7f Mon Sep 17 00:00:00 2001
From: Artur Gottmann <artur.gottmann@kit.edu>
Date: Fri, 24 Jan 2025 10:24:17 +0100
Subject: [PATCH] Extending sample manager description

---
 docs/sphinx_source/kingmaker.rst | 31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)
diff --git a/docs/sphinx_source/kingmaker.rst b/docs/sphinx_source/kingmaker.rst
index 885b271a..9bc4d054 100644
--- a/docs/sphinx_source/kingmaker.rst
+++ b/docs/sphinx_source/kingmaker.rst
@@ -30,6 +30,34 @@ Samples can be managed manually or using the ``sample_manager``, which can be st
 
 This starts a CLI, which can be used to add more samples to the database, update samples or quickly generate a sample list for producing ntuples.
 
+Information on CMS datasets
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To search for CMS datasets, we first need some information on what these datasets look like. We refer to the CMS dataset names as ``DAS nicks``, since we will search for them using the
+Data Aggregation System (DAS) of CMS. The datasets can be searched for at https://cmsweb.cern.ch/das/, or alternatively via ``dasgoclient`` (https://github.com/dmwm/dasgoclient) in a CMSSW
+command-line environment. Our ``sample_manager`` integrates the corresponding software components and wraps them in a questionnaire-style dialog.
+
+According to https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookLocatingDataSamples, CMS datasets follow this naming convention:
+
+.. code-block:: bash
+
+    # Convention:
+    /PrimaryDataset/ProcessedDataset/DataTier
+    # Examples:
+    ## MC Simulation:
+    /DYJetsToLL_M-50_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL16NanoAODv9-106X_mcRun2_asymptotic_v17-v1/NANOAODSIM
+    ## Data:
+    /Tau/Run2016B-ver2_HIPM_UL2016_MiniAODv2-v1/MINIAOD
+    ## User-produced Dataset:
+    /Tau/aakhmets-data_2016ULpreVFP_tau_Tau_Run2016B-ver2_HIPM_1736940678-00000000000000000000000000000000/USER
+
+- ``PrimaryDataset`` usually represents the superset of data recorded by the experiment in the case of data, and the simulated process in the case of MC simulation. For user-produced datasets this can in principle be anything; users are responsible for choosing meaningful names.
+- ``ProcessedDataset`` provides details on the actual production or processing campaign of the dataset, including conditions (the so-called ``GlobalTag``), version, etc. Again, user-produced datasets can contain anything here, but users are encouraged to choose something meaningful.
+- ``DataTier`` represents the data format of the dataset. A list of the more popular formats is given at https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookDataFormats#EvenT. We are mostly interested in NANOAOD(SIM) and MINIAOD(SIM), which are tailored for analyses. The ``USER`` data tier represents anything that a user can produce.
+
+All centrally produced CMS datasets are stored under the ``prod/global`` DAS instance, while user datasets have a dedicated DAS instance, ``prod/phys03``.
+See https://cmsweb.cern.ch/das/services for more details.
+
 Addition of new Samples
 ~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -43,6 +71,7 @@ When adding a new sample, follow the instructions of the ``sample_manager``. In
     Database loaded
     The database contains 581 samples, split over 4 era(s) and 22 sampletype(s)
     ? What do you want to do? Add a new sample
+    ? Select the DAS instance for the search prod/global
     ? Enter a DAS nick to add /DYJetsToLL_M-50_*/RunIISummer20UL16NanoAOD*v9-106X*/NANOAODSIM
     Multiple results found
     ? Which dataset do you want to add ? (Use arrow keys to move, <space> to select, <a> to toggle, <i> to invert)
@@ -293,4 +322,4 @@ The ``problematic_eras`` option is used to define eras, where only one file per
 .. warning::
     For friend trees, multiprocessing is not possible, since the resulting friend tree must have the same order as the input tree. Therefore, the ``htcondor_request_cpus`` option has to be set to 1, which will disable multiprocessing.
 
-For a more complete description of the different options, please refer to the overcomplete configuration in the law repository (https://github.com/riga/law/blob/master/law.cfg.example).
\ No newline at end of file
+For a more complete description of the different options, please refer to the overcomplete configuration in the law repository (https://github.com/riga/law/blob/master/law.cfg.example).