Docs

taehyounpark · taehyounpark · commit db3d879ea392 · 2024-04-09T00:21:30.000-04:00
diff --git a/docs/examples/xaod.md b/docs/examples/xaod.md
@@ -1,7 +1,3 @@
-@section example-hep More examples
-
-[HepQuery](https://github.com/taehyounpark/queryosity-hep) provides the extensions for ROOT TTree datasets and ROOT `TH1`-based outputs.
-
 # `xAOD` analysis
 
 1. Apply the MC event weight.
diff --git a/docs/guide/columns.md b/docs/guide/columns.md
@@ -1,24 +1,42 @@
 # Computing quantities
 
-New columns can be computed out of existing ones by calling `queryosity::dataflow::define()` with the appropriate argument, or operators between the underlying value types.
+::::{tab-set}
 
-:::{admonition} Template
-```{code} cpp
-auto cnst = df.define(column::constant<DTYPE>(VALUE));
-auto eqn  = df.define(column::expression(EXPRESSION))(COLUMNS...);
-auto defn = df.define(column::definition<DEFINITION>(ARGUMENTS...))(COLUMNS...);
+:::{tab-item} Constant
+:::{card} Template
+```cpp
+auto cnst = df.define(column::constant(VAL));
+```
+:::
+
+:::{tab-item} Expression
+:::{card} Template
+```cpp
+auto eqn = df.define(column::expression(FUNC))(COLS...);
 ```
 :::
 
+:::{tab-item} Definition
+:::{card} Template
+```cpp
+auto defn = df.define(column::definition<DEF>(ARGS...))(COLS...);
+```
+:::
+
+::::
+
 :::{admonition} Requirements on column value type
 :class: important
 A computed column **MUST** output a value of a type `T` that is:
 - {{DefaultConstructible}}.
 - {{CopyAssignable}} or {{MoveAssignable}}.
 :::
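The value-type requirement above can be checked at compile time with standard type traits. The following is a minimal sketch, not part of queryosity: `is_valid_column_value_v`, `no_default` and `not_assignable` are hypothetical names, and the sketch assumes that the named requirements are adequately approximated by `std::is_default_constructible`, `std::is_copy_assignable` and `std::is_move_assignable`.

```cpp
#include <string>
#include <type_traits>

// Hypothetical helper (not part of queryosity): true if T is
// default-constructible and either copy-assignable or move-assignable.
template <typename T>
inline constexpr bool is_valid_column_value_v =
    std::is_default_constructible_v<T> &&
    (std::is_copy_assignable_v<T> || std::is_move_assignable_v<T>);

// A type without a default constructor fails the check.
struct no_default {
  explicit no_default(int) {}
};

// A type with a const member is neither copy- nor move-assignable.
struct not_assignable {
  const int x = 0;
};

static_assert(is_valid_column_value_v<double>);
static_assert(is_valid_column_value_v<std::string>);
static_assert(!is_valid_column_value_v<no_default>);
static_assert(!is_valid_column_value_v<not_assignable>);
```

A check of this kind placed next to a custom column definition would surface an unsuitable value type as a readable compile error rather than a deep template failure.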
 
+(computing-columns-operators)=
 ## Basic operations
 
+Binary and unary operators on the underlying value types are supported.
+
 ```cpp
 // constant columns do not change per-entry
 auto zero = df.define(column::constant(0));
@@ -27,8 +45,7 @@ auto two = df.define(column::constant(2));
 
 // binary/unary operators
 auto three = one + two;
-auto v_0 = v[zero];
-// reminder: actions are *lazy*, i.e. no undefined behaviour (yet)
+auto v_0 = v[zero]; // no undefined behaviour (yet), even if v.size()==0.
 
 // can be re-assigned as long as value type remains unchanged
 two = three - one;
diff --git a/docs/guide/dataflow.md b/docs/guide/dataflow.md
@@ -28,7 +28,8 @@ The dataflow accepts (up to three) optional keyword arguments options to configu
 | `dataset::weight(scale)` | Apply a global `scale` to all weights. | `1.0` |
 | `dataset::head(nrows)` | Process the first `nrows` of the dataset. | `-1` (all entries) |
 
-:::{example}
+:::{admonition} Example
+:class: note
 ```cpp
 dataflow df(multithread::enable(10), dataset::weight(1.234), dataset::head(100));
 ```
diff --git a/docs/guide/datasets.md b/docs/guide/datasets.md
@@ -4,17 +4,12 @@ A dataflow needs at least one input dataset with rows to loop over.
 Presumably, the dataset also has columns containing some data to analyze for each entry.
 Arbitrary dataset formats and column data can be supported by implementing their respective ABCs.
 
-```{admonition} Template
-:class: note
+:::{card} Template
 ```{code} cpp
-auto ds = df.load(dataset::input<FORMAT>(ARGUMENTS...));
+auto ds = df.load(dataset::input<DS>(ARGS...));
 auto col = ds.read(dataset::column<DTYPE>(NAME));
 ```
-
-```{seealso}
-- [`dataset::source`](#dataset-source) and [`dataset::reader`](#dataset-reader)
-- [`column::reader`](#column-reader)
-```
+:::
 
 ## Loading-in a dataset
 
@@ -62,11 +57,10 @@ A dataflow can load multiple datasets of different input formats into one datafl
 
 :::{card} 
 :text-align: center
-<!-- :::{topic} JSON and CSV side-by-side. -->
+JSON and CSV side-by-side.
+^^^
 ```{image} ../images/json_csv.png
 ```
-+++
-JSON and CSV side-by-side.
 :::
 
 ```{code} cpp
@@ -76,15 +70,20 @@ using csv = qty::csv;
 auto y = df.read(dataset::input<csv>(data_csv), dataset::column<double>("y"));
 
 // x from json, y from csv
-auto z = x + y;
+auto z = x + y; // see next section
 ```
 
-```{admonition} Dataset partition requirements
+:::{admonition} Dataset partition requirements
 :class: important
 When multiple datasets are loaded into a dataflow, the `queryosity::dataset::source::partition()` implementation of each dataset **MUST** collectively satisfy:
 - All non-empty partitions **MUST** have the same total number of entries.
   - If the sub-range boundaries are not aligned with one another, then a common denominator partition with only sub-range boundaries present across all partitions is determined and used in parallelizing the dataflow.
 - A dataset can report an empty partition to relinquish the control of the entry loop to the other dataset(s) in the dataflow.
   - Thus, there **MUST** be at least one dataset that reports a non-empty partition.
   - The dataset with an empty partition, as well as its columns, **MUST** remain in a valid state for traversing over any entry numbers as dictated by the other dataset(s).
-```
+:::
+
+:::{seealso}
+- [`dataset::source`](#dataset-source) and [`dataset::reader`](#dataset-reader)
+- [`column::reader`](#column-reader)
+:::
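The "common denominator partition" rule in the partition requirements can be illustrated with a small standalone sketch. It is an assumption about the behaviour described in the text, not the library's implementation: `partition_t` and `common_partition` are hypothetical names, and each partition is modelled as sorted, contiguous `[begin, end)` sub-ranges.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <utility>
#include <vector>

// Hypothetical model: a partition is a list of sorted, contiguous
// [begin, end) sub-ranges of entry numbers.
using partition_t =
    std::vector<std::pair<unsigned long long, unsigned long long>>;

// Keep only the boundaries shared by every non-empty partition.
// Empty partitions are skipped: they relinquish control of the entry loop.
inline partition_t common_partition(const std::vector<partition_t> &parts) {
  std::set<unsigned long long> common;
  bool first = true;
  for (const auto &part : parts) {
    if (part.empty())
      continue;
    std::set<unsigned long long> bounds;
    for (const auto &[begin, end] : part) {
      bounds.insert(begin);
      bounds.insert(end);
    }
    if (first) {
      common = std::move(bounds);
      first = false;
    } else {
      std::set<unsigned long long> kept;
      std::set_intersection(common.begin(), common.end(), bounds.begin(),
                            bounds.end(), std::inserter(kept, kept.begin()));
      common = std::move(kept);
    }
  }
  // Rebuild sub-ranges from consecutive shared boundaries.
  partition_t result;
  for (auto it = common.begin(); it != common.end(); ++it) {
    auto next = std::next(it);
    if (next != common.end())
      result.emplace_back(*it, *next);
  }
  return result;
}
```

For example, the partitions `{[0,25), [25,50), [50,100)}` and `{[0,50), [50,75), [75,100)}` share only the boundaries 0, 50 and 100, so this sketch reduces them to `{[0,50), [50,100)}`; an additional empty partition does not change the result.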
diff --git a/docs/guide/queries.md b/docs/guide/queries.md
@@ -7,12 +7,11 @@ There are a total of three steps in fully specifying a query:
 2. Input column(s) with which it is filled with.
 3. Associated selection(s) at which it is performed.
 
-```{admonition} Template
-:class: note
+:::{card} Template
 ```{code} cpp
-auto q = df.get(query::output<DEFINITION>(ARGUMENTS...))
-             .fill(COLUMNS...)
-             .at(SELECTIONS...);
+auto q = df.get(query::output<DEF>(ARGS...))
+             .fill(COLS...)
+             .at(SELS...);
 ```
 :::
 
diff --git a/docs/guide/selections.md b/docs/guide/selections.md
@@ -1,59 +1,105 @@
 {#applying-selections}
 # Applying selections
 
-## Initiate a cutflow
+::::{tab-set}
 
-Call queryosity::dataflow::filter() or queryosity::dataflow::weight() to initiate a selection in the cutflow
+:::{tab-item} Existing column
+:::{card} Template
+```cpp
+auto cut = df.filter(COL);
+auto wgt = df.weight(COL);
+```
+:::
 
+:::{tab-item} Constant
+:::{card} Template
 ```cpp
-auto [w, cat] = ds.read(dataset::column<double>("weight"),
-                        dataset::column<std::string>("category"));
-auto a = df.define(column::constant<std::string>("a"));
-auto b = df.define(column::constant<std::string>("b"));
-auto c = df.define(column::constant<std::string>("c"));
+auto cut = df.filter(column::constant(VAL));
+auto wgt = df.weight(column::constant(VAL));
+```
+:::
+
+:::{tab-item} Expression
+:::{card} Template
+```cpp
+auto cut = df.filter(column::expression(FUNC))(COLS...);
+auto wgt = df.weight(column::expression(FUNC))(COLS...);
+```
+:::
+
+:::{tab-item} Definition
+:::{card} Template
+```cpp
+auto cut = df.filter(column::definition<DEF>(ARGS...))(COLS...);
+auto wgt = df.weight(column::definition<DEF>(ARGS...))(COLS...);
+```
+:::
+
+::::
+
 
-// initiate a cutflow
-auto weighted = df.weight(w);
+## Initiating a cutflow
+Call `dataflow::filter()` or `dataflow::weight()` to initiate a selection in the cutflow.
+```cpp
+auto all = df.filter(column::constant(true));
 ```
 
+
 ## Compounding selections
 
-Subsequently-compounded selections from existing ones can be applied by chained `filter()`/`weight()` calls. 
+Selections can be compounded onto existing ones regardless of their cut/weight specification:
+a cut simply passes through the weight decision of its previous selection (if one exists), and vice versa.
+
 ```cpp
-// cuts and weights can be compounded in any order.
-auto cut =
-    weighted.filter(column::expression([](double w) { return (w >= 0;); }))(w);
+auto w = ds.read(dataset::column<double>("weight"));
+
+auto sel = all.weight(w).filter(
+    column::expression([](double w) { return (w >= 0); }))(w);
+// cut    = (true) && (true) && (w>=0);
+// weight = (1.0)   *    (w)  *  (1.0);
 ```
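The pass-through rule for compounded selections can be modelled per entry without the library. This is a minimal sketch under stated assumptions: `decision`, `apply_cut`, `apply_weight` and `example_selection` are hypothetical names that mimic the cut/weight bookkeeping described above, not the queryosity API.

```cpp
#include <cassert>

// Hypothetical per-entry model of a selection (not the queryosity API).
struct decision {
  bool passed = true;  // cut decision
  double weight = 1.0; // weight decision
};

// Compounding a cut: AND the cut, pass the previous weight through unchanged.
inline decision apply_cut(decision prev, bool cut) {
  return {prev.passed && cut, prev.weight};
}

// Compounding a weight: multiply the weight, pass the previous cut through.
inline decision apply_weight(decision prev, double w) {
  return {prev.passed, prev.weight * w};
}

// Mirrors all.weight(w).filter(w >= 0) for a single entry with weight w.
inline decision example_selection(double w) {
  decision all = apply_cut(decision{}, true);
  return apply_cut(apply_weight(all, w), w >= 0);
}
```

In this model an entry with `w = 2.5` passes and carries weight `2.5`, while an entry with `w = -1.0` fails the cut but still carries its weight, since the cut never modifies the weight decision.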
 
 ## Branching selections
 
+Applying multiple selections from a common node creates a branching in the cutflow.
+
 ```cpp
-// applying more than one selection from a node creates a branching point.
-auto cut_a = cut.filter(cat == a);
-auto cut_b = cut.filter(cat == b);
-auto cut_c = cut.filter(cat == c);
+auto cat = ds.read(dataset::column<std::string>("category"));
+auto a = df.define(column::constant<std::string>("a"));
+auto b = df.define(column::constant<std::string>("b"));
+auto c = df.define(column::constant<std::string>("c"));
+
+auto sel_a = sel.filter(cat == a);
+auto sel_b = sel.filter(cat == b);
+auto sel_c = sel.filter(cat == c);
 ```
 
-## Merging selections
+## Joining selections
+
+Any set of selections can be merged back together by `&&`/`||`/`*`-ing them.
 
 ```cpp
-// selections can be merged based on their decision values.
-auto cut_a_and_b = df.filter(cut_a && cut_b);
-auto cut_b_or_c = df.filter(cut_b || cut_c);
+// why weight(w)? see below
+auto sel_a_and_b = df.filter(sel_a && sel_b).weight(w);
+auto sel_b_or_c  = df.weight(w).filter(sel_b || sel_c);
 ```
+
+:::{important}
+The mechanism for joining selections is simply that of [Basic operations](#computing-columns-operators) between columns.
+Therefore, a joined cut/weight constitutes the first selection in a new cutflow, while its complementary decision is discarded.
+The discarded decisions can (and should) be re-applied at any point in the new cutflow.
+:::
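Why the weight has to be re-applied after a join can be seen in a small per-entry model. This is a sketch under the assumption that a joined selection starts a fresh cutflow exactly as the note describes; `decision`, `join_and`, `join_or`, `join_times` and `reapply_weight` are hypothetical names, not queryosity functions.

```cpp
#include <cassert>

// Hypothetical per-entry model of a selection (not the queryosity API).
struct decision {
  bool passed;
  double weight;
};

// Joining with && or || combines the cut decisions only; the weight of the
// joined selection restarts at 1.0 because it opens a new cutflow.
inline decision join_and(const decision &a, const decision &b) {
  return {a.passed && b.passed, 1.0};
}
inline decision join_or(const decision &a, const decision &b) {
  return {a.passed || b.passed, 1.0};
}

// Joining with * combines the weights only; the cut restarts at true.
inline decision join_times(const decision &a, const decision &b) {
  return {true, a.weight * b.weight};
}

// Re-applying the weight restores what the join discarded.
inline decision reapply_weight(const decision &d, double w) {
  return {d.passed, d.weight * w};
}
```

With two branches that both carry weight `2.0`, `join_or` yields weight `1.0` until `reapply_weight` is called with the original weight, which corresponds to the extra `weight(w)` calls in the joined selections above.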
+
 ## Yield at a selection
 
 ```cpp
 // single selection
-auto all = df.filter(column::constant(true));
-auto yield_tot = def.get(selection::yield(all));
-unsigned long long yield_tot_entries =
-    yield_tot.entries;                    // number of entries passed
+auto yield_tot = df.get(selection::yield(all));
+unsigned long long yield_tot_entries = yield_tot.entries; // number of entries
 double yield_tot_value = yield_tot.value; // sum(weights)
 double yield_tot_error = yield_tot.error; // sqrt(sum(weights squared))
 
 // multiple selections 
-// (sel_a/b/c: (varied) lazy selections)
 auto [yield_a, yield_b, yield_c] =
     df.get(selection::yield(sel_a, sel_b, sel_c));
 ```
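The three yield fields follow directly from the comments in the example: the number of passing entries, the sum of their weights, and the square root of the sum of squared weights. The following standalone sketch computes them from a list of per-entry decisions; `yield_t` and `compute_yield` are hypothetical names, and the field names simply mirror those used above rather than the library's actual type.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Hypothetical result type mirroring the fields used above.
struct yield_t {
  unsigned long long entries = 0; // number of entries passed
  double value = 0.0;             // sum(weights)
  double error = 0.0;             // sqrt(sum(weights squared))
};

// Each element is one entry's (cut decision, weight decision).
inline yield_t
compute_yield(const std::vector<std::pair<bool, double>> &decisions) {
  yield_t y;
  double sumw2 = 0.0;
  for (const auto &[passed, w] : decisions) {
    if (!passed)
      continue; // entries failing the cut do not contribute
    ++y.entries;
    y.value += w;
    sumw2 += w * w;
  }
  y.error = std::sqrt(sumw2);
  return y;
}
```

For instance, two passing entries with weights `3.0` and `4.0` and one failing entry give 2 entries, a value of `7.0` and an error of `5.0`.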
diff --git a/docs/index.md b/docs/index.md
@@ -1,3 +1,5 @@
+# Welcome to Queryosity
+
 ![Version](https://img.shields.io/badge/Version-0.4.1-blue.svg)
 ![C++ Standard](https://img.shields.io/badge/C++-17-blue.svg)
 [![Ubuntu](https://github.com/taehyounpark/analogical/actions/workflows/ubuntu.yml/badge.svg?branch=master)](https://github.com/taehyounpark/analogical/actions/workflows/ubuntu.yml)
@@ -21,4 +23,4 @@ start/index
 guide/index
 examples/index
 references/index
-```
+```
diff --git a/docs/start/conceptual.md b/docs/start/conceptual.md
@@ -72,6 +72,7 @@ Selection
     - A series of two or more cuts becomes their intersection, `and`
   - A floating-point `weight` to assign a statistical significance to the entry.
     - A series of two or more weights becomes their product, `*`.
+  - A cut is referred to as being *complementary* to weight and vice versa.
 
 ***
 
diff --git a/include/queryosity/todo_varied.h b/include/queryosity/todo_varied.h
@@ -74,13 +74,6 @@ class todo<Helper>::varied : public dataflow::node,
       -> std::array<typename lazy<query::booked_t<V>>::varied,
                     sizeof...(Nodes)>;
 
-  /**
-   * @brief Shortcut for `evaluate()`/`apply()`/`fill()` for
-   * columns/selections/queries.
-   * @tparam Cols (Varied) Input column types.
-   * @param[in] cols... Input columns.
-   * @return Lazy column definition
-   */
   template <typename... Cols>
   auto operator()(Cols &&...cols) ->
       typename decltype(std::declval<todo<Helper>>().operator()(