
Commit db3d879

committed
Docs
1 parent 66584ab commit db3d879


9 files changed

+120
-66
lines changed


docs/examples/xaod.md

Lines changed: 0 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,3 @@
1-
@section example-hep More examples
2-
3-
[HepQuery](https://github.com/taehyounpark/queryosity-hep) provides the extensions for ROOT TTree datasets and ROOT `TH1`-based outputs.
4-
51
# `xAOD` analysis
62

73
1. Apply the MC event weight.

docs/guide/columns.md

Lines changed: 25 additions & 8 deletions
@@ -1,24 +1,42 @@
11
# Computing quantities
22

3-
New columns can be computed out of existing ones by calling `queryosity::dataflow::define()` with the appropriate argument, or operators between the underlying value types.
3+
::::{tab-set}
44

5-
:::{admonition} Template
6-
```{code} cpp
7-
auto cnst = df.define(column::constant<DTYPE>(VALUE));
8-
auto eqn = df.define(column::expression(EXPRESSION))(COLUMNS...);
9-
auto defn = df.define(column::definition<DEFINITION>(ARGUMENTS...))(COLUMNS...);
5+
:::{tab-item} Constant
6+
:::{card} Template
7+
```cpp
8+
auto cnst = df.define(column::constant(VAL));
9+
```
10+
:::
11+
12+
:::{tab-item} Expression
13+
:::{card} Template
14+
```cpp
15+
auto eqn = df.define(column::expression(FUNC))(COLS...);
1016
```
1117
:::
1218

19+
:::{tab-item} Definition
20+
:::{card} Template
21+
```cpp
22+
auto defn = df.define(column::definition<DEF>(ARGS...))(COLS...);
23+
```
24+
:::
25+
26+
::::
27+
1328
:::{admonition} Requirements on column value type
1429
:class: important
1530
A computed column **MUST** output a value of a type `T` that is:
1631
- {{DefaultConstructible}}.
1732
- {{CopyAssignable}} or {{MoveAssignable}}.
1833
:::
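
The two requirements above can be checked at compile time with standard type traits. The following is a minimal sketch; `is_valid_column_value_v`, `frozen` and `needs_arg` are hypothetical names for illustration and are not part of queryosity.

```cpp
#include <string>
#include <type_traits>

// Hypothetical helper (not a queryosity API): true if T is
// DefaultConstructible and CopyAssignable or MoveAssignable.
template <typename T>
constexpr bool is_valid_column_value_v =
    std::is_default_constructible_v<T> &&
    (std::is_copy_assignable_v<T> || std::is_move_assignable_v<T>);

// Default-constructible, but the const member deletes both assignments.
struct frozen {
  const int id = 0;
};

// Assignable, but has no default constructor.
struct needs_arg {
  explicit needs_arg(int) {}
};

static_assert(is_valid_column_value_v<double>);
static_assert(is_valid_column_value_v<std::string>);
static_assert(!is_valid_column_value_v<frozen>);
static_assert(!is_valid_column_value_v<needs_arg>);
```

A check of this kind could sit next to a custom value type so that a violation surfaces as a readable compile error rather than deep inside template instantiation.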
1934

35+
(computing-columns-operators)=
2036
## Basic operations
2137

38+
Binary and unary operators on the underlying value types are supported.
39+
2240
```cpp
2341
// constant columns do not change per-entry
2442
auto zero = df.define(column::constant(0));
@@ -27,8 +45,7 @@ auto two = df.define(column::constant(2));
2745

2846
// binary/unary operators
2947
auto three = one + two;
30-
auto v_0 = v[zero];
31-
// reminder: actions are *lazy*, i.e. no undefined behaviour (yet)
48+
auto v_0 = v[zero]; // no undefined behaviour (yet), even if v.size()==0.
3249

3350
// can be re-assigned as long as value type remains unchanged
3451
two = three - one;

docs/guide/dataflow.md

Lines changed: 2 additions & 1 deletion
@@ -28,7 +28,8 @@ The dataflow accepts (up to three) optional keyword arguments options to configu
2828
| `dataset::weight(scale)` | Apply a global `scale` to all weights. | `1.0` |
2929
| `dataset::head(nrows)` | Process the first `nrows` of the dataset. | `-1` (all entries) |
3030
31-
:::{example}
31+
:::{admonition} Example
32+
:class: note
3233
```cpp
3334
dataflow df(multithread::enable(10), dataset::weight(1.234), dataset::head(100));
3435
```

docs/guide/datasets.md

Lines changed: 13 additions & 14 deletions
@@ -4,17 +4,12 @@ A dataflow needs at least one input dataset with rows to loop over.
44
Presumably, the dataset also has columns containing some data to analyze for each entry.
55
Arbitrary dataset formats and column data can be supported by implementing their respective ABCs.
66

7-
```{admonition} Template
8-
:class: note
7+
:::{card} Template
98
```{code} cpp
10-
auto ds = df.load(dataset::input<FORMAT>(ARGUMENTS...));
9+
auto ds = df.load(dataset::input<DS>(ARGS...));
1110
auto col = ds.read(dataset::column<DTYPE>(NAME));
1211
```
13-
14-
```{seealso}
15-
- [`dataset::source`](#dataset-source) and [`dataset::reader`](#dataset-reader)
16-
- [`column::reader`](#column-reader)
17-
```
12+
:::
1813

1914
## Loading-in a dataset
2015

@@ -62,11 +57,10 @@ A dataflow can load multiple datasets of different input formats into one datafl
6257

6358
:::{card}
6459
:text-align: center
65-
<!-- :::{topic} JSON and CSV side-by-side. -->
60+
JSON and CSV side-by-side.
61+
^^^
6662
```{image} ../images/json_csv.png
6763
```
68-
+++
69-
JSON and CSV side-by-side.
7064
:::
7165

7266
```{code} cpp
@@ -76,15 +70,20 @@ using csv = qty::csv;
7670
auto y = df.read(dataset::input<csv>(data_csv), dataset::column<double>("y"));
7771
7872
// x from json, y from csv
79-
auto z = x + y;
73+
auto z = x + y; // see next section
8074
```
8175

82-
```{admonition} Dataset partition requirements
76+
:::{admonition} Dataset partition requirements
8377
:class: important
8478
When multiple datasets are loaded into a dataflow, the `queryosity::dataset::source::partition()` implementation of each dataset **MUST** collectively satisfy:
8579
- All non-empty partitions **MUST** have the same total number of entries.
8680
- If the sub-range boundaries are not aligned with one another, then a common denominator partition with only sub-range boundaries present across all partitions is determined and used in parallelizing the dataflow.
8781
- A dataset can report an empty partition to relinquish the control of the entry loop to the other dataset(s) in the dataflow.
8882
- Thus, there **MUST** be at least one dataset that reports a non-empty partition.
8983
- The dataset with an empty partition, as well as its columns, **MUST** remain in a valid state for traversing over any entry numbers as dictated by the other dataset(s).
90-
```
84+
:::
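
One way to read the "common denominator" rule is as an intersection of sub-range boundaries. The sketch below assumes each partition is given as a sorted list of boundaries (for example `{0, 25, 50, 100}` for three sub-ranges); `boundaries` and `common_partition` are hypothetical names, and the library's actual implementation may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <utility>
#include <vector>

// Sorted sub-range boundaries, e.g. {0, 25, 50, 100} (assumed representation).
using boundaries = std::vector<unsigned long long>;

// Hypothetical helper: keep only boundaries present in every non-empty partition.
boundaries common_partition(const std::vector<boundaries>& partitions) {
  boundaries common;
  bool found_non_empty = false;
  for (const auto& p : partitions) {
    if (p.empty()) continue;  // empty partition: defers to the other dataset(s)
    assert(std::is_sorted(p.begin(), p.end()));
    if (!found_non_empty) {
      common = p;
      found_non_empty = true;
      continue;
    }
    // Non-empty partitions must span the same total range of entries.
    assert(p.front() == common.front() && p.back() == common.back());
    boundaries kept;
    std::set_intersection(common.begin(), common.end(), p.begin(), p.end(),
                          std::back_inserter(kept));
    common = std::move(kept);
  }
  // At least one dataset must report a non-empty partition.
  assert(found_non_empty);
  return common;
}
```

For instance, partitions `{0, 25, 50, 100}` and `{0, 50, 75, 100}` share only the boundaries `{0, 50, 100}`, so the dataflow would be parallelized over two sub-ranges in this reading.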
85+
86+
:::{seealso}
87+
- [`dataset::source`](#dataset-source) and [`dataset::reader`](#dataset-reader)
88+
- [`column::reader`](#column-reader)
89+
:::

docs/guide/queries.md

Lines changed: 4 additions & 5 deletions
@@ -7,12 +7,11 @@ There are a total of three steps in fully specifying a query:
77
2. Input column(s) with which it is filled.
88
3. Associated selection(s) at which it is performed.
99

10-
```{admonition} Template
11-
:class: note
10+
:::{card} Template
1211
```{code} cpp
13-
auto q = df.get(query::output<DEFINITION>(ARGUMENTS...))
14-
.fill(COLUMNS...)
15-
.at(SELECTIONS...);
12+
auto q = df.get(query::output<DEF>(ARGS...))
13+
.fill(COLS...)
14+
.at(SELS...);
1615
```
1716
:::
1817

docs/guide/selections.md

Lines changed: 72 additions & 26 deletions
@@ -1,59 +1,105 @@
11
{#applying-selections}
22
# Applying selections
33

4-
## Initiate a cutflow
4+
::::{tab-set}
55

6-
Call queryosity::dataflow::filter() or queryosity::dataflow::weight() to initiate a selection in the cutflow
6+
:::{tab-item} Existing column
7+
:::{card} Template
8+
```cpp
9+
auto cut = df.filter(COL);
10+
auto wgt = df.weight(COL);
11+
```
12+
:::
713

14+
:::{tab-item} Constant
15+
:::{card} Template
816
```cpp
9-
auto [w, cat] = ds.read(dataset::column<double>("weight"),
10-
dataset::column<std::string>("category"));
11-
auto a = df.define(column::constant<std::string>("a"));
12-
auto b = df.define(column::constant<std::string>("b"));
13-
auto c = df.define(column::constant<std::string>("c"));
17+
auto cut = df.filter(column::constant(VAL));
18+
auto wgt = df.weight(column::constant(VAL));
19+
```
20+
:::
21+
22+
:::{tab-item} Expression
23+
:::{card} Template
24+
```cpp
25+
auto cut = df.filter(column::expression(FUNC))(COLS...);
26+
auto wgt = df.weight(column::expression(FUNC))(COLS...);
27+
```
28+
:::
29+
30+
:::{tab-item} Definition
31+
:::{card} Template
32+
```cpp
33+
auto cut = df.filter(column::definition<DEF>(ARGS...))(COLS...);
34+
auto wgt = df.weight(column::definition<DEF>(ARGS...))(COLS...);
35+
```
36+
:::
37+
38+
::::
39+
1440

15-
// initiate a cutflow
16-
auto weighted = df.weight(w);
41+
## Initiating a cutflow
42+
Call `dataflow::filter()` or `dataflow::weight()` to initiate a selection in the cutflow.
43+
```cpp
44+
auto all = df.filter(column::constant(true));
1745
```
1846

47+
1948
## Compounding selections
2049

21-
Subsequently-compounded selections from existing ones can be applied by chained `filter()`/`weight()` calls.
50+
Selections can be compounded onto existing ones regardless of their cut/weight specification:
51+
a cut simply passes through the weight decision of its previous selection (if one exists), and vice versa.
52+
2253
```cpp
23-
// cuts and weights can be compounded in any order.
24-
auto cut =
25-
weighted.filter(column::expression([](double w) { return (w >= 0;); }))(w);
54+
auto w = ds.read(dataset::column<double>("weight"));
55+
56+
auto sel = all.weight(w).filter(
57+
column::expression([](double w) { return (w >= 0); }))(w);
58+
// cut = (true) && (true) && (w>=0);
59+
// weight = (1.0) * (w) * (1.0);
2660
```
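
The pass-through behaviour described above can be modelled per entry without the library. This is a sketch under the assumption that a selection carries one cut decision and one weight decision; `decision`, `apply_cut`, `apply_weight` and `select_entry` are hypothetical names, not queryosity APIs.

```cpp
// Hypothetical per-entry decision of a selection (not a queryosity type).
struct decision {
  bool cut = true;      // pass/fail
  double weight = 1.0;  // statistical weight
};

// Compounding a cut: AND with the previous cut; the weight passes through.
constexpr decision apply_cut(decision prev, bool pass) {
  return {prev.cut && pass, prev.weight};
}

// Compounding a weight: multiply the previous weight; the cut passes through.
constexpr decision apply_weight(decision prev, double w) {
  return {prev.cut, prev.weight * w};
}

// Mirrors all.weight(w).filter(w >= 0) for a single entry.
constexpr decision select_entry(double w) {
  decision all{};  // plays the role of df.filter(column::constant(true))
  return apply_cut(apply_weight(all, w), w >= 0);
}
```

In this model an entry with `w = 2.5` ends with `cut == true` and `weight == 2.5`, while an entry with `w = -1.0` keeps its weight but fails the cut.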
2761
2862
## Branching selections
2963
64+
Applying multiple selections from a common node creates a branching in the cutflow.
65+
3066
```cpp
31-
// applying more than one selection from a node creates a branching point.
32-
auto cut_a = cut.filter(cat == a);
33-
auto cut_b = cut.filter(cat == b);
34-
auto cut_c = cut.filter(cat == c);
67+
auto cat = ds.read(dataset::column<std::string>("category"));
68+
auto a = df.define(column::constant<std::string>("a"));
69+
auto b = df.define(column::constant<std::string>("b"));
70+
auto c = df.define(column::constant<std::string>("c"));
71+
72+
auto sel_a = sel.filter(cat == a);
73+
auto sel_b = sel.filter(cat == b);
74+
auto sel_c = sel.filter(cat == c);
3575
```
3676

37-
## Merging selections
77+
## Joining selections
78+
79+
Any set of selections can be merged back together by `&&`/`||`/`*`-ing them.
3880

3981
```cpp
40-
// selections can be merged based on their decision values.
41-
auto cut_a_and_b = df.filter(cut_a && cut_b);
42-
auto cut_b_or_c = df.filter(cut_b || cut_c);
82+
// why weight(w)? see below
83+
auto sel_a_and_b = df.filter(sel_a && sel_b).weight(w);
84+
auto sel_b_or_c = df.weight(w).filter(sel_b || sel_c);
4385
```
86+
87+
:::{important}
88+
The mechanism for joining selections is simply that of [Basic operations](#computing-columns-operators) between columns.
89+
Therefore, a joined cut/weight constitutes the first selection in a new cutflow, while its complementary decision is discarded.
90+
These can (and should) be re-applied at any point in the new cutflow.
91+
:::
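
To illustrate why the weight has to be re-applied after a join, here is a small per-entry model. It assumes that `&&`/`||` act only on the cut decisions and that the joined selection starts a new cutflow with unit weight; `decision`, `join_and`, `join_or` and `apply_weight` are hypothetical names for this sketch only.

```cpp
// Hypothetical per-entry decision of a selection (not a queryosity type).
struct decision {
  bool cut = true;
  double weight = 1.0;
};

// Joining cuts combines the boolean decisions only; the complementary
// weight decisions of the inputs are dropped, so the result has weight 1.0.
constexpr decision join_and(decision a, decision b) {
  return {a.cut && b.cut, 1.0};
}

constexpr decision join_or(decision a, decision b) {
  return {a.cut || b.cut, 1.0};
}

// Re-applying the weight in the new cutflow.
constexpr decision apply_weight(decision d, double w) {
  return {d.cut, d.weight * w};
}
```

Under these assumptions, joining two selections that each carried a weight of `2.0` gives a weight of `1.0` until `apply_weight` is called again, which corresponds to the `.weight(w)` calls in the example above.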
92+
4493
## Yield at a selection
4594

4695
```cpp
4796
// single selection
48-
auto all = df.filter(column::constant(true));
49-
auto yield_tot = def.get(selection::yield(all));
50-
unsigned long long yield_tot_entries =
51-
yield_tot.entries; // number of entries passed
97+
auto yield_tot = df.get(selection::yield(all));
98+
unsigned long long yield_tot_entries = yield_tot.entries; // number of entries
5299
double yield_tot_value = yield_tot.value; // sum(weights)
53100
double yield_tot_error = yield_tot.error; // sqrt(sum(weights squared))
54101

55102
// multiple selections
56-
// (sel_a/b/c: (varied) lazy selections)
57103
auto [yield_a, yield_b, yield_c] =
58104
df.get(selection::yield(sel_a, sel_b, sel_c));
59105
```
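
The three yield fields follow the definitions given in the comments above: the number of passing entries, the sum of their weights, and the square root of the sum of squared weights. A standalone sketch, with `entry`, `yield_result` and `compute_yield` as hypothetical names rather than library types:

```cpp
#include <cmath>
#include <vector>

// Hypothetical per-entry selection decision (not a queryosity type).
struct entry {
  bool cut;
  double weight;
};

// Mirrors the fields read from the yield result in the example above.
struct yield_result {
  unsigned long long entries = 0;  // number of entries passing the cut
  double value = 0.0;              // sum(weights)
  double error = 0.0;              // sqrt(sum(weights squared))
};

yield_result compute_yield(const std::vector<entry>& selected) {
  yield_result y;
  double sum_w2 = 0.0;
  for (const auto& e : selected) {
    if (!e.cut) continue;  // failed entries contribute nothing
    ++y.entries;
    y.value += e.weight;
    sum_w2 += e.weight * e.weight;
  }
  y.error = std::sqrt(sum_w2);
  return y;
}
```

For two passing entries with weights `3.0` and `4.0`, this gives `entries == 2`, `value == 7.0` and `error == 5.0`.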

docs/index.md

Lines changed: 3 additions & 1 deletion
@@ -1,3 +1,5 @@
1+
# Welcome to Queryosity
2+
13
![Version](https://img.shields.io/badge/Version-0.4.1-blue.svg)
24
![C++ Standard](https://img.shields.io/badge/C++-17-blue.svg)
35
[![Ubuntu](https://github.com/taehyounpark/analogical/actions/workflows/ubuntu.yml/badge.svg?branch=master)](https://github.com/taehyounpark/analogical/actions/workflows/ubuntu.yml)
@@ -21,4 +23,4 @@ start/index
2123
guide/index
2224
examples/index
2325
references/index
24-
```
26+
```

docs/start/conceptual.md

Lines changed: 1 addition & 0 deletions
@@ -72,6 +72,7 @@ Selection
7272
- A series of two or more cuts becomes their intersection, `and`
7373
- A floating-point `weight` to assign a statistical significance to the entry.
7474
- A series of two or more weights becomes their product, `*`.
75+
- A cut is referred to as being *complementary* to a weight, and vice versa.
7576

7677
***
7778

include/queryosity/todo_varied.h

Lines changed: 0 additions & 7 deletions
@@ -74,13 +74,6 @@ class todo<Helper>::varied : public dataflow::node,
7474
-> std::array<typename lazy<query::booked_t<V>>::varied,
7575
sizeof...(Nodes)>;
7676

77-
/**
78-
* @brief Shortcut for `evaluate()`/`apply()`/`fill()` for
79-
* columns/selections/queries.
80-
* @tparam Cols (Varied) Input column types.
81-
* @param[in] cols... Input columns.
82-
* @return Lazy column definition
83-
*/
8477
template <typename... Cols>
8578
auto operator()(Cols &&...cols) ->
8679
typename decltype(std::declval<todo<Helper>>().operator()(
