`queryosity` has been purposefully designed for data analysis workflows in high-energy physics experiments, prioritizing the following principles.

@section design-clear-interface Clarity and consistency above all else.

- The interface should be a faithful representation of the analysis task graph (sketched below).
- The analysis code written by Alice should be readable and understandable to Bob, and vice versa.
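
Purely as an illustration of this principle (plain C++ with hypothetical names, not queryosity's actual interface), the sketch below declares one graph node per statement, so the code and the underlying task graph read the same way.

@code{.cpp}
#include <initializer_list>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// hypothetical "node": one declared operation and its input edges
struct node {
  std::string name;
  std::vector<const node*> inputs;
};

node read(std::string name) { return {std::move(name), {}}; }
node define(std::string name, std::initializer_list<const node*> deps) {
  return {std::move(name), std::vector<const node*>(deps)};
}

int main() {
  // one statement of analysis logic <-> one node in the task graph
  auto pt = read("jet_pt");
  auto eta = read("jet_eta");
  auto cut = define("cut: |jet_eta| < 2.5", {&eta});
  auto hist = define("histogram of jet_pt at cut", {&pt, &cut});

  // the graph can be printed back out exactly as it was declared
  for (const node* n : {&pt, &eta, &cut, &hist}) {
    std::cout << n->name << " (" << n->inputs.size() << " input(s))\n";
  }
}
@endcode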

@section design-arbitrary-data Arbitrary data types.

- Many "columns" are not trivial: they can contain nested properties, links to other data, etc. (sketched below).
- If a dataset has rows, or "events", the library should be able to run over it.
- Output results of any data structure as desired.
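
As a standalone sketch (the `Track`/`Jet`/`Event` types below are hypothetical, not queryosity's or any experiment's data model), a non-trivial column can be an object with nested properties and index links to other objects, from which derived per-event quantities are computed.

@code{.cpp}
#include <cstddef>
#include <iostream>
#include <vector>

struct Track {
  double pt, eta, phi;
};

struct Jet {
  double pt, eta, phi;
  std::vector<std::size_t> track_indices;  // links into the event's track list
};

struct Event {
  std::vector<Track> tracks;
  std::vector<Jet> jets;
};

// a derived column: number of jets containing at least one "hard" track
int n_jets_with_hard_track(const Event& event, double min_track_pt) {
  int count = 0;
  for (const Jet& jet : event.jets) {
    for (std::size_t i : jet.track_indices) {
      if (event.tracks[i].pt > min_track_pt) {
        ++count;
        break;
      }
    }
  }
  return count;
}

int main() {
  Event event{{{30.0, 0.1, 0.2}, {5.0, -1.0, 2.0}},  // two tracks
              {{55.0, 0.1, 0.2, {0, 1}}}};           // one jet linking both tracks
  std::cout << n_jets_with_hard_track(event, 20.0) << "\n";  // prints 1
}
@endcode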
|
16 | 16 |
|
17 | 17 | @section design-cutflow Unified cutflow for cuts and weights.

- There is only one difference between (1) accepting an event (cut) and (2) assigning a statistical significance to it (weight): one is a yes-or-no, and the other is a number (sketched below).
- Selections can be arbitrarily deep (compounded selections) or wide (branched selections).
- Whenever a particular selection is in effect for an event, all queries are populated with the same entries and weights.
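
A minimal sketch of the idea in plain C++ (this is not queryosity's interface): a selection decision carries a cut as a bool and a weight as a double; compounding a selection ANDs the cuts and multiplies the weights, while branching lets two deeper selections share the same parent decision.

@code{.cpp}
#include <iostream>

struct Event {
  double lepton_pt;
  int n_jets;
  double mc_weight;
};

struct Decision {
  bool passed;    // cut: a yes-or-no
  double weight;  // weight: a number
};

// compound a parent selection with one more cut and/or weight
Decision refine(const Decision& parent, bool cut, double weight = 1.0) {
  return {parent.passed && cut, parent.weight * weight};
}

int main() {
  Event event{35.0, 3, 0.9};

  Decision all{true, 1.0};                                  // root of the cutflow
  Decision weighted = refine(all, true, event.mc_weight);   // apply a weight only
  Decision lep = refine(weighted, event.lepton_pt > 30.0);  // deeper: compounded cut
  Decision jets2 = refine(lep, event.n_jets >= 2);          // two branches off "lep"
  Decision jets4 = refine(lep, event.n_jets >= 4);

  // every query at a given selection sees the same entry decision and weight
  if (jets2.passed) std::cout << "fill 2-jet query with weight " << jets2.weight << "\n";
  if (jets4.passed) std::cout << "fill 4-jet query with weight " << jets4.weight << "\n";
}
@endcode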

@section design-performance Optimal efficiency and maximal usage of computational resources.

- Never perform an action for an event unless needed.
- Partition the dataset and traverse over each sub-range in parallel (sketched below).
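
A minimal sketch of the traversal strategy using only the standard library (not queryosity's actual scheduler): the entry range is split into equal sub-ranges, each sub-range is traversed by its own thread, and the per-thread results are merged at the end.

@code{.cpp}
#include <cstddef>
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

int main() {
  const std::size_t n_entries = 1000000;
  const std::size_t n_threads = 4;

  std::vector<double> partial(n_threads, 0.0);
  std::vector<std::thread> workers;

  for (std::size_t t = 0; t < n_threads; ++t) {
    const std::size_t begin = t * n_entries / n_threads;
    const std::size_t end = (t + 1) * n_entries / n_threads;
    // each worker traverses only its own sub-range of entries
    workers.emplace_back([&partial, t, begin, end] {
      for (std::size_t i = begin; i < end; ++i) {
        partial[t] += static_cast<double>(i % 10);  // stand-in for per-entry work
      }
    });
  }
  for (std::thread& worker : workers) worker.join();

  // merge the per-thread results into the final one
  const double total = std::accumulate(partial.begin(), partial.end(), 0.0);
  std::cout << "sum = " << total << "\n";
}
@endcode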

@section design-systematic-variations Built-in, generalized handling of systematic variations.

- An experiment can be subject to @f$ O(100) @f$ sources of "systematic uncertainties".
- Applying systematic variations that are (1) specified once and automatically propagated, and (2) processed all at once in one dataset traversal is crucial for minimizing "time-to-insight" (sketched below).
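
A minimal sketch of the idea in plain C++ (hypothetical variation names, not queryosity's interface): each variation of a column is specified once as a transformation, and the nominal and all variations are evaluated within the same pass over the dataset.

@code{.cpp}
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <vector>

int main() {
  // variations of a jet energy scale, specified once
  const std::map<std::string, std::function<double(double)>> variations{
      {"nominal", [](double pt) { return pt; }},
      {"jes_up", [](double pt) { return pt * 1.05; }},
      {"jes_down", [](double pt) { return pt * 0.95; }},
  };

  const std::vector<double> jet_pts{28.0, 31.0, 45.0};  // toy "dataset"
  std::map<std::string, int> counts;  // entries passing pt > 30 per variation

  // single traversal: every variation is propagated to the cut and counted
  for (double pt : jet_pts) {
    for (const auto& [name, vary] : variations) {
      if (vary(pt) > 30.0) ++counts[name];
    }
  }

  for (const auto& [name, n] : counts) {
    std::cout << name << ": " << n << " entries pass\n";
  }
}
@endcode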

@see @ref conceptual