@@ -22,8 +26,12 @@ Actions of each task graph can receive ones of the previous graphs as inputs:
22
26
23
27
## Lazy actions
24
28
25
-
All actions are *lazy*, meaning they are not executed unless required.
26
-
Accessing the result of a query turns it and all other actions *eager*, triggering the dataset traversal.
29
+
Lazy action
30
+
: An action that is not performed, i.e. initialized/executed/finalized, unless requested by the user.
31
+
32
+
***
33
+
34
+
Accessing the result of a lazy query turns it and all other actions *eager*, triggering the dataset traversal.
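The lazy-to-eager transition can be sketched as follows. This is a minimal illustration, not the library's API: `Dataflow`, `Query`, `book`, and `traverse` are hypothetical names, and a query is reduced to a running value updated once per entry.

```python
class Query:
    """Hypothetical lazy query: holds a running value filled once per entry."""

    def __init__(self, dataflow, update):
        self._dataflow = dataflow
        self._update = update  # (running value, entry) -> new running value
        self._value = 0
        self.done = False

    def fill(self, entry):
        self._value = self._update(self._value, entry)

    def result(self):
        # Accessing the result is the only thing that triggers a traversal.
        if not self.done:
            self._dataflow.traverse()
        return self._value


class Dataflow:
    """Hypothetical container that books queries without running them."""

    def __init__(self, entries):
        self.entries = entries
        self.queries = []
        self.traversals = 0

    def book(self, update):
        query = Query(self, update)
        self.queries.append(query)
        return query

    def traverse(self):
        # A single pass over the dataset fills every pending query.
        self.traversals += 1
        pending = [q for q in self.queries if not q.done]
        for entry in self.entries:
            for query in pending:
                query.fill(entry)
        for query in pending:
            query.done = True


df = Dataflow([1, 2, 3, 4])
total = df.book(lambda acc, entry: acc + entry)
count = df.book(lambda acc, entry: acc + 1)
traversals_before_access = df.traversals

total_result = total.result()  # first access: runs the traversal
count_result = count.result()  # already filled by the same traversal
```

In this sketch, booking alone leaves `df.traversals` at zero, and reading `count` after `total` does not start a second traversal.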
27
35
The eagerness of actions in each entry is as follows:
28
36
29
37
1. A query is performed only if its associated selection passes the cut.
@@ -32,65 +40,91 @@ The eagerness of actions in each entry is as follows:
32
40
33
41
## Columns
34
42
35
-
A `column` holds a value of some data type `T` to be updated for each entry.
36
-
Columns that are read-in from a dataset or user-defined constants are *independent*, i.e. their values do not depend on others, whereas columns evaluated out of existing ones as inputs are *dependent*.
37
-
The tower of dependent columns evaluated out of more independent ones forms the computation graph:
43
+
Column
44
+
: An action that holds a value of some data type `T` to be updated for each entry.
45
+
46
+
Independent column
47
+
: A column whose value does not depend on others.
48
+
49
+
Dependent column
50
+
: A column whose value is evaluated out of those from other columns as inputs.
51
+
52
+
***
53
+
54
+
The tower of dependent columns can be constructed to form the computation graph:
38
55
56
+
:::{card}
57
+
:text-align: center
39
58

59
+
+++
60
+
Example computation graph.
61
+
:::
40
62
41
63
Only the minimum number of computations needed are performed for each entry:
42
-
- If and when a column value is computed for an entry, it is cached and never re-computed.
43
-
- A column value is not copied when used as an input for dependent columns.
44
-
- It *is* copied if a conversion is required.
64
+
- A column value is computed *once* for an entry (if needed), then cached and never re-computed.
65
+
- A column value is not copied when used as an input for dependent columns (unless a conversion is needed).
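The once-per-entry evaluation can be illustrated with a small sketch. The `Column` class below is a hypothetical stand-in, not the library's implementation; it assumes each entry is identified by an index and caches the value computed for the current entry.

```python
class Column:
    """Hypothetical column: evaluated on demand, cached per entry."""

    def __init__(self, evaluate, *inputs):
        self._evaluate = evaluate
        self._inputs = inputs
        self._entry = None  # index of the entry the cached value belongs to
        self._value = None
        self.n_evaluations = 0

    def value(self, entry_index, row):
        # Re-evaluate only when moving on to a new entry.
        if self._entry != entry_index:
            args = [c.value(entry_index, row) for c in self._inputs]
            self._value = self._evaluate(row, *args)
            self._entry = entry_index
            self.n_evaluations += 1
        return self._value


# Independent columns read values from the dataset row.
x = Column(lambda row: row["x"])
y = Column(lambda row: row["y"])
# Dependent columns are evaluated out of other columns.
z = Column(lambda row, x, y: x + y, x, y)
w = Column(lambda row, z, x: z * x, z, x)
unused = Column(lambda row, z: z ** 2, z)  # never requested below

rows = [{"x": 1, "y": 2}, {"x": 3, "y": 4}]
results = [(z.value(i, row), w.value(i, row)) for i, row in enumerate(rows)]
```

Although `z` and `x` are each requested twice per entry (directly and through `w`), they are evaluated once per entry, and `unused` is never evaluated at all.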
45
66
46
67
## Selections
47
68
48
-
A `selection` represents a scalar-valued decision made on an entry:
49
-
50
-
- A boolean `cut` to determine if a query should be performed for a given entry.
69
+
Selection
70
+
: A scalar-valued column corresponding to a "decision" on an entry:
71
+
- A boolean `cut` to determine if a query should be performed for the entry.
51
72
  - A series of two or more cuts becomes their intersection, `and`.
52
-
- A floating-point `weight` to assign a statistical significance to the entry.
73
+
- A floating-point `weight` to assign a statistical significance to the entry.
53
74
  - A series of two or more weights becomes their product, `*`.
54
75
55
-
A cutflow can have the following types of connections between selections:
76
+
***
56
77
57
-

78
+
A cutflow can contain the following types of connections between selections:
58
79
59
80
- Applying a selection from an existing node, which determines the order in which they are compounded.
60
81
- Branching selections by applying more than one selection from a common node.
61
82
- Merging two selections, e.g. taking the union/intersection of two cuts.
62
83
63
-
Selections constitute a specific type of columns; as such, they are subject to the value-caching and evaluation behaviour of the computation graph.
64
-
Additionally, the cutflow imposes the following rules on them:
84
+
:::{card}
85
+
:text-align: center
86
+

87
+
+++
88
+
Example cutflow.
89
+
:::
90
+
91
+
Selections constitute a specific type of column, so they are subject to the lazy-evaluation and value-caching behaviour of the computation graph.
92
+
Additionally, the cutflow imposes the following rules:
65
93
- The cut at a selection is evaluated only if all previous cuts have passed.
66
94
- The weight at a selection is evaluated only if its cut has passed.
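The two rules above can be sketched with a chain of selections. The `Selection` class and the `traced` helper are hypothetical and only illustrate the evaluation order: cuts compound by `and`, weights compound by `*`, and nothing downstream of a failed cut is evaluated.

```python
class Selection:
    """Hypothetical selection node: an optional cut and/or weight applied
    on top of a previous selection."""

    def __init__(self, previous=None, cut=None, weight=None):
        self.previous = previous
        self.cut = cut
        self.weight = weight

    def evaluate(self, entry):
        """Return (passed, weight); weight is None when a cut has failed."""
        if self.previous is None:
            passed, weight = True, 1.0
        else:
            passed, weight = self.previous.evaluate(entry)
        if not passed:
            return False, None  # previous cut failed: skip this cut and weight
        if self.cut is not None and not self.cut(entry):
            return False, None  # this cut failed: skip this weight
        if self.weight is not None:
            weight *= self.weight(entry)
        return True, weight


calls = []


def traced(name, function):
    """Record each call so the evaluation order can be inspected."""
    def wrapper(entry):
        calls.append(name)
        return function(entry)
    return wrapper


positive = Selection(cut=traced("x > 0", lambda e: e["x"] > 0))
weighted = Selection(positive, weight=traced("w", lambda e: e["w"]))
tight = Selection(weighted,
                  cut=traced("x > 5", lambda e: e["x"] > 5),
                  weight=traced("sf", lambda e: 0.5))

failed = tight.evaluate({"x": -1, "w": 2.0})
calls_when_failed = list(calls)

calls.clear()
passed = tight.evaluate({"x": 7, "w": 2.0})
calls_when_passed = list(calls)
```

For the failing entry only the first cut is called; for the passing entry the cuts and weights are called in the order in which the selections were applied.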
67
95
68
96
## Queries
69
97
70
-
A `query` specifies an output result obtained from counting entries of the dataset.
71
-
For multithreaded runs, the user must also define how outputs from individual threads should be merged together to yield a result representative of the full dataset.
72
-
73
-
- It must be associated with a selection whose cut determines which entries to count.
98
+
Query
99
+
: An action that outputs a result of some data type `T` after traversing the dataset.
100
+
- It must be associated with a selection whose cut determines which entries to count.
74
101
- (Optional) The result is populated with the weight taken into account.
75
-
- How an entry populates the query depends on its implementation.
102
+
  - How the query counts an entry is an arbitrary, user-implemented action.
76
103
    - (Optional) The result is populated based on values of input columns.
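As an illustration of these properties, the sketch below defines a hypothetical query that sums weights per value of one input column. The class name `SumOfWeights`, its `fill` method, and the representation of a selection as a function returning `(passed, weight)` are all assumptions made for this example.

```python
from collections import defaultdict


class SumOfWeights:
    """Hypothetical query: sum of weights per value of one input column."""

    def __init__(self, selection, column):
        self.selection = selection  # entry -> (passed, weight)
        self.column = column        # entry -> input column value
        self.result = defaultdict(float)

    def fill(self, entry):
        passed, weight = self.selection(entry)
        if passed:  # only entries passing the cut are counted
            self.result[self.column(entry)] += weight


def high_pt(entry):
    """Selection with a cut on `pt` and the weight taken from `w`."""
    return entry["pt"] > 20.0, entry["w"]


entries = [
    {"n": 2, "pt": 30.0, "w": 1.5},
    {"n": 2, "pt": 10.0, "w": 1.0},  # fails the cut, so it is not counted
    {"n": 3, "pt": 50.0, "w": 0.5},
    {"n": 2, "pt": 40.0, "w": 0.5},
]

query = SumOfWeights(high_pt, lambda entry: entry["n"])
for entry in entries:
    query.fill(entry)

summed = dict(query.result)
```

Here the cut decides whether an entry is counted, the weight decides by how much, and the input column decides where in the result it lands.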
77
104
78
-
Two common workflows exist in associating queries with selections:
105
+
***
106
+
107
+
:::{card}
108
+
:text-align: center
109
+
```{image} ../images/query_1.png
```
110
+
+++
111
+
Making, filling, and booking a query.
112
+
:::
79
113
80
-
@image html query_1.png "Running a single query at multiple selections."
114
+
## Systematic variations
81
115
82
-
@image html query_2.png "Running multiple queries at a selection."
116
+
Systematic variation
117
+
: A change in a column value that affects the outcomes of associated selections and queries.
A sensitivity analysis means to study how changes in the system's inputs affect the output.
87
-
In the context of dataset queries, a **systematic variation** constitutes a __change in a column value that affects the outcome of selections and queries__.
121
+
A sensitivity analysis means to study how changes in the system's inputs affect its output.
122
+
In the context of a dataflow, the inputs are column values and outputs are query results.
88
123
89
-
Encapsulating the nominal and variations of a column creates a `varied` node in which each variation is mapped by the name of its associated systematic variation.
90
-
A varied node can be treated functionally identical to a non-varied one, with all nominal+variations being propagated through the relevant task graphs implicitly:
124
+
The nominal and variations of a column can be encapsulated within a *varied* node, which can be treated as functionally identical to a nominal-only one, except that all nominal+variations are propagated through downstream actions implicitly:
91
125
92
126
- Any column definitions and selections evaluated out of varied input columns will be varied.
93
-
- Any queries filled with varied input columns and/or at varied selections will be varied.
127
+
- Any queries performed with varied input columns and/or at varied selections will be varied.
94
128
95
129
The propagation proceeds in the following fashion:
96
130
@@ -99,6 +133,12 @@ The propagation proceeds in the following fashion:
99
133
100
134
All variations are processed at once in a single dataset traversal; in other words, they do not incur any additional runtime overhead other than what is needed to perform the actions themselves.
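A minimal sketch of this single-traversal behaviour for $z = x + y$ follows. The `Varied` class and `define` helper are hypothetical. The sketch assumes that a dependent column is varied under every variation name present in any of its inputs, and that an input without a given variation contributes its nominal value.

```python
class Varied:
    """Hypothetical varied node: a nominal column plus named variations."""

    def __init__(self, nominal, **variations):
        self.nominal = nominal
        self.variations = variations

    def get(self, name):
        # Assumption: fall back to nominal when this variation is absent.
        return self.variations.get(name, self.nominal)


def define(function, *inputs):
    """Build a dependent varied column out of varied inputs."""
    names = sorted(set().union(*(i.variations for i in inputs)))
    return Varied(
        lambda entry: function(*(i.nominal(entry) for i in inputs)),
        **{
            name: (lambda entry, name=name:
                   function(*(i.get(name)(entry) for i in inputs)))
            for name in names
        },
    )


x = Varied(lambda e: e["x"], x_up=lambda e: e["x"] + 1)
y = Varied(lambda e: e["y"], y_down=lambda e: e["y"] - 1)
z = define(lambda x, y: x + y, x, y)

entries = [{"x": 1, "y": 10}, {"x": 2, "y": 20}]
sums = {name: 0 for name in ["nominal", *z.variations]}
n_traversals = 0

# One pass over the dataset fills the nominal and every variation.
n_traversals += 1
for entry in entries:
    sums["nominal"] += z.nominal(entry)
    for name, column in z.variations.items():
        sums[name] += column(entry)
```

The nominal and both variations of the sum are obtained from the same loop over `entries`.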
101
135
102
-
@image html variation.png "Propagation of systematic variations."
136
+
:::{card}
137
+
:text-align: center
138
+
```{image} ../images/variation.png
139
+
```
140
+
+++
141
+
Propagation of systematic variations on $z = x+y$.