Commit 8ab2f7e

Docs & examples
1 parent 15fddcc commit 8ab2f7e

26 files changed, +774 -573 lines changed

README.md

Lines changed: 29 additions & 31 deletions
@@ -21,14 +21,13 @@
 
 ## Hello World
 ```cpp
-#include "queryosity/json.h"
-#include "queryosity/hist.h"
-
-#include "queryosity.h"
-
 #include <fstream>
-#include <vector>
 #include <sstream>
+#include <vector>
+
+#include "queryosity.h"
+#include "queryosity/hist.h"
+#include "queryosity/json.h"
 
 using dataflow = qty::dataflow;
 namespace multithread = qty::multithread;
@@ -41,31 +40,30 @@ using h1d = qty::hist::hist<double>;
 using linax = qty::hist::axis::regular;
 
 int main() {
-
-  dataflow df( multithread::enable(10) );
-
-  std::ifstream data("data.json");
-  auto [x, w] = df.read(
-    dataset::input<json>(data),
-    dataset::column<std::vector<double>>("x"),
-    dataset::column<double>("w")
-  );
-
-  auto zero = df.define( column::constant(0) );
-  auto x0 = x[zero];
-
-  auto sel = df.weight(w).filter(
-    column::expression([](std::vector<double> const& v){return v.size()}), x
-  );
-
-  auto h_x0_w = df.get(
-    query::output<h1d>( linax(100,0.0,1.0) )
-  ).fill(x0).book(sel).result();
-
-  std::ostringstream os;
-  os << *h_x0_w;
-  std::cout << os.str() << std::endl;
-
+  dataflow df(multithread::enable(10));
+
+  std::ifstream data("data.json");
+  auto [x, v, w] = df.read(
+      dataset::input<json>(data), dataset::column<double>("x"),
+      dataset::column<std::vector<double>>("v"), dataset::column<double>("w"));
+
+  auto zero = df.define(column::constant(0));
+  auto v0 = v[zero];
+
+  auto sel =
+      df.weight(w)
+          .filter(column::expression(
+              [](std::vector<double> const &v) { return v.size(); }))(v)
+          .filter(column::expression([](double x) { return x > 100.0; }))(x);
+
+  auto h_x0_w = df.get(query::output<h1d>(linax(20, 0.0, 200.0)))
+                    .fill(v0)
+                    .at(sel)
+                    .result();
+
+  std::ostringstream os;
+  os << *h_x0_w;
+  std::cout << os.str() << std::endl;
 }
 ```

docs/pages/conceptual.md

Lines changed: 3 additions & 3 deletions
@@ -72,7 +72,7 @@ For multithreaded runs, the user must also define how outputs from individual th
 
 - It must be associated with a selection whose cut determines which entries to count.
   - (Optional) The result is populated with the weight taken into account.
-- How an entry is to be counted to populate the query depends on the user definition, i.e. it is an arbitrary action.
+- How an entry populates the query depends on its implementation.
   - (Optional) The result is populated based on values of inputs columns.
 
 Two common workflows exist in associating queries with selections:
@@ -87,7 +87,7 @@ A sensitivity analysis means to study how changes in the system's inputs affect
 In the context of dataset queries, a **systematic variation** constitutes a __change in a column value that affects the outcome of selections and queries__.
 
 Encapsulating the nominal and variations of a column creates a `varied` node in which each variation is mapped by the name of its associated systematic variation.
-A varied node in a dataflow can be treated functionally identical to a non-varied one, with all nominal+variations being propagated through all relevant task graphs implicitly:
+A varied node can be treated functionally identical to a non-varied one, with all nominal+variations being propagated through the relevant task graphs implicitly:
 
 - Any column definitions and selections evaluated out of varied input columns will be varied.
 - Any queries performed filled with varied input columns and/or at varied selections will be varied.
@@ -97,7 +97,7 @@ The propagation proceeds in the following fashion:
 - **Lockstep.** If two actions each have a variation of the same name, they are in effect together.
 - **Transparent.** If only one action has a given variation, then the nominal is in effect for the other.
 
-All variations are processed at once in a single dataset traversal, i.e. they do not incur additional runtime overhead other than what is already required to perform the actions themselves.
+All variations are processed at once in a single dataset traversal; in other words, they do not incur any additional runtime overhead other than what is needed to perform the actions themselves.
 
 @image html variation.png "Propagation of systematic variations."
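
The lockstep and transparent rules described in this file can be illustrated with a small, library-independent sketch. The `Varied` struct and `combine` function below are hypothetical names invented for this illustration; they are not part of the queryosity API and only mimic how a nominal value and its named variations might be propagated through a binary operation.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for a varied node: one nominal value plus named
// variations. This is NOT a queryosity type, only an illustration.
template <typename T> struct Varied {
  T nominal;
  std::map<std::string, T> variations;

  // Transparent rule: if this node lacks the named variation,
  // its nominal value is in effect instead.
  T const &get(std::string const &name) const {
    auto it = variations.find(name);
    return it != variations.end() ? it->second : nominal;
  }
};

// Combine two varied nodes with a binary operation.
// Lockstep rule: a variation name present in both inputs uses both varied
// values together; the output carries the union of all variation names.
template <typename T, typename Op>
Varied<T> combine(Varied<T> const &a, Varied<T> const &b, Op op) {
  Varied<T> out{op(a.nominal, b.nominal), {}};
  std::set<std::string> names;
  for (auto const &kv : a.variations) names.insert(kv.first);
  for (auto const &kv : b.variations) names.insert(kv.first);
  for (auto const &name : names)
    out.variations.emplace(name, op(a.get(name), b.get(name)));
  return out;
}
```

For instance, adding a node with nominal `1.0` and an `"up"` variation of `2.0` to a node with nominal `10.0` and variations `"up"` = `20.0` and `"scale"` = `30.0` gives a nominal of `11.0`, an `"up"` value of `22.0` (lockstep), and a `"scale"` value of `31.0` (transparent, since the first node falls back to its nominal).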

docs/pages/design.md

Lines changed: 10 additions & 10 deletions
@@ -3,31 +3,31 @@
 
 `queryosity` has been purposefully designed for data analysis workflows in high-energy physics experiments prioritizing the following principles.
 
-@section design-clear-interface Clear interface above all else.
+@section design-clear-interface Clarity and consistency above all else.
 
-- Provide a faithful, one-to-one correspondence between the description of the analysis logic by the interface and its underlying graph(s) of tasks being performed.
-- The analysis code written by Alice must be readable and understandable to Bob, and vice versa.
+- The interface should be a faithful representation of the analysis task graph.
+- The analysis code written by Alice should be readable and understandable to Bob, and vice versa.
 
-@section design-arbitrary-data Arbirary data types.
+@section design-arbitrary-data Arbitrary data types.
 
-- Many "columns" are not POD: they are of non-trivial data types containing nested properties, links to data of other types, etc. The interface for handling these columns should be front-and-center.
+- Many "columns" are not trivial: they can contain nested properties, links to other data, etc.
 - If a dataset has rows, or "events", the library should be able to run over it.
 - Output results of any data structure as desired.
 
 @section design-cutflow Unified cutflow for cuts and weights.
 
-- There is exactly one difference between a decision to (1) accept an event (cut), or (2) assign a statistical significance to it (weight): one is a yes-or-no, and the other is a number.
-- Selections are defined individually, then connected through a "cutflow" that is as deep (compounded selections) or wide (branched selections) as needed.
-- Whenever a particular selection is in effect for an event, all queries are consistently populated with the same cut and weight.
+- There is only one difference between (1) accepting an event (cut), or (2) assigning a statistical significance to it (weight): one is a yes-or-no, and the other is a number.
+- Selections can be arbitrarily deep (compounded selections) or wide (branched selections).
+- Whenever a particular selection is in effect for an event, all queries are populated with the same entries and weights.
 
 @section design-performance Optimal(maximal) efficiency(usage) of computational resources.
 
 - Never perform an action for an event unless needed.
-- The dataset is partitioned, and the traversal over each sub-range is parallelized.
+- Partition the dataset and traverse over each sub-range in parallel.
 
 @section design-systematic-variations Built-in, generalized handling of systematic variations.
 
 - An experiment can be subject to @f$ O(100) @f$ sources of "systematic uncertainties".
-- Applying systematic variations that are (1) specified once and automatically propagated, and (2) processed all at once in one dataset traversal, are crucial in minimizing "time-to-insight".
+- Applying systematic variations that are (1) specified once and automatically propagated, and (2) processed all at once in one dataset traversal, is crucial for minimizing "time-to-insight".
 
 @see @ref conceptual
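
The "unified cutflow" principle in this file treats a cut as a yes-or-no and a weight as a number within the same selection chain. A minimal sketch of that idea, written in plain C++ rather than with the library, might look as follows. The names `Decision`, `apply_cut`, `apply_weight`, `Event`, and `weighted_count` are assumptions made up for this illustration and do not come from queryosity.

```cpp
#include <vector>

// Hypothetical per-event selection decision (not a queryosity type).
struct Decision {
  bool passed;   // cut: a yes-or-no
  double weight; // weight: a number
};

// Compounding a cut only updates the yes-or-no part.
inline Decision apply_cut(Decision prev, bool pass) {
  return {prev.passed && pass, prev.weight};
}

// Compounding a weight only updates the numerical part.
inline Decision apply_weight(Decision prev, double w) {
  return {prev.passed, prev.weight * w};
}

// Hypothetical event with one observable and one per-event weight.
struct Event {
  double x;
  double w;
};

// A simple "query": the sum of weights over events that pass the selection
// chain (weight by w, then cut on x > threshold).
inline double weighted_count(std::vector<Event> const &events,
                             double threshold) {
  double sum = 0.0;
  for (auto const &e : events) {
    Decision d{true, 1.0};
    d = apply_weight(d, e.w);
    d = apply_cut(d, e.x > threshold);
    if (d.passed)
      sum += d.weight; // only passing events populate the result
  }
  return sum;
}
```

Because every query reads the same `Decision` for a given selection, any two queries booked at that selection would see identical entries and weights, which is the consistency property the section describes.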
