@@ -43,24 +43,31 @@ Call queryosity::dataflow::load() with an input dataset and its constructor argu
 The loaded dataset can then read out columns, provided their data types and names.
 
 @cpp
-using json = qty::json;
+std::ifstream data_json("data.json");
 
-std::ifstream data("data.json");
-auto ds = df.load(dataset::input<json>(data));
+// load a dataset
+using json = qty::json;
+auto ds = df.load(dataset::input<json>(data_json));
 
+// read a column
 auto x = ds.read(dataset::column<double>("x"));
+
+// shortcut: read multiple columns from a dataset
+auto [w, cat] = ds.read(dataset::column<double>("weight"), dataset::column<std::string>("category"));
 @endcpp
 
 A dataflow can load multiple datasets, as long as all valid partitions reported by queryosity::dataset::source::partition() have the same number of total entries.
 A dataset can report an empty partition, which signals that it relinquishes the control to the other datasets.
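The compatibility rule above can be illustrated with a minimal, self-contained sketch. It does not use queryosity itself: the aliases `entry_range` and `partition_t` and the helpers `total_entries()`, `partitions_compatible()` and `demo_partition_check()` are hypothetical names introduced here, and representing a partition as a list of `[begin, end)` entry ranges is an assumption made for illustration only.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Assumption: a partition is a list of [begin, end) entry ranges.
using entry_range = std::pair<unsigned long long, unsigned long long>;
using partition_t = std::vector<entry_range>;

// Total number of entries covered by a partition (0 for an empty one).
inline unsigned long long total_entries(const partition_t& parts) {
  unsigned long long n = 0;
  for (const auto& [begin, end] : parts) {
    n += end - begin;
  }
  return n;
}

// True if every non-empty partition covers the same total number of entries.
// Empty partitions are skipped, mirroring a dataset that defers to the others.
inline bool partitions_compatible(const std::vector<partition_t>& all) {
  bool have_reference = false;
  unsigned long long reference = 0;
  for (const auto& parts : all) {
    if (parts.empty()) {
      continue;
    }
    const unsigned long long n = total_entries(parts);
    if (!have_reference) {
      reference = n;
      have_reference = true;
    } else if (n != reference) {
      return false;
    }
  }
  return true;
}

// Small self-check: two datasets with 100 entries each plus one empty partition.
inline void demo_partition_check() {
  const partition_t json_parts = {{0, 50}, {50, 100}};
  const partition_t csv_parts = {{0, 100}};
  const partition_t empty_parts;
  const partition_t short_parts = {{0, 99}};
  assert(partitions_compatible({json_parts, csv_parts, empty_parts}));
  assert(!partitions_compatible({json_parts, short_parts}));
}
```

In this sketch the two non-empty partitions are split differently but cover the same 100 entries, so they are treated as compatible, while a dataset covering only 99 entries is rejected.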
 
 @cpp
-using csv = qty::csv;
-
 std::ifstream data_csv("data.csv");
-auto y = df.load(dataset::input<csv>(data_csv)).read(dataset::column<double>("y"));
 
-auto z = x+y;
+// another shortcut: load dataset & read column(s) at once
+using csv = qty::csv;
+auto y = df.read(dataset::input<csv>(data_csv), dataset::column<double>("y"));
+
+// x from json, y from csv
+auto z = x + y; // (see next section)
 @endcpp
 
 @see
@@ -76,94 +83,116 @@ auto z = x+y;
 
 @section guide-column Computing quantities
 
-Call queryosity::dataflow::define() with the appropriate argument.
+New columns can be computed out of existing ones by calling queryosity::dataflow::define() with the appropriate argument, or operators between the underlying data types.
// (C++ function, functor, lambda, etc.) evaluated out of input columns
+A column can also be defined by a custom implementation, which offers:
 
-// pass large values by const reference to prevent expensive copies
-auto s_length = df.define(column::expression([](const std::string& txt){return txt.length();}), s);
+- Customization: user-defined constructor arguments and member variables/functions.
+- Optimization: each input column is provided as a column::observable<T>, which defers its computation for the entry until column::observable<T>::value() is invoked.
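The deferred evaluation described in the second bullet can be sketched in standalone C++. The class `lazy_observable<T>` and the helpers `select_or_zero()` and `demo_lazy_evaluation()` below are hypothetical stand-ins written for this illustration; they are not queryosity's `column::observable<T>`, and the caching and `reset()` behaviour is an assumption about how such a wrapper might work.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <utility>

// Hypothetical stand-in for a lazily evaluated column value.
template <typename T>
class lazy_observable {
public:
  explicit lazy_observable(std::function<T()> compute)
      : m_compute(std::move(compute)) {}

  // Compute on first access for the current entry, then reuse the cached result.
  const T& value() const {
    if (!m_cache) {
      m_cache = m_compute();
    }
    return *m_cache;
  }

  // Forget the cached value, e.g. when moving on to the next entry.
  void reset() { m_cache.reset(); }

private:
  std::function<T()> m_compute;
  mutable std::optional<T> m_cache;
};

// Example consumer: the expensive input is only evaluated if the flag passes.
inline double select_or_zero(bool pass, const lazy_observable<double>& expensive) {
  return pass ? expensive.value() : 0.0;
}

// Small self-check that counts how often the underlying computation runs.
inline void demo_lazy_evaluation() {
  int calls = 0;
  lazy_observable<double> obs([&calls] { ++calls; return 2.5; });
  assert(select_or_zero(false, obs) == 0.0);
  assert(calls == 0);  // never touched, so never computed
  assert(select_or_zero(true, obs) == 2.5);
  assert(obs.value() == 2.5);
  assert(calls == 1);  // computed once, then served from the cache
}
```

The point of the design is that a custom column which only needs an input on some entries pays for that input only on those entries.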
Call queryosity::dataflow::filter() or queryosity::dataflow::weight() to initiate a selection in the cutflow, and apply subsequent selections from existing nodes to compound them.
Call queryosity::dataflow::vary() to create varied columns.
 There are two ways in which variations can be specified:
 
-1.**Automatic.** Specify a specific type of column to be instantiated along with the nominal+variations. Always ensures the lockstep+transparent propagation of variations.
-2.**Manual.** Provide existing instances of columns to be nominal+variations; any column whose output value type is compatible to that of the nominal can be set as a variation.
+1.**Pre-instantiation.** Provide the nominal argument and a mapping of variation name to alternate arguments.
+2.**Post-instantiation.** Provide existing instances of columns to be the nominal and its variations.
+- Any column whose data type is compatible with that of the nominal can be set as a variation.
 
 Both approaches can (and should) be used in a dataflow for full control over the creation & propagation of systematic variations.
 
 @cpp
-// automatic -- set and forget
+// pre-instantiation
 
 // dataset columns are varied by different column names