diff --git a/README.md b/README.md index 8d506a5f..689a5559 100644 --- a/README.md +++ b/README.md @@ -93,13 +93,9 @@ Arquero uses modern JavaScript features, and so will not work with some outdated ### In Node.js or Application Bundles -First install `arquero` as a dependency, for example via `npm install arquero --save`. Arquero assumes Node version 12 or higher. - -Import using CommonJS module syntax: - -```js -const aq = require('arquero'); -``` +First install `arquero` as a dependency, for example via `npm install arquero --save`. +Arquero assumes Node version 18 or higher. +As of Arquero version 6, the library uses type `module` and should be loaded using ES module syntax. Import using ES module syntax, import all exports into a single object: @@ -113,6 +109,12 @@ Import using ES module syntax, with targeted imports: import { op, table } from 'arquero'; ``` +Dynamic import (e.g., within a Node.js REPL): + +```js +aq = await import('arquero'); +``` + ## Build Instructions To build and develop Arquero locally: diff --git a/docs/api/expressions.md b/docs/api/expressions.md index ea0d0d5a..cd819764 100644 --- a/docs/api/expressions.md +++ b/docs/api/expressions.md @@ -142,14 +142,10 @@ So why do we do this? Here are a few reasons: * **Performance**. After parsing an expression, Arquero performs code generation, often creating more performant code in the process. This level of indirection also allows us to generate optimized expressions for certain inputs, such as Apache Arrow data. -* **Flexibility**. Providing our own parsing also allows us to introduce new kinds of backing data without changing the API. For example, we could add support for different underlying data formats and storage layouts. - -* **Portability**. 
While a common use case of Arquero is to query data directly in the same JavaScript runtime, Arquero verbs can also be [*serialized as queries*](./#queries): one can specify verbs in one environment, but then send them to another environment for processing. For example, the [arquero-worker](https://github.com/uwdata/arquero-worker) package sends queries to a worker thread, while the [arquero-sql](https://github.com/chanwutk/arquero-sql) package sends them to a backing database server. As custom methods may not be defined in those environments, Arquero is designed to make this translation between environments possible and easier to reason about. - -* **Safety**. Arquero table expressions do not let you call methods defined on input data values. For example, to trim a string you must call `op.trim(str)`, not `str.trim()`. Again, this aids portability: otherwise unsupported methods defined on input data elements might "sneak" in to the processing. Invoking arbitrary methods may also lead to security vulnerabilities when allowing untrusted third parties to submit queries into a system. +* **Flexibility**. Providing our own parsing also allows us to introduce new kinds of backing data without changing the API. For example, we could add support for different underlying data formats and storage layouts. More importantly, it also allows us to analyze expressions and incorporate aggregate and window functions in otherwise "normal" JavaScript expressions. * **Discoverability**. Defining all functions on a single object provides a single catalog of all available operations. In most IDEs, you can simply type `op.` (and perhaps hit the tab key) to the see a list of all available functions and benefit from auto-complete! -Of course, one might wish to make different trade-offs. Arquero is designed to support common use cases while also being applicable to more complex production setups. This goal comes with the cost of more rigid management of functions. 
However, Arquero can be extended with custom variables, functions, and even new table methods or verbs! As starting points, see the [params](table#params), [addFunction](extensibility#addFunction), and [addTableMethod](extensibility#addTableMethod) functions to introduce external variables, register new `op` functions, or extend tables with new methods. +Of course, one might wish to make different trade-offs. Arquero is designed to support common use cases while also being applicable to more complex production setups. This goal comes with the cost of more rigid management of functions. However, Arquero can be extended with custom variables, functions, and even new table methods or verbs! As starting points, see the [params](table#params) and [addFunction](extensibility#addFunction) methods to introduce external variables or register new `op` functions. -All that being said, not all use cases require portability, safety, etc. For such cases Arquero provides an escape hatch: use the [`escape()` expression helper](./#escape) to apply a standard JavaScript function *as-is*, skipping any internal parsing and code generation. \ No newline at end of file +All that being said, Arquero provides an escape hatch: use the [`escape()` expression helper](./#escape) to apply a standard JavaScript function *as-is*, skipping any internal parsing and code generation. As a result, escaped functions do *not* support aggregation and window operations, as these depend on Arquero's internal parsing and code generation. diff --git a/docs/api/extensibility.md b/docs/api/extensibility.md index 30ab6a3a..0308c139 100644 --- a/docs/api/extensibility.md +++ b/docs/api/extensibility.md @@ -14,6 +14,7 @@ title: Extensibility \| Arquero API Reference * [addVerb](#addVerb) * [Package Bundles](#packages) * [addPackage](#addPackage) +* [Table Methods](#table-methods)
@@ -123,158 +124,21 @@ aq.table({ x: [4, 3, 2, 1] }) ## Table Methods -Add new table-level methods or verbs. The [addTableMethod](#addTableMethod) function registers a new function as an instance method of tables only. The [addVerb](#addVerb) method registers a new transformation verb with both tables and serializable [queries](./#query). - -
# -aq.addTableMethod(name, method[, options]) · [Source](https://github.com/uwdata/arquero/blob/master/src/register.js) - -Register a custom table method, adding a new method with the given *name* to all table instances. The provided *method* must take a table as its first argument, followed by any additional arguments. - -This method throws an error if the *name* argument is not a legal string value. -To protect Arquero internals, the *name* can not start with an underscore (`_`) character. If a custom method with the same name is already registered, the override option must be specified to overwrite it. In no case may a built-in method be overridden. - -* *name*: The name to use for the table method. -* *method*: A function implementing the table method. This function should accept a table as its first argument, followed by any additional arguments. -* *options*: Function registration options. - * *override*: Boolean flag (default `false`) indicating if the added method is allowed to override an existing method with the same name. Built-in table methods can **not** be overridden; this flag applies only to methods previously added using the extensibility API. - -*Examples* - -```js -// add a table method named size, returning an array of row and column counts -aq.addTableMethod('size', table => [table.numRows(), table.numCols()]); -aq.table({ a: [1,2,3], b: [4,5,6] }).size() // [3, 2] -``` - -
# -aq.addVerb(name, method, params[, options]) · [Source](https://github.com/uwdata/arquero/blob/master/src/register.js) - -Register a custom transformation verb with the given *name*, adding both a table method and serializable [query](./#query) support. The provided *method* must take a table as its first argument, followed by any additional arguments. The required *params* argument describes the parameters the verb accepts. If you wish to add a verb to tables but do not require query serialization support, use [addTableMethod](#addTableMethod). - -This method throws an error if the *name* argument is not a legal string value. -To protect Arquero internals, the *name* can not start with an underscore (`_`) character. If a custom method with the same name is already registered, the override option must be specified to overwrite it. In no case may a built-in method be overridden. - -* *name*: The name to use for the table method. -* *method*: A function implementing the table method. This function should accept a table as its first argument, followed by any additional arguments. -* *params*: An array of schema descriptions for the verb parameters. These descriptors are needed to support query serialization. Each descriptor is an object with *name* (string-valued parameter name) and *type* properties (string-valued parameter type, see below). If a parameter has type `"Options"`, the descriptor can include an additional object-valued *props* property to describe any non-literal values, for which the keys are property names and the values are parameter types. -* *options*: Function registration options. - * *override*: Boolean flag (default `false`) indicating if the added method is allowed to override an existing method with the same name. Built-in verbs can **not** be overridden; this flag applies only to methods previously added using the extensibility API. - -*Parameter Types*. 
The supported parameter types are: - -* `"Expr"`: A single table expression, such as the input to [`filter()`](verbs/#filter). -* `"ExprList"`: A list of column references or expressions, such as the input to [`groupby()`](verbs/#groupby). -* `"ExprNumber"`: A number literal or numeric table expression, such as the *weight* option of [`sample()`](verbs/#sample). -* `"ExprObject"`: An object containing a set of expressions, such as the input to [`rollup()`](verbs/#rollup). -* `"JoinKeys"`: Input join keys, as in [`join()`](verbs/#join). -* `"JoinValues"`: Output join values, as in [`join()`](verbs/#join). -* `"Options"`: An options object of key-value pairs. If any of the option values are column references or table expressions, the descriptor should include a *props* property with property names as keys and parameter types as values. -* `"OrderKeys"`: A list of ordering criteria, as in [`orderby`](verbs/#orderby). -* `"SelectionList"`: A set of columns to select and potentially rename, as in [`select`](verbs/#select). -* `"TableRef"`: A reference to an additional input table, as in [`join()`](verbs/#join). -* `"TableRefList"`: A list of one or more additional input tables, as in [`concat()`](verbs/#concat). - -*Examples* - -```js -// add a bootstrapped confidence interval verb that -// accepts an aggregate expression plus options -aq.addVerb( - 'bootstrap_ci', - (table, expr, options = {}) => table - .params({ frac: options.frac || 1000 }) - .sample((d, $) => op.round($.frac * op.count()), { replace: true }) - .derive({ id: (d, $) => op.row_number() % $.frac }) - .groupby('id') - .rollup({ bs: expr }) - .rollup({ - lo: op.quantile('bs', options.lo || 0.025), - hi: op.quantile('bs', options.hi || 0.975) - }), - [ - { name: 'expr', type: 'Expr' }, - { name: 'options', type: 'Options' } - ] -); - -// apply the new verb -aq.table({ x: [1, 2, 3, 4, 6, 8, 9, 10] }) - .bootstrap_ci(op.mean('x')) -``` - -
- -## Package Bundles - -Extend Arquero with a bundle of functions, table methods, and/or verbs. - -
# -aq.addPackage(bundle[, options]) · [Source](https://github.com/uwdata/arquero/blob/master/src/register.js) - -Register a *bundle* of extensions, which may include standard functions, aggregate functions, window functions, table methods, and verbs. If the input *bundle* has a key named `"arquero_package"`, the value of that property is used; otherwise the *bundle* object is used directly. This method is particularly useful for publishing separate packages of Arquero extensions and then installing them with a single method call. - -A package bundle has the following structure: - -```js -const bundle = { - functions: { ... }, - aggregateFunctions: { ... }, - windowFunctions: { ... }, - tableMethods: { ... }, - verbs: { ... } -}; -``` - -All keys are optional. For example, `functions` or `verbs` may be omitted. Each sub-bundle is an object of key-value pairs, where the key is the name of the function and the value is the function to add. - -The lone exception is the `verbs` bundle, which instead uses an object format with *method* and *params* keys, corresponding to the *method* and *params* arguments of [addVerb](#addVerb): - -```js -const bundle = { - verbs: { - name: { - method: (table, expr) => { ... }, - params: [ { name: 'expr': type: 'Expr' } ] - } - } -}; -``` - -The package method performs validation prior to adding any package content. The method will throw an error if any of the package items fail validation. See the [addFunction](#addFunction), [addAggregateFunction](#addAggregateFunction), [addWindowFunction](#windowFunction), [addTableMethod](#addTableMethod), and [addVerb](#addVerb) methods for specific validation criteria. The *options* argument can be used to specify if method overriding is permitted, as supported by each of the aforementioned methods. - -* *bundle*: The package bundle of extensions. -* *options*: Function registration options. 
- * *override*: Boolean flag (default `false`) indicating if the added method is allowed to override an existing method with the same name. Built-in table methods or verbs can **not** be overridden; for table methods and verbs this flag applies only to methods previously added using the extensibility API. +To add new table-level methods, including transformation verbs, simply assign new methods to the `ColumnTable` class prototype. *Examples* ```js -// add a package -aq.addPackage({ - functions: { - square: x => x * x, - }, - tableMethods: { - size: table => [table.numRows(), table.numCols()] - } -}); -``` - -```js -// add a package, ignores any content outside of "arquero_package" -aq.addPackage({ - arquero_package: { - functions: { - square: x => x * x, - }, - tableMethods: { - size: table => [table.numRows(), table.numCols()] +import { ColumnTable, op } from 'arquero'; + +// add a sum verb, which returns a new table containing summed +// values (potentially grouped) for a given column name +Object.assign( + ColumnTable.prototype, + { + sum(column, { as = 'sum' } = {}) { + return this.rollup({ [as]: op.sum(column) }); } } -}); +); ``` - -```js -// add a package from a separate library -aq.addPackage(require('arquero-arrow')); -``` \ No newline at end of file diff --git a/docs/api/index.md b/docs/api/index.md index 2fb03053..6d128398 100644 --- a/docs/api/index.md +++ b/docs/api/index.md @@ -10,7 +10,7 @@ title: Arquero API Reference * [Table Input](#input) * [load](#load), [loadArrow](#loadArrow), [loadCSV](#loadCSV), [loadFixed](#loadFixed), [loadJSON](#loadJSON) * [Table Output](#output) - * [toArrow](#toArrow) + * [toArrow](#toArrow), [toArrowIPC](#toArrowIPC) * [Expression Helpers](#expression-helpers) * [op](#op), [agg](#agg), [escape](#escape) * [bin](#bin), [desc](#desc), [frac](#frac), [rolling](#rolling), [seed](#seed) @@ -18,8 +18,6 @@ title: Arquero API Reference * [all](#all), [not](#not), [range](#range) * [matches](#matches), 
[startswith](#startswith), [endswith](#endswith) * [names](#names) -* [Queries](#queries) - * [query](#query), [queryFrom](#queryFrom)
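Review note on the `docs/api/extensibility.md` hunk above: with `addTableMethod` and `addVerb` removed, custom verbs are attached by assigning to the `ColumnTable` prototype. The sketch below illustrates only that prototype-extension pattern. It uses a hypothetical stand-in class (`MiniTable`, with a made-up `column()` accessor) instead of Arquero itself, so it runs without the library installed and should not be read as Arquero's implementation.

```js
// Hypothetical stand-in for a table class; this is NOT Arquero's ColumnTable.
class MiniTable {
  constructor(columns) {
    this.columns = columns; // object mapping column name -> array of values
  }
  column(name) {
    return this.columns[name];
  }
}

// Attach a new method to every instance by extending the prototype,
// mirroring the Object.assign(ColumnTable.prototype, { ... }) pattern.
Object.assign(MiniTable.prototype, {
  sum(name, { as = 'sum' } = {}) {
    const total = this.column(name).reduce((acc, v) => acc + v, 0);
    // return a new one-row table; the receiver is left unchanged
    return new MiniTable({ [as]: [total] });
  }
});

const totals = new MiniTable({ x: [1, 2, 3, 4] }).sum('x', { as: 'total' });
console.log(totals.column('total')); // logs [ 10 ]
```

One trade-off worth flagging in review: `Object.assign` silently overwrites any existing property of the same name, whereas the removed `addTableMethod` documentation stated that built-in methods could never be overridden. Callers who extend the prototype directly may want to check for name collisions themselves.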
@@ -102,6 +100,11 @@ This method performs parsing only. To both load and parse an Arrow file, use [lo * *arrowTable*: An [Apache Arrow](https://arrow.apache.org/docs/js/) data table or a byte array (e.g., [ArrayBuffer](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer) or [Uint8Array](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array)) in the Arrow IPC format. * *options*: An Arrow import options object: * *columns*: An ordered set of columns to import. The input may consist of: column name strings, column integer indices, objects with current column names as keys and new column names as values (for renaming), or a selection helper function such as [all](#all), [not](#not), or [range](#range)). + * *convertDate*: Boolean flag (default `true`) to convert Arrow date values to JavaScript Date objects. If false, defaults to what the Arrow implementation provides, typically timestamps as number values. + * *convertDecimal*: Boolean flag (default `true`) to convert Arrow fixed point decimal values to JavaScript numbers. If false, defaults to what the Arrow implementation provides, typically byte arrays. The conversion will be lossy if the decimal can not be exactly represented as a double-precision floating point number. + * *convertTimestamp*: Boolean flag (default `true`) to convert Arrow timestamp values to JavaScript Date objects. If false, defaults to what the Arrow implementation provides, typically timestamps as number values. + * *convertBigInt*: Boolean flag (default `false`) to convert Arrow integers with bit widths of 64 bits or higher to JavaScript numbers. If false, defaults to what the Arrow implementation provides, typically `BigInt` values. The conversion will be lossy if the integer is so large it can not be exactly represented as a double-precision floating point number. + * *memoize*: Boolean hint (default `true`) to enable memoization of expensive conversions. 
If true, memoization is applied for string and nested (list, struct) types, caching extracted values to enable faster access. Memoization is also applied to converted Date values, in part to ensure exact object equality. This hint is ignored for dictionary columns, whose values are always memoized. *Examples* @@ -405,10 +408,10 @@ const dt = await aq.loadJSON('data/table.json', { autoType: false }) ## Table Output -Methods for writing table data to an output format. Most output methods are defined as [table methods](table#output), not in the top level namespace. +Methods for writing data to an output format. Most output methods are available as [table methods](table#output), in addition to the top level namespace.
# -aq.toArrow(data[, options]) · [Source](https://github.com/uwdata/arquero/blob/master/src/arrow/encode/index.js) +aq.toArrow(data[, options]) · [Source](https://github.com/uwdata/arquero/blob/master/src/arrow/to-arrow.js) Create an [Apache Arrow](https://arrow.apache.org/docs/js/) table for the input *data*. The input data can be either an [Arquero table](#table) or an array of standard JavaScript objects. This method will throw an error if type inference fails or if the generated columns have differing lengths. For Arquero tables, this method can instead be invoked as [table.toArrow()](table#toArrow). @@ -477,6 +480,34 @@ const at = toArrow([ ]); ``` +
# +aq.toArrowIPC(data[, options]) · [Source](https://github.com/uwdata/arquero/blob/master/src/arrow/to-arrow-ipc.js) + +Format input data in the binary [Apache Arrow](https://arrow.apache.org/docs/js/) IPC format. The input data can be either an [Arquero table](#table) or an array of standard JavaScript objects. This method will throw an error if type inference fails or if the generated columns have differing lengths. For Arquero tables, this method can instead be invoked as [table.toArrowIPC()](table#toArrowIPC). + +The resulting binary data may be saved to disk or passed between processes or tools. For example, when using [Web Workers](https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Using_web_workers), the output of this method can be passed directly between threads (no data copy) as a [Transferable](https://developer.mozilla.org/en-US/docs/Web/API/Transferable) object. Additionally, Arrow binary data can be loaded in other language environments such as [Python](https://arrow.apache.org/docs/python/) or [R](https://arrow.apache.org/docs/r/). + +This method will throw an error if type inference fails or if the generated columns have differing lengths. + +* *options*: Options for Arrow encoding, same as [toArrow](#toArrow) but with an additional *format* option. + * *format*: The Arrow IPC byte format to use. One of `'stream'` (default) or `'file'`. + +*Examples* + +Encode Arrow data from an input Arquero table: + +```js +import { table, toArrowIPC } from 'arquero'; + +const dt = table({ + x: [1, 2, 3, 4, 5], + y: [3.4, 1.6, 5.4, 7.1, 2.9] +}); + +// encode table as a transferable Arrow byte buffer +// here, infers Uint8 for 'x' and Float64 for 'y' +const bytes = toArrowIPC(dt); +```
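Review note on the `convertBigInt` option documented earlier in this file's diff: the "lossy" case it warns about is easy to reproduce in plain JavaScript. The sketch below uses only language built-ins. `toNumberChecked` is a hypothetical helper written for this note, not part of Arquero or Apache Arrow.

```js
// Hypothetical helper (not an Arquero or Apache Arrow API): convert a BigInt
// to a number and report whether the round trip preserved the exact value.
function toNumberChecked(big) {
  const num = Number(big);
  return { num, exact: BigInt(num) === big };
}

const safe = toNumberChecked(9007199254740991n);   // 2^53 - 1, the largest safe integer
const unsafe = toNumberChecked(9007199254740993n); // 2^53 + 1, not representable as a double

console.log(safe.exact);   // true
console.log(unsafe.exact); // false
console.log(unsafe.num);   // 9007199254740992
```

This is presumably why the option defaults to `false`: 64-bit identifiers or counters above `Number.MAX_SAFE_INTEGER` would otherwise change value silently on import.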
@@ -776,58 +807,3 @@ table.rename(aq.names(['a', 'b', 'c'])) // select and rename the first three columns, all other columns are dropped table.select(aq.names(['a', 'b', 'c'])) ``` - - -
- - -## Queries - -Queries allow deferred processing. Rather than process a sequence of verbs immediately, they can be stored as a query. The query can then be *serialized* to be stored or transferred, or later *evaluated* against an Arquero table. - -
# -aq.query([tableName]) · [Source](https://github.com/uwdata/arquero/blob/master/src/query/query.js) - -Create a new query builder instance. The optional *tableName* string argument indicates the default name of a table the query should process, and is used only when evaluating a query against a catalog of tables. The resulting query builder includes the same [verb](verbs) methods as a normal Arquero table. However, rather than evaluating verbs immediately, they are stored as a list of verbs to be evaluated later. - -The method *query.evaluate(table, catalog)* will evaluate the query against an Arquero table. If provided, the optional *catalog* argument should be a function that takes a table name string as input and returns a corresponding Arquero table instance. The catalog will be used to lookup tables referenced by name for multi-table operations such as joins, or to lookup the primary table to process when the *table* argument to evaluate is `null` or `undefined`. - -Use the query *toObject()* method to serialize a query to a JSON-compatible object. Use the top-level [queryFrom](#queryFrom) method to parse a serialized query and return a new "live" query instance. - -*Examples* - -```js -// create a query, then evaluate it on an input table -const q = aq.query() - .derive({ add1: d => d.value + 1 }) - .filter(d => d.add1 > 5 ); - -const t = q.evaluate(table); -``` - -```js -// serialize a query to a JSON-compatible object -// the query can be reconstructed using aq.queryFrom -aq.query() - .derive({ add1: d => d.value + 1 }) - .filter(d => d.add1 > 5 ) - .toObject(); -``` - - -
# -aq.queryFrom(object) · [Source](https://github.com/uwdata/arquero/blob/master/src/query/query.js) - -Parse a serialized query *object* and return a new query instance. The input *object* should be a serialized query representation, such as those generated by the query *toObject()* method. - -*Examples* - -```js -// round-trip a query to a serialized form and back again -aq.queryFrom( - aq.query() - .derive({ add1: d => d.value + 1 }) - .filter(d => d.add1 > 5 ) - .toObject() -) -``` diff --git a/docs/api/op.md b/docs/api/op.md index 2f5c8627..d2387605 100644 --- a/docs/api/op.md +++ b/docs/api/op.md @@ -54,14 +54,6 @@ Merges two or more arrays in sequence, returning a new array. * *values*: The arrays to merge. -
# -op.join(array[, delimiter]) · [Source](https://github.com/uwdata/arquero/blob/master/src/op/functions/array.js) - -Creates and returns a new string by concatenating all of the elements in an *array* (or an array-like object), separated by commas or a specified *delimiter* string. If the *array* has only one item, then that item will be returned without using the delimiter. - -* *array*: The input array value. -* *join*: The delimiter string (default `','`). -
# op.includes(array, value[, index]) · [Source](https://github.com/uwdata/arquero/blob/master/src/op/functions/array.js) @@ -79,6 +71,14 @@ Returns the first index at which a given *value* can be found in the *sequence* * *sequence*: The input array or string value. * *value*: The value to search for. +
# +op.join(array[, delimiter]) · [Source](https://github.com/uwdata/arquero/blob/master/src/op/functions/array.js) + +Creates and returns a new string by concatenating all of the elements in an *array* (or an array-like object), separated by commas or a specified *delimiter* string. If the *array* has only one item, then that item will be returned without using the delimiter. + +* *array*: The input array value. +* *delimiter*: The delimiter string (default `','`). +
# op.lastindexof(sequence, value) · [Source](https://github.com/uwdata/arquero/blob/master/src/op/functions/array.js) @@ -102,21 +102,12 @@ Returns a new array in which the given *property* has been extracted for each el * *array*: The input array value. * *property*: The property name string to extract. Nested properties are not supported: the input `"a.b"` will indicates a property with that exact name, *not* a nested property `"b"` of the object `"a"`. -
# -op.slice(sequence[, start, end]) · [Source](https://github.com/uwdata/arquero/blob/master/src/op/functions/array.js) - -Returns a copy of a portion of the input *sequence* (array or string) selected from *start* to *end* (*end* not included) where *start* and *end* represent the index of items in the sequence. - -* *sequence*: The input array or string value. -* *start*: The starting integer index to copy from (inclusive, default `0`). -* *end*: The ending integer index to copy from (exclusive, default `sequence.length`). -
# -op.reverse(array) · [Source](https://github.com/uwdata/arquero/blob/master/src/op/functions/array.js) +op.reverse(sequence) · [Source](https://github.com/uwdata/arquero/blob/master/src/op/functions/array.js) -Returns a new array with the element order reversed: the first *array* element becomes the last, and the last *array* element becomes the first. The input *array* is unchanged. +Returns a new array or string with the element order reversed: the first *sequence* element becomes the last, and the last *sequence* element becomes the first. The input *sequence* is unchanged. -* *array*: The input array value. +* *sequence*: The input array or string value.
# op.sequence([start,] stop[, step]) · [Source](https://github.com/uwdata/arquero/blob/master/src/op/functions/sequence.js) @@ -127,6 +118,14 @@ Returns an array containing an arithmetic sequence from the *start* value to the * *stop*: The stopping value of the sequence. The stop value is exclusive; it is not included in the result. * *step*: The step increment between sequence values (default `1`). +
# +op.slice(sequence[, start, end]) · [Source](https://github.com/uwdata/arquero/blob/master/src/op/functions/array.js) + +Returns a copy of a portion of the input *sequence* (array or string) selected from *start* to *end* (*end* not included) where *start* and *end* represent the index of items in the sequence. + +* *sequence*: The input array or string value. +* *start*: The starting integer index to copy from (inclusive, default `0`). +* *end*: The ending integer index to copy from (exclusive, default `sequence.length`).
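Review note on the `op.reverse` signature change above (from *array* to *sequence*): the documented behavior, returning a new reversed array or string while leaving the input unchanged, can be sketched in plain JavaScript. `reverseSequence` is a hypothetical stand-in written for this note. Arquero's actual implementation in `src/op/functions/array.js` may differ, for example in how it handles strings containing surrogate pairs.

```js
// Hypothetical stand-in for the documented op.reverse behavior;
// this is an assumption-based sketch, not Arquero's source code.
function reverseSequence(sequence) {
  if (typeof sequence === 'string') {
    // split into UTF-16 code units, reverse, and rejoin into a new string
    return sequence.split('').reverse().join('');
  }
  // copy first so the caller's array is not mutated by reverse()
  return Array.from(sequence).reverse();
}

const input = [1, 2, 3];
const reversedArray = reverseSequence(input);
const reversedString = reverseSequence('abc');

console.log(reversedArray);  // [ 3, 2, 1 ]
console.log(input);          // [ 1, 2, 3 ] (unchanged)
console.log(reversedString); // cba

// The documented op.slice bounds (start inclusive, end exclusive) match
// the native slice methods for both arrays and strings:
console.log([1, 2, 3, 4].slice(1, 3)); // [ 2, 3 ]
console.log('arquero'.slice(0, 3));    // arq
```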
@@ -683,7 +682,7 @@ Compare two values for equality, using join semantics in which `null !== null`. Returns a boolean indicating whether the *object* has the specified *key* as its own property (as opposed to inheriting it). If the *object* is a [Map](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map) or [Set](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Set) instance, the `has` method will be invoked directly on the object, otherwise `Object.hasOwnProperty` is used. * *object*: The object, Map, or Set to test for property membership. -* *property*: The string property name to test for. +* *key*: The string key (property name) to test for.
# op.keys(object) · [Source](https://github.com/uwdata/arquero/blob/master/src/op/functions/object.js) @@ -811,6 +810,7 @@ If specified, the *index* looks up a value of the resulting match. If *index* is * *value*: The input string value. * *regexp*: The [regular expression](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions) to match against. +* *index*: The index into the match result array or capture group. *Examples* diff --git a/docs/api/table.md b/docs/api/table.md index d5c7088d..182f2b83 100644 --- a/docs/api/table.md +++ b/docs/api/table.md @@ -14,7 +14,7 @@ title: Table \| Arquero API Reference * [assign](#assign) * [transform](#transform) * [Table Columns](#columns) - * [column](#column), [columnAt](#columnAt), [columnArray](#columnArray) + * [column](#column), [columnAt](#columnAt) * [columnIndex](#columnIndex), [columnName](#columnName), [columnNames](#columnNames) * [Table Values](#table-values) * [array](#array), [values](#values) @@ -23,7 +23,7 @@ title: Table \| Arquero API Reference * [Table Output](#output) * [objects](#objects), [object](#object), [Symbol.iterator](#@@iterator) * [print](#print), [toHTML](#toHTML), [toMarkdown](#toMarkdown) - * [toArrow](#toArrow), [toArrowBuffer](#toArrowBuffer), [toCSV](#toCSV), [toJSON](#toJSON) + * [toArrow](#toArrow), [toArrowIPC](#toArrowIPC), [toCSV](#toCSV), [toJSON](#toJSON)
@@ -235,7 +235,7 @@ aq.table({ a: [1, 2], b: [3, 4] }) Get the column instance with the given *name*, or `undefined` if does not exist. The returned column object provides a lightweight abstraction over the column storage (such as a backing array), providing a *length* property and *get(row)* method. -A column instance may be used across multiple tables and so does _not_ track a table's filter or orderby critera. To access filtered or ordered values, use the table [get](#get), [getter](#getter), or [columnArray](#columnArray) methods. +A column instance may be used across multiple tables and so does _not_ track a table's filter or orderby criteria. To access filtered or ordered values, use the table [get](#get), [getter](#getter), or [array](#array) methods. * *name*: The column name. @@ -260,16 +260,6 @@ const dt = aq.table({ a: [1, 2, 3], b: [4, 5, 6] }) dt.columnAt(1).get(1) // 5 ``` -
# -table.columnArray(name[, constructor]) · [Source](https://github.com/uwdata/arquero/blob/master/src/table/table.js) - -_This method is a deprecated alias for the table [array()](#array) method. Please use [array()](#array) instead._ - -Get an array of values contained in the column with the given *name*. Unlike direct access through the table [column](#column) method, the array returned by this method respects any table filter or orderby criteria. By default, a standard [Array](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array) is returned; use the *constructor* argument to specify a [typed array](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/TypedArray). - -* *name*: The column name. -* *constructor*: An optional array constructor (default [`Array`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/Array)) to use to instantiate the output array. Note that errors or truncated values may occur when assigning to a typed array with an incompatible type. -
# table.columnIndex(name) · [Source](https://github.com/uwdata/arquero/blob/master/src/table/table.js) @@ -362,14 +352,14 @@ for (const value of table.values('colA')) { ``` ```js -// slightly less efficient version of table.columnArray('colA') +// slightly less efficient version of table.array('colA') const colValues = Array.from(table.values('colA')); ```
# table.data() · [Source](https://github.com/uwdata/arquero/blob/master/src/table/table.js) -Returns the internal table storage data structure. +Returns the internal table storage data structure: an object with column names for keys and column arrays for values. This method returns the same structure used by the Table (not a copy) and its contents should not be modified.
# table.get(name[, row]) · [Source](https://github.com/uwdata/arquero/blob/master/src/table/column-table.js) @@ -438,7 +428,7 @@ Perform a table scan, invoking the provided *callback* function for each row of * *callback*: Function invoked for each row of the table. The callback is invoked with the following arguments: * *row*: The table row index. - * *data*: The backing table data store. + * *data*: The backing table data store (as returned by the table [`data`](#data) method). * *stop*: A function to stop the scan early. The callback can invoke *stop()* to prevent future scan calls. * *order*: A boolean flag (default `false`), indicating if the table should be scanned in the order determined by [orderby](verbs#orderby). This argument has no effect if the table is unordered. @@ -629,14 +619,15 @@ const at2 = dt.toArrow({ }); ``` -
# +
# table.toArrowBuffer([options]) · [Source](https://github.com/uwdata/arquero/blob/master/src/arrow/encode/index.js) Format this table as binary data in the [Apache Arrow](https://arrow.apache.org/docs/js/) IPC format. The binary data may be saved to disk or passed between processes or tools. For example, when using [Web Workers](https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Using_web_workers), the output of this method can be passed directly between threads (no data copy) as a [Transferable](https://developer.mozilla.org/en-US/docs/Web/API/Transferable) object. Additionally, Arrow binary data can be loaded in other language environments such as [Python](https://arrow.apache.org/docs/python/) or [R](https://arrow.apache.org/docs/r/). -This method will throw an error if type inference fails or if the generated columns have differing lengths. This method is a shorthand for `table.toArrow().serialize()`. +This method will throw an error if type inference fails or if the generated columns have differing lengths. -* *options*: Options for Arrow encoding, same as [toArrow](#toArrow). +* *options*: Options for Arrow encoding, same as [toArrow](#toArrow) but with an additional *format* option. + * *format*: The Arrow IPC byte format to use. One of `'stream'` (default) or `'file'`. *Examples* @@ -652,7 +643,7 @@ const dt = table({ // encode table as a transferable Arrow byte buffer // here, infers Uint8 for 'x' and Float64 for 'y' -const bytes = dt.toArrowBuffer(); +const bytes = dt.toArrowIPC(); ```
# diff --git a/docs/index.md b/docs/index.md index 2a86f6f9..19315224 100644 --- a/docs/index.md +++ b/docs/index.md @@ -93,13 +93,9 @@ Arquero uses modern JavaScript features, and so will not work with some outdated ### In Node.js or Application Bundles -First install `arquero` as a dependency, for example via `npm install arquero --save`. Arquero assumes Node version 12 or higher. - -Import using CommonJS module syntax: - -```js -const aq = require('arquero'); -``` +First install `arquero` as a dependency, for example via `npm install arquero --save`. +Arquero assumes Node version 18 or higher. +As of Arquero version 6, the library uses type `module` and should be loaded using ES module syntax. Import using ES module syntax, import all exports into a single object: @@ -113,6 +109,12 @@ Import using ES module syntax, with targeted imports: import { op, table } from 'arquero'; ``` +Dynamic import (e.g., within a Node.js REPL): + +```js +aq = await import('arquero'); +``` + ## Build Instructions To build and develop Arquero locally: diff --git a/jsconfig.json b/jsconfig.json new file mode 100644 index 00000000..c1651dcc --- /dev/null +++ b/jsconfig.json @@ -0,0 +1,12 @@ +{ + "include": ["src/**/*"], + "compilerOptions": { + "allowJs": true, + "checkJs": true, + "noEmit": true, + "module": "node16", + "moduleResolution": "node16", + "target": "es2022", + "skipLibCheck": true + } +} diff --git a/package.json b/package.json index 95bca81f..8b375617 100644 --- a/package.json +++ b/package.json @@ -1,7 +1,7 @@ { "name": "arquero", "type": "module", - "version": "6.0.0", + "version": "6.0.0-beta", "description": "Query processing and transformation of array-backed data tables.", "keywords": [ "data", @@ -14,14 +14,12 @@ ], "license": "BSD-3-Clause", "author": "Jeffrey Heer (http://idl.cs.washington.edu)", - "main": "dist/arquero.node.js", - "module": "src/index-node.js", + "exports": "./src/index.js", "unpkg": "dist/arquero.min.js", "jsdelivr": "dist/arquero.min.js", 
"types": "dist/types/index.d.ts", "browser": { - "./dist/arquero.node.js": "./dist/arquero.min.js", - "./src/index-node.js": "./src/index.js" + "./src/index.js": "./src/index-browser.js" }, "repository": { "type": "git", @@ -33,7 +31,8 @@ "postbuild": "tsc", "perf": "TZ=America/Los_Angeles tape 'perf/**/*-perf.js'", "lint": "eslint src test", - "test": "TZ=America/Los_Angeles mocha 'test/**/*-test.js'", + "test": "TZ=America/Los_Angeles mocha 'test/**/*-test.js' --timeout 5000", + "posttest": "tsc --project jsconfig.json", "prepublishOnly": "npm test && npm run lint && npm run build" }, "dependencies": { diff --git a/perf/arrow-perf.js b/perf/arrow-perf.js index f4fad370..9f9c2f53 100644 --- a/perf/arrow-perf.js +++ b/perf/arrow-perf.js @@ -1,7 +1,7 @@ import tape from 'tape'; import { time } from './time.js'; import { bools, floats, ints, sample, strings } from './data-gen.js'; -import { fromArrow, table } from '../src/index.js'; +import { fromArrow, table, toArrow } from '../src/index.js'; import { Bool, Dictionary, Float64, Int32, Table, Uint32, Utf8, tableToIPC, vectorFromArray @@ -76,7 +76,7 @@ function encode(name, type, values) { const dt = table({ values }); // measure encoding times - const qt = time(() => tableToIPC(dt.toArrow({ types: { values: type } }))); + const qt = time(() => tableToIPC(toArrow(dt, { types: { values: type } }))); const at = time( () => tableToIPC(new Table({ values: vectorFromArray(values, type) })) ); @@ -86,7 +86,7 @@ function encode(name, type, values) { const ab = tableToIPC(new Table({ values: vectorFromArray(values, type) })).length; - const qb = tableToIPC(dt.toArrow({ types: { values: type }})).length; + const qb = tableToIPC(toArrow(dt, { types: { values: type }})).length; const jb = (new TextEncoder().encode(JSON.stringify(values))).length; // check that arrow and arquero produce the same result diff --git a/perf/csv-perf.js b/perf/csv-perf.js index b8e11643..fc024d97 100644 --- a/perf/csv-perf.js +++ b/perf/csv-perf.js @@ 
-1,11 +1,11 @@ import tape from 'tape'; import { time } from './time.js'; import { bools, dates, floats, ints, sample, strings } from './data-gen.js'; -import { fromCSV, table } from '../src/index.js'; +import { toCSV as _toCSV, fromCSV, table } from '../src/index.js'; function toCSV(...values) { const cols = values.map((v, i) => [`col${i}`, v]); - return table(cols).toCSV(); + return _toCSV(table(cols)); } function parse(csv, opt) { diff --git a/perf/filter-perf.js b/perf/filter-perf.js index 906e9329..5584ccd6 100644 --- a/perf/filter-perf.js +++ b/perf/filter-perf.js @@ -19,21 +19,21 @@ function run(N, nulls, msg) { table: time(() => dt.filter('d.a > 0')), reify: time(() => dt.filter('d.a > 0').reify()), object: time(a => a.filter(d => d.a > 0), dt.objects()), - array: time(a => a.filter(v => v > 0), dt.column('a').data) + array: time(a => a.filter(v => v > 0), dt.column('a')) }, { type: 'float', table: time(() => dt.filter('d.b > 0')), reify: time(() => dt.filter('d.b > 0').reify()), object: time(a => a.filter(d => d.b > 0), dt.objects()), - array: time(a => a.filter(v => v > 0), dt.column('b').data) + array: time(a => a.filter(v => v > 0), dt.column('b')) }, { type: 'string', table: time(() => dt.filter(`d.c === '${str}'`)), reify: time(() => dt.filter(`d.c === '${str}'`).reify()), object: time(a => a.filter(d => d.c === str), dt.objects()), - array: time(a => a.filter(v => v === str), dt.column('c').data) + array: time(a => a.filter(v => v === str), dt.column('c')) } ]); t.end(); diff --git a/rollup.config.js b/rollup.config.js index fc1962b1..45d59c88 100644 --- a/rollup.config.js +++ b/rollup.config.js @@ -2,14 +2,8 @@ import bundleSize from 'rollup-plugin-bundle-size'; import { nodeResolve } from '@rollup/plugin-node-resolve'; import terser from '@rollup/plugin-terser'; -function onwarn(warning, defaultHandler) { - if (warning.code !== 'CIRCULAR_DEPENDENCY') { - defaultHandler(warning); - } -} - const name = 'aq'; -const external = [ 'apache-arrow', 
'node-fetch' ]; +const external = [ 'apache-arrow' ]; const globals = { 'apache-arrow': 'Arrow' }; const plugins = [ bundleSize(), @@ -18,23 +12,9 @@ const plugins = [ export default [ { - input: 'src/index-node.js', - external: ['acorn'].concat(external), - plugins, - onwarn, - output: [ - { - file: 'dist/arquero.node.js', - format: 'cjs', - name - } - ] - }, - { - input: 'src/index.js', + input: 'src/index-browser.js', external, plugins, - onwarn, output: [ { file: 'dist/arquero.js', diff --git a/src/api.js b/src/api.js new file mode 100644 index 00000000..9960a0b8 --- /dev/null +++ b/src/api.js @@ -0,0 +1,32 @@ +// export internal class and method definitions +export { BitSet } from './table/BitSet.js'; +export { Table } from './table/Table.js'; +export { ColumnTable } from './table/ColumnTable.js'; +export { default as Reducer } from './verbs/reduce/reducer.js'; +export { default as parse } from './expression/parse.js'; +export { default as walk_ast } from './expression/ast/walk.js'; + +// public API +export { seed } from './util/random.js'; +export { default as fromArrow } from './arrow/from-arrow.js'; +export { default as fromCSV } from './format/from-csv.js'; +export { default as fromFixed } from './format/from-fixed.js'; +export { default as fromJSON } from './format/from-json.js'; +export { default as toArrow } from './arrow/to-arrow.js'; +export { default as toArrowIPC } from './arrow/to-arrow-ipc.js'; +export { default as toCSV } from './format/to-csv.js'; +export { default as toHTML } from './format/to-html.js'; +export { default as toJSON } from './format/to-json.js'; +export { default as toMarkdown } from './format/to-markdown.js'; +export { default as bin } from './helpers/bin.js'; +export { default as escape } from './helpers/escape.js'; +export { default as desc } from './helpers/desc.js'; +export { default as field } from './helpers/field.js'; +export { default as frac } from './helpers/frac.js'; +export { default as names } from 
'./helpers/names.js'; +export { default as rolling } from './helpers/rolling.js'; +export { all, endswith, matches, not, range, startswith } from './helpers/selection.js'; +export { default as agg } from './verbs/helpers/agg.js'; +export { default as op } from './op/op-api.js'; +export { addAggregateFunction, addFunction, addWindowFunction } from './op/register.js'; +export { table, from } from './table/index.js'; diff --git a/src/arrow/arrow-column.js b/src/arrow/arrow-column.js index ce6cb96c..a8b43f10 100644 --- a/src/arrow/arrow-column.js +++ b/src/arrow/arrow-column.js @@ -1,64 +1,177 @@ -import arrowDictionary from './arrow-dictionary.js'; +import sequence from '../op/functions/sequence.js'; import error from '../util/error.js'; +import isFunction from '../util/is-function.js'; import repeat from '../util/repeat.js'; import toString from '../util/to-string.js'; import unroll from '../util/unroll.js'; -import { isDict, isFixedSizeList, isList, isStruct, isUtf8 } from './arrow-types.js'; -const isListType = type => isList(type) || isFixedSizeList(type); +// Hardwire Arrow type ids to sidestep hard dependency +// https://github.com/apache/arrow/blob/master/js/src/enum.ts +const isDict = ({ typeId }) => typeId === -1; +const isInt = ({ typeId }) => typeId === 2; +const isUtf8 = ({ typeId }) => typeId === 5; +const isDecimal = ({ typeId }) => typeId === 7; +const isDate = ({ typeId }) => typeId === 8; +const isTimestamp = ({ typeId }) => typeId === 10; +const isStruct = ({ typeId }) => typeId === 13; +const isLargeUtf8 = ({ typeId }) => typeId === 20; +const isListType = ({ typeId }) => typeId === 12 || typeId === 16; /** * Create an Arquero column that proxies access to an Arrow column. - * @param {object} arrow An Apache Arrow column. - * @return {import('../table/column').ColumnType} An Arquero-compatible column. + * @param {import('apache-arrow').Vector} vector An Apache Arrow column. 
+ * @param {import('./types.js').ArrowColumnOptions} [options] + * Arrow conversion options. + * @return {import('../table/types.js').ColumnType} + * An Arquero-compatible column. */ -export default function arrowColumn(vector, nested) { +export default function arrowColumn(vector, options) { + return isDict(vector.type) + ? dictionaryColumn(vector) + : proxyColumn(vector, options); +} + +/** + * Internal method for Arquero column generation for Apache Arrow data + * @param {import('apache-arrow').Vector} vector An Apache Arrow column. + * @param {import('./types.js').ArrowColumnOptions} [options] + * Arrow conversion options. + * @return {import('../table/types.js').ColumnType} + * An Arquero-compatible column. + */ +function proxyColumn(vector, options = {}) { const { type, length, numChildren } = vector; - if (isDict(type)) return arrowDictionary(vector); + const { + convertDate = true, + convertDecimal = true, + convertTimestamp = true, + convertBigInt = false, + memoize = true + } = options; - const get = numChildren && nested ? getNested(vector) - : numChildren ? memoize(getNested(vector)) - : isUtf8(type) ? memoize(row => vector.get(row)) - : null; + // create a getter method for retrieving values + let get; + if (numChildren) { + // extract lists/structs to JS objects, possibly memoized + get = getNested(vector, options); + if (memoize) get = memoized(length, get); + } else if (memoize && (isUtf8(type) || isLargeUtf8(type))) { + // memoize string extraction + get = memoized(length, row => vector.get(row)); + } else if ((convertDate && isDate(type)) + || (convertTimestamp && isTimestamp(type))) { + // convert to Date type, memoized for object equality + get = memoized(length, row => { + const v = vector.get(row); + return v == null ? 
null : new Date(vector.get(row)); + }); + } else if (convertDecimal && isDecimal(type)) { + // map decimal to number + const scale = 1 / Math.pow(10, type.scale); + get = row => { + const v = vector.get(row); + return v == null ? null : decimalToNumber(v, scale); + }; + } else if (convertBigInt && isInt(type) && type.bitWidth >= 64) { + // map bigint to number + get = row => { + const v = vector.get(row); + return v == null ? null : Number(v); + }; + } else if (!isFunction(vector.at)) { + // backwards compatibility with older arrow versions + // the vector `at` method was added in Arrow v16 + get = row => vector.get(row); + } else { + // use the arrow column directly + return vector; + } - return get - ? { vector, length, get, [Symbol.iterator]: () => iterator(length, get) } - : vector; + // return a column proxy object using custom getter + return { + length, + at: get, + [Symbol.iterator]: () => (function* () { + for (let i = 0; i < length; ++i) { + yield get(i); + } + })() + }; } -function memoize(get) { - const values = []; +/** + * Memoize expensive getter calls by caching retrieved values. + */ +function memoized(length, get) { + const values = Array(length); return row => { const v = values[row]; return v !== undefined ? v : (values[row] = get(row)); }; } -function* iterator(n, get) { - for (let i = 0; i < n; ++i) { - yield get(i); +// generate base values for big integers represented as a Uint32Array +const BASE32 = Array.from( + { length: 8 }, + (_, i) => Math.pow(2, i * 32) +); + +/** + * Convert a fixed point decimal value to a double precision number. + * Note: if the value is sufficiently large the conversion may be lossy! 
+ * @param {Uint32Array & { signed: boolean }} v a fixed point decimal value + * @param {number} scale a scale factor, corresponding to the + * number of fractional decimal digits in the fixed point value + * @return {number} the resulting number + */ +function decimalToNumber(v, scale) { + const n = v.length; + let x = 0; + if (v.signed && (v[n - 1] | 0) < 0) { + for (let i = 0; i < n; ++i) { + x += ~v[i] * BASE32[i]; + } + x = -(x + 1); + } else { + for (let i = 0; i < n; ++i) { + x += v[i] * BASE32[i]; + } } + return x * scale; } -const arrayFrom = vector => vector.numChildren - ? repeat(vector.length, getNested(vector)) - : vector.nullCount ? [...vector] - : vector.toArray(); +// get an array for a given vector +function arrayFrom(vector, options) { + return vector.numChildren ? repeat(vector.length, getNested(vector, options)) + : vector.nullCount ? [...vector] + : vector.toArray(); +} -const getNested = vector => isListType(vector.type) ? getList(vector) - : isStruct(vector.type) ? getStruct(vector) - : error(`Unsupported Arrow type: ${toString(vector.VectorName)}`); +// generate a getter for a nested data type +function getNested(vector, options) { + return isListType(vector.type) ? getList(vector, options) + : isStruct(vector.type) ? getStruct(vector, options) + : error(`Unsupported Arrow type: ${toString(vector.VectorName)}`); +} -const getList = vector => vector.nullCount - ? row => vector.isValid(row) ? arrayFrom(vector.get(row)) : null - : row => arrayFrom(vector.get(row)); +// generate a getter for a list data type +function getList(vector, options) { + return vector.nullCount + ? row => vector.isValid(row) + ? 
arrayFrom(vector.get(row), options) + : null + : row => arrayFrom(vector.get(row), options); +} -function getStruct(vector) { +// generate a getter for a struct (object) data type +function getStruct(vector, options) { + // disable memoization for nested columns as we extract JS objects + const opt = { ...options, memoize: false }; const props = []; const code = []; vector.type.children.forEach((field, i) => { - props.push(arrowColumn(vector.getChildAt(i), true)); - code.push(`${toString(field.name)}:_${i}.get(row)`); + props.push(arrowColumn(vector.getChildAt(i), opt)); + code.push(`${toString(field.name)}:_${i}.at(row)`); }); const get = unroll('row', '({' + code + '})', props); @@ -66,3 +179,99 @@ function getStruct(vector) { ? row => vector.isValid(row) ? get(row) : null : get; } + +/** + * Create a new Arquero column that proxies access to an + * Apache Arrow dictionary column. + * @param {import('apache-arrow').Vector} vector + * An Apache Arrow dictionary column. + */ +function dictionaryColumn(vector) { + const { data, length, nullCount } = vector; + const dictionary = data[data.length - 1].dictionary; + const size = dictionary.length; + const keys = dictKeys(data || [vector], length, nullCount, size); + const get = memoized(size, + k => k == null || k < 0 || k >= size ? null : dictionary.get(k) + ); + + return { + vector, + length, + at: row => get(keys[row]), + key: row => keys[row], + keyFor(value) { + if (value === null) return nullCount ? size : -1; + for (let i = 0; i < size; ++i) { + if (get(i) === value) return i; + } + return -1; + }, + groups(names) { + const s = size + (nullCount ? 1 : 0); + return { keys, get: [get], names, rows: sequence(0, s), size: s }; + }, + [Symbol.iterator]() { + return vector[Symbol.iterator](); + } + }; +} + +/** + * Generate a dictionary key array. 
+ * @param {readonly any[]} chunks Arrow column chunks + * @param {number} length The length of the Arrow column + * @param {number} nulls The count of column null values + * @param {number} size The backing dictionary size + */ +function dictKeys(chunks, length, nulls, size) { + const v = chunks.length > 1 || nulls + ? flatten(chunks, length, chunks[0].type.indices) + : chunks[0].values; + return nulls ? nullKeys(chunks, v, size) : v; +} + +/** + * Flatten Arrow column chunks into a single array. + */ +function flatten(chunks, length, type) { + const array = new type.ArrayType(length); + const n = chunks.length; + for (let i = 0, idx = 0, len; i < n; ++i) { + len = chunks[i].length; + array.set(chunks[i].values.subarray(0, len), idx); + idx += len; + } + return array; +} + +/** + * Encode null values as an additional dictionary key. + * Returns a new key array with null values added. + * TODO: safeguard against integer overflow? + */ +function nullKeys(chunks, keys, key) { + // iterate over null bitmaps, encode null values as key + const n = chunks.length; + for (let i = 0, idx = 0, m, base, bits, byte; i < n; ++i) { + bits = chunks[i].nullBitmap; + m = chunks[i].length >> 3; + if (bits && bits.length) { + for (let j = 0; j <= m; ++j) { + if ((byte = bits[j]) !== 255) { + base = idx + (j << 3); + if ((byte & (1 << 0)) === 0) keys[base + 0] = key; + if ((byte & (1 << 1)) === 0) keys[base + 1] = key; + if ((byte & (1 << 2)) === 0) keys[base + 2] = key; + if ((byte & (1 << 3)) === 0) keys[base + 3] = key; + if ((byte & (1 << 4)) === 0) keys[base + 4] = key; + if ((byte & (1 << 5)) === 0) keys[base + 5] = key; + if ((byte & (1 << 6)) === 0) keys[base + 6] = key; + if ((byte & (1 << 7)) === 0) keys[base + 7] = key; + } + } + } + idx += chunks[i].length; + } + return keys; +} diff --git a/src/arrow/arrow-dictionary.js b/src/arrow/arrow-dictionary.js deleted file mode 100644 index 6efdab05..00000000 --- a/src/arrow/arrow-dictionary.js +++ /dev/null @@ -1,104 +0,0 @@ 
-import sequence from '../op/functions/sequence.js'; - -/** - * Create a new Arquero column that proxies access to an - * Apache Arrow dictionary column. - * @param {object} vector An Apache Arrow dictionary column. - */ -export default function(vector) { - const { data, length, nullCount } = vector; - const dictionary = data[data.length - 1].dictionary; - const size = dictionary.length; - const keys = dictKeys(data || [vector], length, nullCount, size); - const values = Array(size); - - const value = k => k == null || k < 0 || k >= size ? null - : values[k] !== undefined ? values[k] - : (values[k] = dictionary.get(k)); - - return { - vector, - length, - - get: row => value(keys[row]), - - key: row => keys[row], - - keyFor(value) { - if (value === null) return nullCount ? size : -1; - for (let i = 0; i < size; ++i) { - if (values[i] === undefined) values[i] = dictionary.get(i); - if (values[i] === value) return i; - } - return -1; - }, - - groups(names) { - const s = size + (nullCount ? 1 : 0); - return { keys, get: [value], names, rows: sequence(0, s), size: s }; - }, - - [Symbol.iterator]() { - return vector[Symbol.iterator](); - } - }; -} - -/** - * Generate a dictionary key array - * @param {object[]} chunks Arrow column chunks - * @param {number} length The length of the Arrow column - * @param {number} nulls The count of column null values - * @param {number} size The backing dictionary size - */ -function dictKeys(chunks, length, nulls, size) { - const v = chunks.length > 1 || nulls - ? flatten(chunks, length, chunks[0].type.indices) - : chunks[0].values; - return nulls ? nullKeys(chunks, v, size) : v; -} - -/** - * Flatten Arrow column chunks into a single array. 
- */ -function flatten(chunks, length, type) { - const array = new type.ArrayType(length); - const n = chunks.length; - for (let i = 0, idx = 0, len; i < n; ++i) { - len = chunks[i].length; - array.set(chunks[i].values.subarray(0, len), idx); - idx += len; - } - return array; -} - -/** - * Encode null values as an additional dictionary key. - * Returns a new key array with null values added. - * TODO: safeguard against integer overflow? - */ -function nullKeys(chunks, keys, key) { - // iterate over null bitmaps, encode null values as key - const n = chunks.length; - for (let i = 0, idx = 0, m, base, bits, byte; i < n; ++i) { - bits = chunks[i].nullBitmap; - m = chunks[i].length >> 3; - if (bits && bits.length) { - for (let j = 0; j <= m; ++j) { - if ((byte = bits[j]) !== 255) { - base = idx + (j << 3); - if ((byte & (1 << 0)) === 0) keys[base + 0] = key; - if ((byte & (1 << 1)) === 0) keys[base + 1] = key; - if ((byte & (1 << 2)) === 0) keys[base + 2] = key; - if ((byte & (1 << 3)) === 0) keys[base + 3] = key; - if ((byte & (1 << 4)) === 0) keys[base + 4] = key; - if ((byte & (1 << 5)) === 0) keys[base + 5] = key; - if ((byte & (1 << 6)) === 0) keys[base + 6] = key; - if ((byte & (1 << 7)) === 0) keys[base + 7] = key; - } - } - } - idx += chunks[i].length; - } - return keys; -} diff --git a/src/arrow/arrow-table.js b/src/arrow/arrow-table.js index f3de0de4..de93d647 100644 --- a/src/arrow/arrow-table.js +++ b/src/arrow/arrow-table.js @@ -1,27 +1,38 @@ -import { Table, tableFromIPC } from 'apache-arrow'; +import { Table, tableFromIPC, tableToIPC } from 'apache-arrow'; import error from '../util/error.js'; -const fail = () => error( +const fail = (cause) => error( 'Apache Arrow not imported, ' + - 'see https://github.com/uwdata/arquero#usage' + 'see https://github.com/uwdata/arquero#usage', + cause ); -export function table() { +export function arrowTable(...args) { // trap access to provide a helpful message // when Apache Arrow has not been imported try { - return 
Table; - } catch (err) { // eslint-disable-line no-unused-vars - fail(); + return new Table(...args); + } catch (err) { + fail(err); } } -export function fromIPC() { +export function arrowTableFromIPC(bytes) { // trap access to provide a helpful message // when Apache Arrow has not been imported try { - return tableFromIPC; - } catch (err) { // eslint-disable-line no-unused-vars - fail(); + return tableFromIPC(bytes); + } catch (err) { + fail(err); + } +} + +export function arrowTableToIPC(table, format) { + // trap access to provide a helpful message + // when Apache Arrow has not been imported + try { + return tableToIPC(table, format); + } catch (err) { + fail(err); } } diff --git a/src/arrow/arrow-types.js b/src/arrow/arrow-types.js deleted file mode 100644 index d008cb54..00000000 --- a/src/arrow/arrow-types.js +++ /dev/null @@ -1,8 +0,0 @@ -// Hardwire Arrow type ids to sidestep dependency -// https://github.com/apache/arrow/blob/master/js/src/enum.ts - -export const isDict = ({ typeId }) => typeId === -1; -export const isUtf8 = ({ typeId }) => typeId === 5; -export const isList = ({ typeId }) => typeId === 12; -export const isStruct = ({ typeId }) => typeId === 13; -export const isFixedSizeList = ({ typeId }) => typeId === 16; diff --git a/src/arrow/encode/index.js b/src/arrow/encode/index.js deleted file mode 100644 index ded907db..00000000 --- a/src/arrow/encode/index.js +++ /dev/null @@ -1,77 +0,0 @@ -import { Table } from 'apache-arrow'; // eslint-disable-line no-unused-vars - -import dataFromObjects from './data-from-objects.js'; -import dataFromTable from './data-from-table.js'; -import { scanArray, scanTable } from './scan.js'; -import { table } from '../arrow-table.js'; -import error from '../../util/error.js'; -import isArray from '../../util/is-array.js'; -import isFunction from '../../util/is-function.js'; - -/** - * Options for Arrow encoding. 
- * @typedef {object} ArrowFormatOptions - * @property {number} [limit=Infinity] The maximum number of rows to include. - * @property {number} [offset=0] The row offset indicating how many initial - * rows to skip. - * @property {string[]|(data: object) => string[]} [columns] Ordered list of - * column names to include. If function-valued, the function should accept - * a dataset as input and return an array of column name strings. - * @property {object} [types] The Arrow data types to use. If specified, - * the input should be an object with column names for keys and Arrow data - * types for values. If a column type is not explicitly provided, type - * inference will be performed to guess an appropriate type. - */ - -/** - * Create an Apache Arrow table for an input dataset. - * @param {Array|object} data An input dataset to convert to Arrow format. - * If array-valued, the data should consist of an array of objects where - * each entry represents a row and named properties represent columns. - * Otherwise, the input data should be an Arquero table. - * @param {ArrowFormatOptions} [options] Encoding options, including - * column data types. - * @return {Table} An Apache Arrow Table instance. - */ -export default function(data, options = {}) { - const { types = {} } = options; - const { dataFrom, names, nrows, scan } = init(data, options); - const cols = {}; - names.forEach(name => { - const col = dataFrom(data, name, nrows, scan, types[name]); - if (col.length !== nrows) { - error('Column length mismatch'); - } - cols[name] = col; - }); - const T = table(); - return new T(cols); -} - -function init(data, options) { - const { columns, limit = Infinity, offset = 0 } = options; - const names = isFunction(columns) ? columns(data) - : isArray(columns) ? 
columns - : null; - if (isArray(data)) { - return { - dataFrom: dataFromObjects, - names: names || Object.keys(data[0]), - nrows: Math.min(limit, data.length - offset), - scan: scanArray(data, limit, offset) - }; - } else if (isTable(data)) { - return { - dataFrom: dataFromTable, - names: names || data.columnNames(), - nrows: Math.min(limit, data.numRows() - offset), - scan: scanTable(data, limit, offset) - }; - } else { - error('Unsupported input data type'); - } -} - -function isTable(data) { - return data && isFunction(data.reify); -} diff --git a/src/arrow/encode/profiler.js b/src/arrow/encode/profiler.js index 2e0a3e5c..16d1b7ff 100644 --- a/src/arrow/encode/profiler.js +++ b/src/arrow/encode/profiler.js @@ -100,6 +100,7 @@ function infer(p) { return Type.Float64; } else if (p.bigints === valid) { + // @ts-ignore const v = -p.min > p.max ? -p.min - 1n : p.max; return p.min < 0 ? v < 2 ** 63 ? Type.Int64 diff --git a/src/arrow/encode/scan.js b/src/arrow/encode/scan.js index e91bbb37..b12a5886 100644 --- a/src/arrow/encode/scan.js +++ b/src/arrow/encode/scan.js @@ -14,11 +14,15 @@ export function scanTable(table, limit, offset) { && !table.isFiltered() && !table.isOrdered(); return (column, visit) => { + const isArray = isArrayType(column); let i = -1; - scanAll && isArrayType(column.data) - ? column.data.forEach(visit) + scanAll && isArray + ? column.forEach(visit) : table.scan( - row => visit(column.get(row), ++i), + // optimize column value access + isArray + ? 
row => visit(column[row], ++i) + : row => visit(column.at(row), ++i), true, limit, offset ); }; diff --git a/src/arrow/from-arrow.js b/src/arrow/from-arrow.js new file mode 100644 index 00000000..cd9a0602 --- /dev/null +++ b/src/arrow/from-arrow.js @@ -0,0 +1,39 @@ +import { arrowTableFromIPC } from './arrow-table.js'; +import arrowColumn from './arrow-column.js'; +import resolve, { all } from '../helpers/selection.js'; +import { columnSet } from '../table/ColumnSet.js'; +import { ColumnTable } from '../table/ColumnTable.js'; + +/** + * Create a new table backed by an Apache Arrow table instance. + * @param {import('./types.js').ArrowInput} arrow + * An Apache Arrow data table or Arrow IPC byte buffer. + * @param {import('./types.js').ArrowOptions} [options] + * Options for Arrow import. + * @return {ColumnTable} A new table containing the imported values. + */ +export default function(arrow, options) { + if (arrow instanceof ArrayBuffer || ArrayBuffer.isView(arrow)) { + arrow = arrowTableFromIPC(arrow); + } + + const { + columns = all(), + ...columnOptions + } = options || {}; + + // resolve column selection + const fields = arrow.schema.fields.map(f => f.name); + const sel = resolve({ + columnNames: test => test ? fields.filter(test) : fields.slice(), + columnIndex: name => fields.indexOf(name) + }, columns); + + // build Arquero columns for backing Arrow columns + const cols = columnSet(); + sel.forEach((name, key) => { + cols.add(name, arrowColumn(arrow.getChild(key), columnOptions)); + }); + + return new ColumnTable(cols.data, cols.names); +} diff --git a/src/arrow/to-arrow-ipc.js b/src/arrow/to-arrow-ipc.js new file mode 100644 index 00000000..9c079ec6 --- /dev/null +++ b/src/arrow/to-arrow-ipc.js @@ -0,0 +1,18 @@ +import { arrowTableToIPC } from './arrow-table.js'; +import toArrow from './to-arrow.js'; + +/** + * Format a table as binary data in the Apache Arrow IPC format. 
+ * @param {object[]|import('../table/Table.js').Table} data The table data + * @param {import('./types.js').ArrowIPCFormatOptions} [options] + * The Arrow IPC formatting options. Set the *format* option to `'stream'` + * or `'file'` to specify the IPC format. + * @return {Uint8Array} A new Uint8Array of Arrow-encoded binary data. + */ +export default function(data, options = {}) { + const { format = 'stream', ...toArrowOptions } = options; + if (!['stream', 'file'].includes(format)) { + throw Error('Unrecognised Arrow IPC output format'); + } + return arrowTableToIPC(toArrow(data, toArrowOptions), format); +} diff --git a/src/arrow/to-arrow.js b/src/arrow/to-arrow.js new file mode 100644 index 00000000..e4adb132 --- /dev/null +++ b/src/arrow/to-arrow.js @@ -0,0 +1,59 @@ +import { arrowTable } from './arrow-table.js'; +import dataFromObjects from './encode/data-from-objects.js'; +import dataFromTable from './encode/data-from-table.js'; +import { scanArray, scanTable } from './encode/scan.js'; +import error from '../util/error.js'; +import isArray from '../util/is-array.js'; +import isFunction from '../util/is-function.js'; + +/** + * Create an Apache Arrow table for an input dataset. + * @param {object[]|import('../table/Table.js').Table} data An input dataset + * to convert to Arrow format. If array-valued, the data should consist of an + * array of objects where each entry represents a row and named properties + * represent columns. Otherwise, the input data should be an Arquero table. + * @param {import('./types.js').ArrowFormatOptions} [options] + * Encoding options, including column data types. + * @return {import('apache-arrow').Table} An Apache Arrow Table instance. 
+ */ +export default function(data, options = {}) { + const { types = {} } = options; + const { dataFrom, names, nrows, scan } = init(data, options); + const cols = {}; + names.forEach(name => { + const col = dataFrom(data, name, nrows, scan, types[name]); + if (col.length !== nrows) { + error('Column length mismatch'); + } + cols[name] = col; + }); + return arrowTable(cols); +} + +function init(data, options) { + const { columns, limit = Infinity, offset = 0 } = options; + const names = isFunction(columns) ? columns(data) + : isArray(columns) ? columns + : null; + if (isArray(data)) { + return { + dataFrom: dataFromObjects, + names: names || Object.keys(data[0]), + nrows: Math.min(limit, data.length - offset), + scan: scanArray(data, limit, offset) + }; + } else if (isTable(data)) { + return { + dataFrom: dataFromTable, + names: names || data.columnNames(), + nrows: Math.min(limit, data.numRows() - offset), + scan: scanTable(data, limit, offset) + }; + } else { + error('Unsupported input data type'); + } +} + +function isTable(data) { + return data && isFunction(data.reify); +} diff --git a/src/arrow/types.ts b/src/arrow/types.ts new file mode 100644 index 00000000..f3cf8f10 --- /dev/null +++ b/src/arrow/types.ts @@ -0,0 +1,91 @@ +import { DataType, Table } from 'apache-arrow'; +import type { Select, TypedArray } from '../table/types.js'; + +/** Arrow input data as bytes or loaded table. */ +export type ArrowInput = + | ArrayBuffer + | TypedArray + | Table; + +/** Options for Apache Arrow column conversion. */ +export interface ArrowColumnOptions { + /** + * Flag (default `true`) to convert Arrow date values to JavaScript Date + * objects. If false, defaults to what the Arrow implementation provides, + * typically timestamps as number values. + */ + convertDate?: boolean; + /** + * Flag (default `true`) to convert Arrow fixed point decimal values to + * JavaScript numbers. If false, defaults to what the Arrow implementation + * provides, typically byte arrays. 
The conversion will be lossy if the + * decimal can not be exactly represented as a double-precision floating + * point number. + */ + convertDecimal?: boolean; + /** + * Flag (default `true`) to convert Arrow timestamp values to JavaScript + * Date objects. If false, defaults to what the Arrow implementation + * provides, typically timestamps as number values. + */ + convertTimestamp?: boolean; + /** + * Flag (default `false`) to convert Arrow integers with bit widths of 64 + * bits or higher to JavaScript numbers. If false, defaults to what the + * Arrow implementation provides, typically `BigInt` values. The conversion + * will be lossy if the integer is so large it can not be exactly + * represented as a double-precision floating point number. + */ + convertBigInt?: boolean; + /** + * A hint (default `true`) to enable memoization of expensive conversions. + * If true, memoization is applied for string and nested (list, struct) + * types, caching extracted values to enable faster access. Memoization + * is also applied to converted Date values, in part to ensure exact object + * equality. This hint is ignored for dictionary columns, whose values are + * always memoized. + */ + memoize?: boolean; +} + +/** Options for Apache Arrow import. */ +export interface ArrowOptions extends ArrowColumnOptions { + /** + * An ordered set of columns to import. The input may consist of column name + * strings, column integer indices, objects with current column names as + * keys and new column names as values (for renaming), or selection helper + * functions such as *all*, *not*, or *range*. + */ + columns?: Select; +} + +/** Options for Arrow encoding. */ +export interface ArrowFormatOptions { + /** The maximum number of rows to include (default `Infinity`). */ + limit?: number; + /** + * The row offset (default `0`) indicating how many initial rows to skip. + */ + offset?: number; + /** + * Ordered list of column names to include. 
If function-valued, the + * function should accept a dataset as input and return an array of + * column name strings. If unspecified all columns are included. + */ + columns?: string[] | ((data: any) => string[]); + /** + * The Arrow data types to use. If specified, the input should be an + * object with column names for keys and Arrow data types for values. + * If a column type is not explicitly provided, type inference will be + * performed to guess an appropriate type. + */ + types?: Record<string, DataType>; +} + +/** Options for Arrow IPC encoding. */ +export interface ArrowIPCFormatOptions extends ArrowFormatOptions { + /** + * The Arrow IPC byte format to use. One of `'stream'` (default) or `'file'`. + */ + format?: 'stream' | 'file'; +} diff --git a/src/engine/derive.js b/src/engine/derive.js deleted file mode 100644 index 2b829f84..00000000 --- a/src/engine/derive.js +++ /dev/null @@ -1,78 +0,0 @@ -import { window } from './window/window.js'; -import { aggregate } from './reduce/util.js'; -import { hasWindow } from '../op/index.js'; -import columnSet from '../table/column-set.js'; -import repeat from '../util/repeat.js'; - -function isWindowed(op) { - return hasWindow(op.name) || - op.frame && ( - Number.isFinite(op.frame[0]) || - Number.isFinite(op.frame[1]) - ); -} - -export default function(table, { names, exprs, ops }, options = {}) { - // instantiate output data - const total = table.totalRows(); - const cols = columnSet(options.drop ? null : table); - const data = names.map(name => cols.add(name, Array(total))); - - // analyze operations, compute non-windowed aggregates - const [ aggOps, winOps ] = segmentOps(ops); - - const size = table.isGrouped() ? table.groups().size : 1; - const result = aggregate( - table, aggOps, - repeat(ops.length, () => Array(size)) - ); - - // perform table scans to generate output values - winOps.length - ?
window(table, data, exprs, result, winOps) - : output(table, data, exprs, result); - - return table.create(cols); -} - -function segmentOps(ops) { - const aggOps = []; - const winOps = []; - const n = ops.length; - - for (let i = 0; i < n; ++i) { - const op = ops[i]; - op.id = i; - (isWindowed(op) ? winOps : aggOps).push(op); - } - - return [aggOps, winOps]; -} - -function output(table, cols, exprs, result) { - const bits = table.mask(); - const data = table.data(); - const { keys } = table.groups() || {}; - const op = keys - ? (id, row) => result[id][keys[row]] - : id => result[id][0]; - - const m = cols.length; - for (let j = 0; j < m; ++j) { - const get = exprs[j]; - const col = cols[j]; - - // inline the following for performance: - // table.scan((i, data) => col[i] = get(i, data, op)); - if (bits) { - for (let i = bits.next(0); i >= 0; i = bits.next(i + 1)) { - col[i] = get(i, data, op); - } - } else { - const n = table.totalRows(); - for (let i = 0; i < n; ++i) { - col[i] = get(i, data, op); - } - } - } -} diff --git a/src/engine/filter.js b/src/engine/filter.js deleted file mode 100644 index 9c7469cc..00000000 --- a/src/engine/filter.js +++ /dev/null @@ -1,22 +0,0 @@ -import BitSet from '../table/bit-set.js'; - -export default function(table, predicate) { - const n = table.totalRows(); - const bits = table.mask(); - const data = table.data(); - const filter = new BitSet(n); - - // inline the following for performance: - // table.scan((row, data) => { if (predicate(row, data)) filter.set(row); }); - if (bits) { - for (let i = bits.next(0); i >= 0; i = bits.next(i + 1)) { - if (predicate(i, data)) filter.set(i); - } - } else { - for (let i = 0; i < n; ++i) { - if (predicate(i, data)) filter.set(i); - } - } - - return table.create({ filter }); -} diff --git a/src/engine/fold.js b/src/engine/fold.js deleted file mode 100644 index db74d238..00000000 --- a/src/engine/fold.js +++ /dev/null @@ -1,18 +0,0 @@ -import unroll from './unroll.js'; -import { aggregateGet } 
from './reduce/util.js'; - -export default function(table, { names = [], exprs = [], ops = [] }, options = {}) { - if (names.length === 0) return table; - - const [k = 'key', v = 'value'] = options.as || []; - const vals = aggregateGet(table, ops, exprs); - - return unroll( - table, - { - names: [k, v], - exprs: [() => names, (row, data) => vals.map(fn => fn(row, data))] - }, - { ...options, drop: names } - ); -} diff --git a/src/engine/groupby.js b/src/engine/groupby.js deleted file mode 100644 index e1678b43..00000000 --- a/src/engine/groupby.js +++ /dev/null @@ -1,51 +0,0 @@ -import { aggregateGet } from './reduce/util.js'; -import keyFunction from '../util/key-function.js'; - -export default function(table, exprs) { - return table.create({ - groups: createGroups(table, exprs) - }); -} - -function createGroups(table, { names = [], exprs = [], ops = [] }) { - const n = names.length; - if (n === 0) return null; - - // check for optimized path when grouping by a single field - // use pre-calculated groups if available - if (n === 1 && !table.isFiltered() && exprs[0].field) { - const col = table.column(exprs[0].field); - if (col.groups) return col.groups(names); - } - - let get = aggregateGet(table, ops, exprs); - const getKey = keyFunction(get); - const nrows = table.totalRows(); - const keys = new Uint32Array(nrows); - const index = {}; - const rows = []; - - // inline table scan for performance - const data = table.data(); - const bits = table.mask(); - if (bits) { - for (let i = bits.next(0); i >= 0; i = bits.next(i + 1)) { - const key = getKey(i, data) + ''; - const val = index[key]; - keys[i] = val != null ? val : (index[key] = rows.push(i) - 1); - } - } else { - for (let i = 0; i < nrows; ++i) { - const key = getKey(i, data) + ''; - const val = index[key]; - keys[i] = val != null ? 
val : (index[key] = rows.push(i) - 1); - } - } - - if (!ops.length) { - // capture data in closure, so no interaction with select - get = get.map(f => row => f(row, data)); - } - - return { keys, get, names, rows, size: rows.length }; -} diff --git a/src/engine/impute.js b/src/engine/impute.js deleted file mode 100644 index 44b5ed6b..00000000 --- a/src/engine/impute.js +++ /dev/null @@ -1,134 +0,0 @@ -import { aggregateGet } from './reduce/util.js'; -import columnSet from '../table/column-set.js'; -import isValid from '../util/is-valid.js'; -import keyFunction from '../util/key-function.js'; -import unroll from '../util/unroll.js'; - -export default function(table, values, keys, arrays) { - const write = keys && keys.length; - return impute( - write ? expand(table, keys, arrays) : table, - values, - write - ); -} - -function impute(table, { names, exprs, ops }, write) { - const gets = aggregateGet(table, ops, exprs); - const cols = write ? null : columnSet(table); - const rows = table.totalRows(); - - names.forEach((name, i) => { - const col = table.column(name); - const out = write ? col.data : cols.add(name, Array(rows)); - const get = gets[i]; - - table.scan(idx => { - const v = col.get(idx); - out[idx] = !isValid(v) ? get(idx) : v; - }); - }); - - return write ? table : table.create(cols); -} - -function expand(table, keys, values) { - const groups = table.groups(); - const data = table.data(); - - // expansion keys and accessors - const keyNames = (groups ? groups.names : []).concat(keys); - const keyGet = (groups ? 
groups.get : []) - .concat(keys.map(key => table.getter(key))); - - // build hash of existing rows - const hash = new Set(); - const keyTable = keyFunction(keyGet); - table.scan((idx, data) => hash.add(keyTable(idx, data))); - - // initialize output table data - const names = table.columnNames(); - const cols = columnSet(); - const out = names.map(name => cols.add(name, [])); - names.forEach((name, i) => { - const old = data[name]; - const col = out[i]; - table.scan(row => col.push(old.get(row))); - }); - - // enumerate expanded value sets and augment output table - const keyEnum = keyFunction(keyGet.map((k, i) => a => a[i])); - const set = unroll( - 'v', - '{' + out.map((_, i) => `_${i}.push(v[$${i}]);`).join('') + '}', - out, names.map(name => keyNames.indexOf(name)) - ); - - if (groups) { - let row = groups.keys.length; - const prod = values.reduce((p, a) => p * a.length, groups.size); - const keys = new Uint32Array(prod + (row - hash.size)); - keys.set(groups.keys); - enumerate(groups, values, (vec, idx) => { - if (!hash.has(keyEnum(vec))) { - set(vec); - keys[row++] = idx[0]; - } - }); - cols.groupby({ ...groups, keys }); - } else { - enumerate(groups, values, vec => { - if (!hash.has(keyEnum(vec))) set(vec); - }); - } - - return table.create(cols.new()); -} - -function enumerate(groups, values, callback) { - const offset = groups ? groups.get.length : 0; - const pad = groups ? 
1 : 0; - const len = pad + values.length; - const lens = new Int32Array(len); - const idxs = new Int32Array(len); - const set = []; - - if (groups) { - const { get, rows, size } = groups; - lens[0] = size; - set.push((vec, idx) => { - const row = rows[idx]; - for (let i = 0; i < offset; ++i) { - vec[i] = get[i](row); - } - }); - } - - values.forEach((a, i) => { - const j = i + offset; - lens[i + pad] = a.length; - set.push((vec, idx) => vec[j] = a[idx]); - }); - - const vec = Array(offset + values.length); - - // initialize value vector - for (let i = 0; i < len; ++i) { - set[i](vec, 0); - } - callback(vec, idxs); - - // enumerate all combinations of values - for (let i = len - 1; i >= 0;) { - const idx = ++idxs[i]; - if (idx < lens[i]) { - set[i](vec, idx); - callback(vec, idxs); - i = len - 1; - } else { - idxs[i] = 0; - set[i](vec, 0); - --i; - } - } -} diff --git a/src/engine/join-filter.js b/src/engine/join-filter.js deleted file mode 100644 index 4af7a079..00000000 --- a/src/engine/join-filter.js +++ /dev/null @@ -1,60 +0,0 @@ -import { rowLookup } from './join/lookup.js'; -import BitSet from '../table/bit-set.js'; -import isArray from '../util/is-array.js'; - -export default function(tableL, tableR, predicate, options = {}) { - // calculate semi-join filter mask - const filter = new BitSet(tableL.totalRows()); - const join = isArray(predicate) ? 
hashSemiJoin : loopSemiJoin; - join(filter, tableL, tableR, predicate); - - // if anti-join, negate the filter - if (options.anti) { - filter.not().and(tableL.mask()); - } - - return tableL.create({ filter }); -} - -function hashSemiJoin(filter, tableL, tableR, [keyL, keyR]) { - // build lookup table - const lut = rowLookup(tableR, keyR); - - // scan table, update filter with matches - tableL.scan((rowL, data) => { - const rowR = lut.get(keyL(rowL, data)); - if (rowR >= 0) filter.set(rowL); - }); -} - -function loopSemiJoin(filter, tableL, tableR, predicate) { - const nL = tableL.numRows(); - const nR = tableR.numRows(); - const dataL = tableL.data(); - const dataR = tableR.data(); - - if (tableL.isFiltered() || tableR.isFiltered()) { - // use indices as at least one table is filtered - const idxL = tableL.indices(false); - const idxR = tableR.indices(false); - for (let i = 0; i < nL; ++i) { - const rowL = idxL[i]; - for (let j = 0; j < nR; ++j) { - if (predicate(rowL, dataL, idxR[j], dataR)) { - filter.set(rowL); - break; - } - } - } - } else { - // no filters, enumerate row indices directly - for (let i = 0; i < nL; ++i) { - for (let j = 0; j < nR; ++j) { - if (predicate(i, dataL, j, dataR)) { - filter.set(i); - break; - } - } - } - } -} diff --git a/src/engine/join.js b/src/engine/join.js deleted file mode 100644 index 82bcc1ad..00000000 --- a/src/engine/join.js +++ /dev/null @@ -1,110 +0,0 @@ -import { indexLookup } from './join/lookup.js'; -import columnSet from '../table/column-set.js'; -import concat from '../util/concat.js'; -import isArray from '../util/is-array.js'; -import unroll from '../util/unroll.js'; - -function emitter(columns, getters) { - const args = ['i', 'a', 'j', 'b']; - return unroll( - args, - '{' + concat(columns, (_, i) => `_${i}.push($${i}(${args}));`) + '}', - columns, getters - ); -} - -export default function(tableL, tableR, predicate, { names, exprs }, options = {}) { - // initialize data for left table - const dataL = tableL.data(); 
- const idxL = tableL.indices(false); - const nL = idxL.length; - const hitL = new Int32Array(nL); - - // initialize data for right table - const dataR = tableR.data(); - const idxR = tableR.indices(false); - const nR = idxR.length; - const hitR = new Int32Array(nR); - - // initialize output data - const ncols = names.length; - const cols = columnSet(); - const columns = Array(ncols); - const getters = Array(ncols); - for (let i = 0; i < names.length; ++i) { - columns[i] = cols.add(names[i], []); - getters[i] = exprs[i]; - } - const emit = emitter(columns, getters); - - // perform join - const join = isArray(predicate) ? hashJoin : loopJoin; - join(emit, predicate, dataL, dataR, idxL, idxR, hitL, hitR, nL, nR); - - if (options.left) { - for (let i = 0; i < nL; ++i) { - if (!hitL[i]) { - emit(idxL[i], dataL, -1, dataR); - } - } - } - - if (options.right) { - for (let j = 0; j < nR; ++j) { - if (!hitR[j]) { - emit(-1, dataL, idxR[j], dataR); - } - } - } - - return tableL.create(cols.new()); -} - -function loopJoin(emit, predicate, dataL, dataR, idxL, idxR, hitL, hitR, nL, nR) { - // perform nested-loops join - for (let i = 0; i < nL; ++i) { - const rowL = idxL[i]; - for (let j = 0; j < nR; ++j) { - const rowR = idxR[j]; - if (predicate(rowL, dataL, rowR, dataR)) { - emit(rowL, dataL, rowR, dataR); - hitL[i] = 1; - hitR[j] = 1; - } - } - } -} - -function hashJoin(emit, [keyL, keyR], dataL, dataR, idxL, idxR, hitL, hitR, nL, nR) { - // determine which table to hash - let dataScan, keyScan, hitScan, idxScan; - let dataHash, keyHash, hitHash, idxHash; - let emitScan = emit; - if (nL >= nR) { - dataScan = dataL; keyScan = keyL; hitScan = hitL; idxScan = idxL; - dataHash = dataR; keyHash = keyR; hitHash = hitR; idxHash = idxR; - } else { - dataScan = dataR; keyScan = keyR; hitScan = hitR; idxScan = idxR; - dataHash = dataL; keyHash = keyL; hitHash = hitL; idxHash = idxL; - emitScan = (i, a, j, b) => emit(j, b, i, a); - } - - // build lookup table - const lut = 
indexLookup(idxHash, dataHash, keyHash); - - // scan other table - const m = idxScan.length; - for (let j = 0; j < m; ++j) { - const rowScan = idxScan[j]; - const list = lut.get(keyScan(rowScan, dataScan)); - if (list) { - const n = list.length; - for (let k = 0; k < n; ++k) { - const i = list[k]; - emitScan(rowScan, dataScan, idxHash[i], dataHash); - hitHash[i] = 1; - } - hitScan[j] = 1; - } - } -} diff --git a/src/engine/lookup.js b/src/engine/lookup.js deleted file mode 100644 index 1c2ace61..00000000 --- a/src/engine/lookup.js +++ /dev/null @@ -1,33 +0,0 @@ -import { rowLookup } from './join/lookup.js'; -import { aggregateGet } from './reduce/util.js'; -import columnSet from '../table/column-set.js'; -import NULL from '../util/null.js'; -import concat from '../util/concat.js'; -import unroll from '../util/unroll.js'; - -export default function(tableL, tableR, [keyL, keyR], { names, exprs, ops }) { - // instantiate output data - const cols = columnSet(tableL); - const total = tableL.totalRows(); - names.forEach(name => cols.add(name, Array(total).fill(NULL))); - - // build lookup table - const lut = rowLookup(tableR, keyR); - - // generate setter function for lookup match - const set = unroll( - ['lr', 'rr', 'data'], - '{' + concat(names, (_, i) => `_[${i}][lr] = $[${i}](rr, data);`) + '}', - names.map(name => cols.data[name]), - aggregateGet(tableR, ops, exprs) - ); - - // find matching rows, set values on match - const dataR = tableR.data(); - tableL.scan((lrow, data) => { - const rrow = lut.get(keyL(lrow, data)); - if (rrow >= 0) set(lrow, rrow, dataR); - }); - - return tableL.create(cols); -} diff --git a/src/engine/orderby.js b/src/engine/orderby.js deleted file mode 100644 index 23adbdda..00000000 --- a/src/engine/orderby.js +++ /dev/null @@ -1,3 +0,0 @@ -export default function(table, comparator) { - return table.create({ order: comparator }); -} diff --git a/src/engine/pivot.js b/src/engine/pivot.js deleted file mode 100644 index a787a889..00000000 --- 
a/src/engine/pivot.js +++ /dev/null @@ -1,109 +0,0 @@ -import { aggregate, aggregateGet, groupOutput } from './reduce/util.js'; -import columnSet from '../table/column-set.js'; - -const opt = (value, defaultValue) => value != null ? value : defaultValue; - -export default function(table, on, values, options = {}) { - const { keys, keyColumn } = pivotKeys(table, on, options); - const vsep = opt(options.valueSeparator, '_'); - const namefn = values.names.length > 1 - ? (i, name) => name + vsep + keys[i] - : i => keys[i]; - - // perform separate aggregate operations for each key - // if keys do not match, emit NaN so aggregate skips it - // use custom toString method for proper field resolution - const results = keys.map( - k => aggregate(table, values.ops.map(op => { - if (op.name === 'count') { // fix #273 - const fn = r => k === keyColumn[r] ? 1 : NaN; - fn.toString = () => k + ':1'; - return { ...op, name: 'sum', fields: [fn] }; - } - const fields = op.fields.map(f => { - const fn = (r, d) => k === keyColumn[r] ? f(r, d) : NaN; - fn.toString = () => k + ':' + f; - return fn; - }); - return { ...op, fields }; - })) - ); - - return table.create(output(values, namefn, table.groups(), results)); -} - -function pivotKeys(table, on, options) { - const limit = options.limit > 0 ? +options.limit : Infinity; - const sort = opt(options.sort, true); - const ksep = opt(options.keySeparator, '_'); - - // construct key accessor function - const get = aggregateGet(table, on.ops, on.exprs); - const key = get.length === 1 - ? get[0] - : (row, data) => get.map(fn => fn(row, data)).join(ksep); - - // generate vector of per-row key values - const kcol = Array(table.totalRows()); - table.scan((row, data) => kcol[row] = key(row, data)); - - // collect unique key values - const uniq = aggregate( - table.ungroup(), - [ { - id: 0, - name: 'array_agg_distinct', - fields: [(row => kcol[row])], params: [] - } ] - )[0][0]; - - // get ordered set of unique key values - const keys = sort ? 
uniq.sort() : uniq; - - // return key values - return { - keys: Number.isFinite(limit) ? keys.slice(0, limit) : keys, - keyColumn: kcol - }; -} - -function output({ names, exprs }, namefn, groups, results) { - const size = groups ? groups.size : 1; - const cols = columnSet(); - const m = results.length; - const n = names.length; - - let result; - const op = (id, row) => result[id][row]; - - // write groupby fields to output - if (groups) groupOutput(cols, groups); - - // write pivot values to output - for (let i = 0; i < n; ++i) { - const get = exprs[i]; - if (get.field != null) { - // if expression is op only, use aggregates directly - for (let j = 0; j < m; ++j) { - cols.add(namefn(j, names[i]), results[j][get.field]); - } - } else if (size > 1) { - // if multiple groups, evaluate expression for each - for (let j = 0; j < m; ++j) { - result = results[j]; - const col = cols.add(namefn(j, names[i]), Array(size)); - for (let k = 0; k < size; ++k) { - col[k] = get(k, null, op); - } - } - } else { - // if only one group, no need to loop - for (let j = 0; j < m; ++j) { - result = results[j]; - cols.add(namefn(j, names[i]), [ get(0, null, op) ]); - } - } - } - - return cols.new(); -} diff --git a/src/engine/rollup.js b/src/engine/rollup.js deleted file mode 100644 index e67a60de..00000000 --- a/src/engine/rollup.js +++ /dev/null @@ -1,41 +0,0 @@ -import { aggregate, groupOutput } from './reduce/util.js'; -import columnSet from '../table/column-set.js'; - -export default function(table, { names, exprs, ops }) { - // output data - const cols = columnSet(); - const groups = table.groups(); - - // write groupby fields to output - if (groups) groupOutput(cols, groups); - - // compute and write aggregate output - output(names, exprs, groups, aggregate(table, ops), cols); - - // return output table - return table.create(cols.new()); -} - -function output(names, exprs, groups, result = [], cols) { - if (!exprs.length) return; - const size = groups ? 
groups.size : 1; - const op = (id, row) => result[id][row]; - const n = names.length; - - for (let i = 0; i < n; ++i) { - const get = exprs[i]; - if (get.field != null) { - // if expression is op only, use aggregates directly - cols.add(names[i], result[get.field]); - } else if (size > 1) { - // if multiple groups, evaluate expression for each - const col = cols.add(names[i], Array(size)); - for (let j = 0; j < size; ++j) { - col[j] = get(j, null, op); - } - } else { - // if only one group, no need to loop - cols.add(names[i], [ get(0, null, op) ]); - } - } -} diff --git a/src/engine/sample.js b/src/engine/sample.js deleted file mode 100644 index 927ef0e9..00000000 --- a/src/engine/sample.js +++ /dev/null @@ -1,38 +0,0 @@ -import sample from '../util/sample.js'; -import _shuffle from '../util/shuffle.js'; - -export default function(table, size, weight, options = {}) { - const { replace, shuffle } = options; - const parts = table.partitions(false); - - let total = 0; - size = parts.map((idx, group) => { - let s = size(group); - total += (s = (replace ? 
s : Math.min(idx.length, s))); - return s; - }); - - const samples = new Uint32Array(total); - let curr = 0; - - parts.forEach((idx, group) => { - const sz = size[group]; - const buf = samples.subarray(curr, curr += sz); - - if (!replace && sz === idx.length) { - // sample size === data size, no replacement - // no need to sample, just copy indices - buf.set(idx); - } else { - sample(buf, replace, idx, weight); - } - }); - - if (shuffle !== false && (parts.length > 1 || !replace)) { - // sampling with replacement methods shuffle, so in - // that case a single partition is already good to go - _shuffle(samples); - } - - return table.reify(samples); -} diff --git a/src/engine/select.js b/src/engine/select.js deleted file mode 100644 index 8521c76a..00000000 --- a/src/engine/select.js +++ /dev/null @@ -1,17 +0,0 @@ -import columnSet from '../table/column-set.js'; -import error from '../util/error.js'; -import isString from '../util/is-string.js'; - -export default function(table, columns) { - const cols = columnSet(); - - columns.forEach((value, curr) => { - const next = isString(value) ? value : curr; - if (next) { - const col = table.column(curr) || error(`Unrecognized column: ${curr}`); - cols.add(next, col); - } - }); - - return table.create(cols); -} diff --git a/src/engine/spread.js b/src/engine/spread.js deleted file mode 100644 index c6fe88d1..00000000 --- a/src/engine/spread.js +++ /dev/null @@ -1,59 +0,0 @@ -import { aggregateGet } from './reduce/util.js'; -import columnSet from '../table/column-set.js'; -import NULL from '../util/null.js'; -import toArray from '../util/to-array.js'; - -export default function(table, { names, exprs, ops = [] }, options = {}) { - if (names.length === 0) return table; - - // ignore 'as' if there are multiple field names - const as = (names.length === 1 && options.as) || []; - const drop = options.drop == null ? true : !!options.drop; - const limit = options.limit == null - ? 
as.length || Infinity - : Math.max(1, +options.limit || 1); - - const get = aggregateGet(table, ops, exprs); - const cols = columnSet(); - const map = names.reduce((map, name, i) => map.set(name, i), new Map()); - - const add = (index, name) => { - const columns = spread(table, get[index], limit); - const n = columns.length; - for (let i = 0; i < n; ++i) { - cols.add(as[i] || `${name}_${i + 1}`, columns[i]); - } - }; - - table.columnNames().forEach(name => { - if (map.has(name)) { - if (!drop) cols.add(name, table.column(name)); - add(map.get(name), name); - map.delete(name); - } else { - cols.add(name, table.column(name)); - } - }); - - map.forEach(add); - - return table.create(cols); -} - -function spread(table, get, limit) { - const nrows = table.totalRows(); - const columns = []; - - table.scan((row, data) => { - const values = toArray(get(row, data)); - const n = Math.min(values.length, limit); - while (columns.length < n) { - columns.push(Array(nrows).fill(NULL)); - } - for (let i = 0; i < n; ++i) { - columns[i][row] = values[i]; - } - }); - - return columns; -} diff --git a/src/engine/unroll.js b/src/engine/unroll.js deleted file mode 100644 index bc55f92f..00000000 --- a/src/engine/unroll.js +++ /dev/null @@ -1,117 +0,0 @@ -import { aggregateGet } from './reduce/util.js'; -import columnSet from '../table/column-set.js'; -import toArray from '../util/to-array.js'; - -export default function(table, { names = [], exprs = [], ops = [] }, options = {}) { - if (!names.length) return table; - - const limit = options.limit > 0 ? +options.limit : Infinity; - const index = options.index - ? options.index === true ? 
'index' : options.index + '' - : null; - const drop = new Set(options.drop); - const get = aggregateGet(table, ops, exprs); - - // initialize output columns - const cols = columnSet(); - const nset = new Set(names); - const priors = []; - const copies = []; - const unroll = []; - - // original and copied columns - table.columnNames().forEach(name => { - if (!drop.has(name)) { - const col = cols.add(name, []); - if (!nset.has(name)) { - priors.push(table.column(name)); - copies.push(col); - } - } - }); - - // unrolled output columns - names.forEach(name => { - if (!drop.has(name)) { - if (!cols.has(name)) cols.add(name, []); - unroll.push(cols.data[name]); - } - }); - - // index column, if requested - const icol = index ? cols.add(index, []) : null; - - let start = 0; - const m = priors.length; - const n = unroll.length; - - const copy = (row, maxlen) => { - for (let i = 0; i < m; ++i) { - copies[i].length = start + maxlen; - copies[i].fill(priors[i].get(row), start, start + maxlen); - } - }; - - const indices = icol - ? 
(row, maxlen) => { - for (let i = 0; i < maxlen; ++i) { - icol[row + i] = i; - } - } - : () => {}; - - if (n === 1) { - // optimize common case of one array-valued column - const fn = get[0]; - const col = unroll[0]; - - table.scan((row, data) => { - // extract array data - const array = toArray(fn(row, data)); - const maxlen = Math.min(array.length, limit); - - // copy original table data - copy(row, maxlen); - - // copy unrolled array data - for (let j = 0; j < maxlen; ++j) { - col[start + j] = array[j]; - } - - // fill in array indices - indices(start, maxlen); - - start += maxlen; - }); - } else { - table.scan((row, data) => { - let maxlen = 0; - - // extract parallel array data - const arrays = get.map(fn => { - const value = toArray(fn(row, data)); - maxlen = Math.min(Math.max(maxlen, value.length), limit); - return value; - }); - - // copy original table data - copy(row, maxlen); - - // copy unrolled array data - for (let i = 0; i < n; ++i) { - const col = unroll[i]; - const arr = arrays[i]; - for (let j = 0; j < maxlen; ++j) { - col[start + j] = arr[j]; - } - } - - // fill in array indices - indices(start, maxlen); - - start += maxlen; - }); - } - - return table.create(cols.new()); -} diff --git a/src/expression/codegen.js b/src/expression/codegen.js index f4a345f2..0c16ee45 100644 --- a/src/expression/codegen.js +++ b/src/expression/codegen.js @@ -33,9 +33,14 @@ const ref = (node, opt, method) => { return `data${table}${name(node)}.${method}(${opt.index}${table})`; }; +const get = (node, opt) => { + const table = node.table || ''; + return `data${table}${name(node)}[${opt.index}${table}]`; +}; + const visitors = { Constant: node => node.raw, - Column: (node, opt) => ref(node, opt, 'get'), + Column: (node, opt) => node.array ? 
get(node, opt) : ref(node, opt, 'at'), Dictionary: (node, opt) => ref(node, opt, 'key'), Function: node => `fn.${node.name}`, Parameter: node => `$${name(node)}`, diff --git a/src/expression/compare.js b/src/expression/compare.js index e527fd8e..399bd6a7 100644 --- a/src/expression/compare.js +++ b/src/expression/compare.js @@ -1,6 +1,6 @@ import codegen from './codegen.js'; import parse from './parse.js'; -import { aggregate } from '../engine/reduce/util.js'; +import { aggregate } from '../verbs/reduce/util.js'; // generate code to compare a single field const _compare = (u, v, lt, gt) => diff --git a/src/expression/parse-escape.js b/src/expression/parse-escape.js index fb0b20b2..22233c93 100644 --- a/src/expression/parse-escape.js +++ b/src/expression/parse-escape.js @@ -9,9 +9,7 @@ export default function(ctx, spec, params) { if (ctx.aggronly) error(ERROR_ESC_AGGRONLY); // generate escaped function invocation code - const code = '(row,data)=>fn(' - + rowObjectCode(ctx.table.columnNames()) - + ',$)'; + const code = `(row,data)=>fn(${rowObjectCode(ctx.table)},$)`; return { escape: compile.escape(code, toFunction(spec.expr), params) }; } diff --git a/src/expression/parse-expression.js b/src/expression/parse-expression.js index 765236fc..71d196f3 100644 --- a/src/expression/parse-expression.js +++ b/src/expression/parse-expression.js @@ -93,9 +93,10 @@ function parseAST(expr) { const code = expr.field ? fieldRef(expr) : isArray(expr) ? toString(expr) : expr; + // @ts-ignore return parse(`expr=(${code})`, PARSER_OPT).body[0].expression.right; - } catch (err) { - error(`Expression parse error: ${expr+''}`, err); + } catch (err) { // eslint-disable-line no-unused-vars + error(`Expression parse error: ${expr+''}`); } } @@ -378,7 +379,7 @@ function updateFunctionNode(node, name, ctx) { if (name === ROW_OBJECT) { const t = ctx.table; if (!t) ctx.error(node, ERROR_ROW_OBJECT); - rowObjectExpression(node, + rowObjectExpression(node, t, node.arguments.length ? 
node.arguments.map(node => { const col = ctx.param(node); diff --git a/src/expression/rewrite.js b/src/expression/rewrite.js index 1271846b..87d63c33 100644 --- a/src/expression/rewrite.js +++ b/src/expression/rewrite.js @@ -1,4 +1,5 @@ import { Column, Dictionary, Literal } from './ast/constants.js'; +import isArrayType from '../util/is-array-type.js'; import isFunction from '../util/is-function.js'; const dictOps = { @@ -13,15 +14,20 @@ const dictOps = { * Additionally optimizes dictionary column operations. * @param {object} ref AST node to rewrite to a column reference. * @param {string} name The name of the column. - * @param {number} index The table index of the column. - * @param {object} col The actual table column instance. - * @param {object} op Parent AST node operating on the column reference. + * @param {number} [index] The table index of the column. + * @param {object} [col] The actual table column instance. + * @param {object} [op] Parent AST node operating on the column reference. 
*/ -export default function(ref, name, index = 0, col, op) { +export default function(ref, name, index = 0, col = undefined, op = undefined) { ref.type = Column; ref.name = name; ref.table = index; + // annotate arrays as such for optimized access + if (isArrayType(col)) { + ref.array = true; + } + // proceed only if has parent op and is a dictionary column if (op && col && isFunction(col.keyFor)) { // get other arg if op is an optimizeable operation diff --git a/src/expression/row-object.js b/src/expression/row-object.js index ef3a4df4..b12d0ff7 100644 --- a/src/expression/row-object.js +++ b/src/expression/row-object.js @@ -8,7 +8,11 @@ import toString from '../util/to-string.js'; export const ROW_OBJECT = 'row_object'; -export function rowObjectExpression(node, props) { +export function rowObjectExpression( + node, + table, + props = table.columnNames()) +{ node.type = ObjectExpression; const p = node.properties = []; @@ -17,17 +21,17 @@ export function rowObjectExpression(node, props) { p.push({ type: Property, key: { type: Literal, raw: toString(key) }, - value: rewrite({ computed: true }, name) + value: rewrite({ computed: true }, name, 0, table.column(name)) }); } return node; } -export function rowObjectCode(props) { - return codegen(rowObjectExpression({}, props)); +export function rowObjectCode(table, props) { + return codegen(rowObjectExpression({}, table, props)); } -export function rowObjectBuilder(props) { - return compile.expr(rowObjectCode(props)); +export function rowObjectBuilder(table, props) { + return compile.expr(rowObjectCode(table, props)); } diff --git a/src/format/from-arrow.js b/src/format/from-arrow.js deleted file mode 100644 index a5a0c776..00000000 --- a/src/format/from-arrow.js +++ /dev/null @@ -1,42 +0,0 @@ -import { fromIPC } from '../arrow/arrow-table.js'; -import arrowColumn from '../arrow/arrow-column.js'; -import resolve, { all } from '../helpers/selection.js'; -import columnSet from '../table/column-set.js'; -import 
ColumnTable from '../table/column-table.js'; - -/** - * Options for Apache Arrow import. - * @typedef {object} ArrowOptions - * @property {import('../table/transformable').Select} columns - * An ordered set of columns to import. The input may consist of column name - * strings, column integer indices, objects with current column names as keys - * and new column names as values (for renaming), or selection helper - * functions such as {@link all}, {@link not}, or {@link range}. - */ - -/** - * Create a new table backed by an Apache Arrow table instance. - * @param {object} arrow An Apache Arrow data table or byte buffer. - * @param {ArrowOptions} options Options for Arrow import. - * @return {ColumnTable} A new table containing the imported values. - */ -export default function(arrow, options = {}) { - if (arrow && !arrow.batches) { - arrow = fromIPC()(arrow); - } - - // resolve column selection - const fields = arrow.schema.fields.map(f => f.name); - const sel = resolve({ - columnNames: test => test ? fields.filter(test) : fields.slice(), - columnIndex: name => fields.indexOf(name) - }, options.columns || all()); - - // build Arquero columns for backing Arrow columns - const cols = columnSet(); - sel.forEach((name, key) => { - cols.add(name, arrowColumn(arrow.getChild(key))); - }); - - return new ColumnTable(cols.data, cols.names); -} diff --git a/src/format/from-csv.js b/src/format/from-csv.js index d54b931a..044fd1eb 100644 --- a/src/format/from-csv.js +++ b/src/format/from-csv.js @@ -32,8 +32,8 @@ import parseDelimited from './parse/parse-delimited.js'; * behavior, set the autoType option to false. To perform custom parsing * of input column values, use the parse option. * @param {string} text A string in a delimited-value format. - * @param {CSVParseOptions} options The formatting options. - * @return {import('../table/column-table.js').ColumnTable} A new table + * @param {CSVParseOptions} [options] The formatting options. 
+ * @return {import('../table/ColumnTable.js').ColumnTable} A new table * containing the parsed values. */ export default function(text, options = {}) { diff --git a/src/format/from-fixed.js b/src/format/from-fixed.js index 5e4cafc1..ccfecacb 100644 --- a/src/format/from-fixed.js +++ b/src/format/from-fixed.js @@ -32,7 +32,7 @@ import error from '../util/error.js'; * parsing of input column values, use the parse option. * @param {string} text A string in a fixed-width file format. * @param {FixedParseOptions} options The formatting options. - * @return {import('../table/column-table.js').ColumnTable} A new table + * @return {import('../table/ColumnTable.js').ColumnTable} A new table * containing the parsed values. */ export default function(text, options = {}) { @@ -50,10 +50,10 @@ export default function(text, options = {}) { ); } -function positions({ positions, widths }) { +function positions({ positions = undefined, widths = undefined }) { if (!positions && !widths) { error('Fixed width files require a "positions" or "widths" option'); } let i = 0; return positions || widths.map(w => [i, i += w]); -} \ No newline at end of file +} diff --git a/src/format/from-json.js b/src/format/from-json.js index 782cdc89..75279588 100644 --- a/src/format/from-json.js +++ b/src/format/from-json.js @@ -1,4 +1,4 @@ -import ColumnTable from '../table/column-table.js'; +import { ColumnTable } from '../table/ColumnTable.js'; import defaultTrue from '../util/default-true.js'; import isArrayType from '../util/is-array-type.js'; import isDigitString from '../util/is-digit-string.js'; @@ -27,7 +27,7 @@ import isString from '../util/is-string.js'; * The data payload can also be provided as the "data" property of an * enclosing object, with an optional "schema" property containing table * metadata such as a "fields" array of ordered column information. - * @param {string|object} data A string in JSON format, or pre-parsed object. 
+ * @param {string|object} json A string in JSON format, or pre-parsed object. * @param {JSONParseOptions} options The formatting options. * @return {ColumnTable} A new table containing the parsed values. */ diff --git a/src/format/from-text-rows.js b/src/format/from-text-rows.js index b65b2adc..bdcda246 100644 --- a/src/format/from-text-rows.js +++ b/src/format/from-text-rows.js @@ -1,4 +1,4 @@ -import ColumnTable from '../table/column-table.js'; +import { ColumnTable } from '../table/ColumnTable.js'; import identity from '../util/identity.js'; import isFunction from '../util/is-function.js'; import repeat from '../util/repeat.js'; @@ -44,6 +44,7 @@ export default function(next, names, options) { } } + /** @type {import('../table/types.js').ColumnData} */ const columns = {}; names.forEach((name, i) => columns[name] = values[i]); return new ColumnTable(columns, names); diff --git a/src/format/load-file.js b/src/format/load-file.js index f33d5ce6..db5975a0 100644 --- a/src/format/load-file.js +++ b/src/format/load-file.js @@ -1,15 +1,14 @@ import fetch from 'node-fetch'; import { readFile } from 'fs'; - -import fromArrow from './from-arrow.js'; import fromCSV from './from-csv.js'; import fromFixed from './from-fixed.js'; import fromJSON from './from-json.js'; +import fromArrow from '../arrow/from-arrow.js'; import { from } from '../table/index.js'; import isArray from '../util/is-array.js'; /** - * @typedef {import('../table/column-table.js').ColumnTable} ColumnTable + * @typedef {import('../table/ColumnTable.js').ColumnTable} ColumnTable */ /** @@ -30,7 +29,7 @@ import isArray from '../util/is-array.js'; * otherwise CSV format is assumed. The options to this method are * passed as the second argument to the format parser. * @param {string} path The URL or file path to load. - * @param {LoadOptions & object} options The loading and formatting options. + * @param {LoadOptions & object} [options] The loading and formatting options. 
* @return {Promise} A Promise for an Arquero table. * @example aq.load('data/table.csv') * @example aq.load('data/table.json', { using: aq.fromJSON }) @@ -68,7 +67,8 @@ function loadFile(file, options, parse) { /** * Load an Arrow file from a URL and return a Promise for an Arquero table. * @param {string} path The URL or file path to load. - * @param {LoadOptions & import('./from-arrow').ArrowOptions} options Arrow format options. + * @param {LoadOptions & import('../arrow/types.js').ArrowOptions} [options] + * Arrow format options. * @return {Promise} A Promise for an Arquero table. * @example aq.loadArrow('data/table.arrow') */ @@ -79,7 +79,8 @@ export function loadArrow(path, options) { /** * Load a CSV file from a URL and return a Promise for an Arquero table. * @param {string} path The URL or file path to load. - * @param {LoadOptions & import('./from-csv').CSVParseOptions} options CSV format options. + * @param {LoadOptions & import('./from-csv.js').CSVParseOptions} [options] + * CSV format options. * @return {Promise} A Promise for an Arquero table. * @example aq.loadCSV('data/table.csv') * @example aq.loadTSV('data/table.tsv', { delimiter: '\t' }) @@ -91,7 +92,8 @@ export function loadCSV(path, options) { /** * Load a fixed width file from a URL and return a Promise for an Arquero table. * @param {string} path The URL or file path to load. - * @param {LoadOptions & import('./from-fixed').FixedParseOptions} options Fixed width format options. + * @param {LoadOptions & import('./from-fixed.js').FixedParseOptions} [options] + * Fixed width format options. * @return {Promise} A Promise for an Arquero table. * @example aq.loadFixedWidth('data/table.txt', { names: ['name', 'city', state'], widths: [10, 20, 2] }) */ @@ -105,7 +107,8 @@ export function loadCSV(path, options) { * and the aq.from method is used to construct the table. Otherwise, a * column object format is assumed and aq.fromJSON is applied. * @param {string} path The URL or file path to load. 
- * @param {LoadOptions & import('./from-json').JSONParseOptions} options JSON format options. + * @param {LoadOptions & import('./from-json.js').JSONParseOptions} [options] + * JSON format options. * @return {Promise} A Promise for an Arquero table. * @example aq.loadJSON('data/table.json') */ diff --git a/src/format/load-url.js b/src/format/load-url.js index 2b3b4ff9..aded0fd4 100644 --- a/src/format/load-url.js +++ b/src/format/load-url.js @@ -1,4 +1,4 @@ -import fromArrow from './from-arrow.js'; +import fromArrow from '../arrow/from-arrow.js'; import fromCSV from './from-csv.js'; import fromFixed from './from-fixed.js'; import fromJSON from './from-json.js'; @@ -6,7 +6,7 @@ import { from } from '../table/index.js'; import isArray from '../util/is-array.js'; /** - * @typedef {import('../table/column-table.js').ColumnTable} ColumnTable + * @typedef {import('../table/ColumnTable.js').ColumnTable} ColumnTable */ /** @@ -43,7 +43,8 @@ export function load(url, options = {}) { /** * Load an Arrow file from a URL and return a Promise for an Arquero table. * @param {string} url The URL to load. - * @param {LoadOptions & import('./from-arrow').ArrowOptions} options Arrow format options. + * @param {LoadOptions & import('../arrow/types.js').ArrowOptions} [options] + * Arrow format options. * @return {Promise} A Promise for an Arquero table. * @example aq.loadArrow('data/table.arrow') */ @@ -54,7 +55,8 @@ export function loadArrow(url, options) { /** * Load a CSV file from a URL and return a Promise for an Arquero table. * @param {string} url The URL to load. - * @param {LoadOptions & import('./from-csv').CSVParseOptions} options CSV format options. + * @param {LoadOptions & import('./from-csv.js').CSVParseOptions} [options] + * CSV format options. * @return {Promise} A Promise for an Arquero table. 
* @example aq.loadCSV('data/table.csv') * @example aq.loadTSV('data/table.tsv', { delimiter: '\t' }) @@ -66,7 +68,8 @@ export function loadCSV(url, options) { /** * Load a fixed width file from a URL and return a Promise for an Arquero table. * @param {string} url The URL to load. - * @param {LoadOptions & import('./from-fixed').FixedParseOptions} options Fixed width format options. + * @param {LoadOptions & import('./from-fixed.js').FixedParseOptions} [options] + * Fixed width format options. * @return {Promise} A Promise for an Arquero table. * @example aq.loadFixedWidth('data/table.txt', { names: ['name', 'city', state'], widths: [10, 20, 2] }) */ @@ -80,7 +83,8 @@ export function loadCSV(url, options) { * and the aq.from method is used to construct the table. Otherwise, a * column object format is assumed and aq.fromJSON is applied. * @param {string} url The URL to load. - * @param {LoadOptions & import('./from-json').JSONParseOptions} options JSON format options. + * @param {LoadOptions & import('./from-json.js').JSONParseOptions} [options] + * JSON format options. * @return {Promise} A Promise for an Arquero table. * @example aq.loadJSON('data/table.json') */ diff --git a/src/format/parse/parse-delimited.js b/src/format/parse/parse-delimited.js index 5f2e86ed..ae622841 100644 --- a/src/format/parse/parse-delimited.js +++ b/src/format/parse/parse-delimited.js @@ -26,7 +26,11 @@ import error from '../../util/error.js'; // (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS // SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-export default function(text, { delimiter = ',', skip, comment }) { +export default function(text, { + delimiter = ',', + skip = 0, + comment = undefined +}) { if (delimiter.length !== 1) { error(`Text "delimiter" should be a single character, found "${delimiter}"`); } diff --git a/src/format/parse/parse-lines.js b/src/format/parse/parse-lines.js index a5c3c2b5..4a1d2eca 100644 --- a/src/format/parse/parse-lines.js +++ b/src/format/parse/parse-lines.js @@ -1,7 +1,7 @@ import { NEWLINE, RETURN } from './constants.js'; import filter from './text-filter.js'; -export default function(text, { skip, comment }) { +export default function(text, { skip = 0, comment = undefined }) { let N = text.length; let I = 0; // current character index diff --git a/src/format/to-arrow.js b/src/format/to-arrow.js deleted file mode 100644 index 7fec6196..00000000 --- a/src/format/to-arrow.js +++ /dev/null @@ -1,13 +0,0 @@ -import { tableToIPC } from 'apache-arrow'; -import toArrow from '../arrow/encode/index.js'; - -export default toArrow; - -export function toArrowIPC(table, options = {}) { - const { format: format, ...toArrowOptions } = options; - const outputFormat = format ? format : 'stream'; - if (!['stream', 'file'].includes(outputFormat)) { - throw Error('Unrecognised output format'); - } - return tableToIPC(toArrow(table, toArrowOptions), format); -} diff --git a/src/format/to-csv.js b/src/format/to-csv.js index e2ae06e0..db7ff675 100644 --- a/src/format/to-csv.js +++ b/src/format/to-csv.js @@ -21,8 +21,7 @@ import isDate from '../util/is-date.js'; * Format a table as a comma-separated values (CSV) string. Other * delimiters, such as tabs or pipes ('|'), can be specified using * the options argument. - * @param {import('../table/column-table.js').ColumnTable} table The table - * to format. + * @param {import('../table/Table.js').Table} table The table to format. * @param {CSVFormatOptions} options The formatting options. * @return {string} A delimited-value format string. 
*/ diff --git a/src/format/to-html.js b/src/format/to-html.js index a400fa73..54b7b264 100644 --- a/src/format/to-html.js +++ b/src/format/to-html.js @@ -55,8 +55,7 @@ import mapObject from '../util/map-object.js'; /** * Format a table as an HTML table string. - * @param {import('../table/column-table.js').ColumnTable} table The table - * to format. + * @param {import('../table/Table.js').Table} table The table to format. * @param {HTMLFormatOptions} options The formatting options. * @return {string} An HTML table string. */ diff --git a/src/format/to-json.js b/src/format/to-json.js index d3a84309..e9e51075 100644 --- a/src/format/to-json.js +++ b/src/format/to-json.js @@ -27,8 +27,7 @@ const defaultFormatter = value => isDate(value) /** * Format a table as a JavaScript Object Notation (JSON) string. - * @param {import('../table/column-table.js').ColumnTable} table The - * table to format. + * @param {import('../table/Table.js').Table} table The table to format. * @param {JSONFormatOptions} options The formatting options. * @return {string} A JSON string. */ @@ -51,7 +50,7 @@ export default function(table, options = {}) { const formatter = format[name] || defaultFormatter; let r = -1; table.scan(row => { - const value = column.get(row); + const value = column.at(row); text += (++r ? ',' : '') + JSON.stringify(formatter(value)); }, true, options.limit, options.offset); @@ -59,4 +58,4 @@ export default function(table, options = {}) { }); return text + '}' + (schema ? '}' : ''); -} \ No newline at end of file +} diff --git a/src/format/to-markdown.js b/src/format/to-markdown.js index c4d39f72..70d5c251 100644 --- a/src/format/to-markdown.js +++ b/src/format/to-markdown.js @@ -25,7 +25,7 @@ import { columns, formats, scan } from './util.js'; /** * Format a table as a GitHub-Flavored Markdown table string. - * @param {import('../table/column-table.js').ColumnTable} table The table to format. + * @param {import('../table/Table.js').Table} table The table to format. 
* @param {MarkdownFormatOptions} options The formatting options. * @return {string} A GitHub-Flavored Markdown table string. */ diff --git a/src/format/util.js b/src/format/util.js index 9f68e01e..082fa599 100644 --- a/src/format/util.js +++ b/src/format/util.js @@ -3,7 +3,7 @@ import isFunction from '../util/is-function.js'; /** * Column selection function. - * @typedef {(table: import('../table/table.js').Table) => string[]} ColumnSelectFunction + * @typedef {(table: import('../table/Table.js').Table) => string[]} ColumnSelectFunction */ /** @@ -15,7 +15,7 @@ import isFunction from '../util/is-function.js'; * Column format options. The object keys should be column names. * The object values should be formatting functions or objects. * If specified, these override any automatically inferred options. - * @typedef {Object.} ColumnFormatOptions */ /** @@ -49,7 +49,7 @@ export function formats(table, names, options) { function values(table, columnName) { const column = table.column(columnName); - return fn => table.scan(row => fn(column.get(row))); + return fn => table.scan(row => fn(column.at(row))); } export function scan(table, names, limit = 100, offset, ctx) { @@ -59,7 +59,7 @@ export function scan(table, names, limit = 100, offset, ctx) { ctx.row(row); for (let i = 0; i < n; ++i) { const name = names[i]; - ctx.cell(data[names[i]].get(row), name, i); + ctx.cell(data[names[i]].at(row), name, i); } }, true, limit, offset); } diff --git a/src/format/value.js b/src/format/value.js index 0bca2b12..a3a180c9 100644 --- a/src/format/value.js +++ b/src/format/value.js @@ -32,6 +32,7 @@ import isTypedArray from '../util/is-typed-array.js'; */ export default function(v, options = {}) { if (isFunction(options)) { + // @ts-ignore return options(v) + ''; } @@ -39,18 +40,22 @@ export default function(v, options = {}) { if (type === 'object') { if (isDate(v)) { + // @ts-ignore return options.utc ? 
formatUTCDate(v) : formatDate(v); } else { const s = JSON.stringify( v, + // @ts-ignore (k, v) => isTypedArray(v) ? Array.from(v) : v ); + // @ts-ignore const maxlen = options.maxlen || 30; return s.length > maxlen ? s.slice(0, 28) + '\u2026' + (s[0] === '[' ? ']' : '}') : s; } } else if (type === 'number') { + // @ts-ignore const digits = options.digits || 0; let a; return v !== 0 && ((a = Math.abs(v)) >= 1e18 || a < Math.pow(10, -digits)) diff --git a/src/helpers/selection.js b/src/helpers/selection.js index c26eb841..7032cf25 100644 --- a/src/helpers/selection.js +++ b/src/helpers/selection.js @@ -39,7 +39,7 @@ function toObject(value) { /** * Proxy type for SelectHelper function. - * @typedef {import('../table/transformable').SelectHelper} SelectHelper + * @typedef {import('../table/types.js').SelectHelper} SelectHelper */ /** @@ -98,7 +98,9 @@ export function range(start, end) { export function matches(pattern) { if (isString(pattern)) pattern = RegExp(escapeRegExp(pattern)); return decorate( + // @ts-ignore table => table.columnNames(name => pattern.test(name)), + // @ts-ignore () => ({ matches: [pattern.source, pattern.flags] }) ); } diff --git a/src/index-node.js b/src/index-browser.js similarity index 55% rename from src/index-node.js rename to src/index-browser.js index 56c9e4ec..7dcd8f9c 100644 --- a/src/index-node.js +++ b/src/index-browser.js @@ -1,2 +1,2 @@ -export * from './index.js'; -export { load, loadArrow, loadCSV, loadFixed, loadJSON } from './format/load-file.js'; +export * from './api.js'; +export { load, loadArrow, loadCSV, loadFixed, loadJSON } from './format/load-url.js'; diff --git a/src/index.js b/src/index.js index 30d8a33b..275a5a8c 100644 --- a/src/index.js +++ b/src/index.js @@ -1,45 +1,2 @@ -// export internal class definitions -import Table from './table/table.js'; -import { columnFactory } from './table/column.js'; -import ColumnTable from './table/column-table.js'; -import Transformable from './table/transformable.js'; -import 
Reducer from './engine/reduce/reducer.js'; -import parse from './expression/parse.js'; -import walk_ast from './expression/ast/walk.js'; -import Query from './query/query.js'; -import { Verb, Verbs } from './query/verb.js'; - -export const internal = { - Table, - ColumnTable, - Transformable, - Query, - Reducer, - Verb, - Verbs, - columnFactory, - parse, - walk_ast -}; - -// export public API -export { seed } from './util/random.js'; -export { default as fromArrow } from './format/from-arrow.js'; -export { default as fromCSV } from './format/from-csv.js'; -export { default as fromFixed } from './format/from-fixed.js'; -export { default as fromJSON } from './format/from-json.js'; -export { load, loadArrow, loadCSV, loadFixed, loadJSON } from './format/load-url.js'; -export { default as toArrow } from './arrow/encode/index.js'; -export { default as bin } from './helpers/bin.js'; -export { default as escape } from './helpers/escape.js'; -export { default as desc } from './helpers/desc.js'; -export { default as field } from './helpers/field.js'; -export { default as frac } from './helpers/frac.js'; -export { default as names } from './helpers/names.js'; -export { default as rolling } from './helpers/rolling.js'; -export { all, endswith, matches, not, range, startswith } from './helpers/selection.js'; -export { default as agg } from './verbs/helpers/agg.js'; -export { default as op } from './op/op-api.js'; -export { query, queryFrom } from './query/query.js'; -export * from './register.js'; -export * from './table/index.js'; +export * from './api.js'; +export { load, loadArrow, loadCSV, loadFixed, loadJSON } from './format/load-file.js'; diff --git a/src/op/aggregate-functions.js b/src/op/aggregate-functions.js index ebe43bda..e122fe46 100644 --- a/src/op/aggregate-functions.js +++ b/src/op/aggregate-functions.js @@ -54,8 +54,8 @@ function initProduct(s, value) { * An operator instance for an aggregate function. 
* @typedef {object} AggregateOperator * @property {AggregateInit} init Initialize the operator. - * @property {AggregateAdd} add Add a value to the operator state. - * @property {AggregateRem} rem Remove a value from the operator state. + * @property {AggregateAdd} [add] Add a value to the operator state. + * @property {AggregateRem} [rem] Remove a value from the operator state. * @property {AggregateValue} value Retrieve an output value. */ diff --git a/src/op/functions/array.js b/src/op/functions/array.js index 1033a9e7..14e3167e 100644 --- a/src/op/functions/array.js +++ b/src/op/functions/array.js @@ -6,20 +6,123 @@ import isValid from '../../util/is-valid.js'; const isSeq = (seq) => isArrayType(seq) || isString(seq); export default { - compact: (arr) => isArrayType(arr) ? arr.filter(v => isValid(v)) : arr, - concat: (...values) => [].concat(...values), - includes: (seq, value, index) => isSeq(seq) - ? seq.includes(value, index) - : false, - indexof: (seq, value) => isSeq(seq) ? seq.indexOf(value) : -1, - join: (arr, delim) => isArrayType(arr) ? arr.join(delim) : NULL, - lastindexof: (seq, value) => isSeq(seq) ? seq.lastIndexOf(value) : -1, - length: (seq) => isSeq(seq) ? seq.length : 0, - pluck: (arr, prop) => isArrayType(arr) - ? arr.map(v => isValid(v) ? v[prop] : NULL) - : NULL, - reverse: (seq) => isArrayType(seq) ? seq.slice().reverse() - : isString(seq) ? seq.split('').reverse().join('') - : NULL, - slice: (seq, start, end) => isSeq(seq) ? seq.slice(start, end) : NULL + /** + * Returns a new compacted array with invalid values + * (`null`, `undefined`, `NaN`) removed. + * @template T + * @param {T[]} array The input array. + * @return {T[]} A compacted array. + */ + compact: (array) => isArrayType(array) + ? array.filter(v => isValid(v)) + : array, + + /** + * Merges two or more arrays in sequence, returning a new array. + * @template T + * @param {...(T|T[])} values The arrays to merge. + * @return {T[]} The merged array. 
+ */ + concat: (...values) => [].concat(...values), + + /** + * Determines whether an *array* includes a certain *value* among its + * entries, returning `true` or `false` as appropriate. + * @template T + * @param {T[]} sequence The input array value. + * @param {T} value The value to search for. + * @param {number} [index=0] The integer index to start searching + * from (default `0`). + * @return {boolean} True if the value is included, false otherwise. + */ + includes: (sequence, value, index) => isSeq(sequence) + ? sequence.includes(value, index) + : false, + + /** + * Returns the first index at which a given *value* can be found in the + * *sequence* (array or string), or -1 if it is not present. + * @template T + * @param {T[]|string} sequence The input array or string value. + * @param {T} value The value to search for. + * @return {number} The index of the value, or -1 if not present. + */ + indexof: (sequence, value) => isSeq(sequence) + // @ts-ignore + ? sequence.indexOf(value) + : -1, + + /** + * Creates and returns a new string by concatenating all of the elements + * in an *array* (or an array-like object), separated by commas or a + * specified *delimiter* string. If the *array* has only one item, then + * that item will be returned without using the delimiter. + * @template T + * @param {T[]} array The input array value. + * @param {string} delim The delimiter string (default `','`). + * @return {string} The joined string. + */ + join: (array, delim) => isArrayType(array) ? array.join(delim) : NULL, + + /** + * Returns the last index at which a given *value* can be found in the + * *sequence* (array or string), or -1 if it is not present. + * @template T + * @param {T[]|string} sequence The input array or string value. + * @param {T} value The value to search for. + * @return {number} The last index of the value, or -1 if not present. + */ + lastindexof: (sequence, value) => isSeq(sequence) + // @ts-ignore + ? 
sequence.lastIndexOf(value) + : -1, + + /** + * Returns the length of the input *sequence* (array or string). + * @param {Array|string} sequence The input array or string value. + * @return {number} The length of the sequence. + */ + length: (sequence) => isSeq(sequence) ? sequence.length : 0, + + /** + * Returns a new array in which the given *property* has been extracted + * for each element in the input *array*. + * @param {Array} array The input array value. + * @param {string} property The property name string to extract. Nested + * properties are not supported: the input `"a.b"` will indicates a + * property with that exact name, *not* a nested property `"b"` of + * the object `"a"`. + * @return {Array} An array of plucked properties. + */ + pluck: (array, property) => isArrayType(array) + ? array.map(v => isValid(v) ? v[property] : NULL) + : NULL, + + /** + * Returns a new array or string with the element order reversed: the first + * *sequence* element becomes the last, and the last *sequence* element + * becomes the first. The input *sequence* is unchanged. + * @template T + * @param {T[]|string} sequence The input array or string value. + * @return {T[]|string} The reversed sequence. + */ + reverse: (sequence) => isArrayType(sequence) ? sequence.slice().reverse() + : isString(sequence) ? sequence.split('').reverse().join('') + : NULL, + + /** + * Returns a copy of a portion of the input *sequence* (array or string) + * selected from *start* to *end* (*end* not included) where *start* and + * *end* represent the index of items in the sequence. + * @template T + * @param {T[]|string} sequence The input array or string value. + * @param {number} [start=0] The starting integer index to copy from + * (inclusive, default `0`). + * @param {number} [end] The ending integer index to copy from (exclusive, + * default `sequence.length`). + * @return {T[]|string} The sliced sequence. + */ + slice: (sequence, start, end) => isSeq(sequence) + ? 
sequence.slice(start, end) + : NULL }; diff --git a/src/op/functions/bin.js b/src/op/functions/bin.js index c074e34a..7206343c 100644 --- a/src/op/functions/bin.js +++ b/src/op/functions/bin.js @@ -3,11 +3,11 @@ * Useful for creating equal-width histograms. * Values outside the [min, max] range will be mapped to * -Infinity (< min) or +Infinity (> max). - * @param {number} value - The value to bin. - * @param {number} min - The minimum bin boundary. - * @param {number} max - The maximum bin boundary. - * @param {number} step - The step size between bin boundaries. - * @param {number} [offset=0] - Offset in steps by which to adjust + * @param {number} value The value to bin. + * @param {number} min The minimum bin boundary. + * @param {number} max The maximum bin boundary. + * @param {number} step The step size between bin boundaries. + * @param {number} [offset=0] Offset in steps by which to adjust * the bin value. An offset of 1 will return the next boundary. */ export default function(value, min, max, step, offset) { diff --git a/src/op/functions/date.js b/src/op/functions/date.js index bf2c9d17..1b7971b2 100644 --- a/src/op/functions/date.js +++ b/src/op/functions/date.js @@ -22,7 +22,7 @@ const t = d => ( * @param {number} [minutes=0] The minute within the hour. * @param {number} [seconds=0] The second within the minute. * @param {number} [milliseconds=0] The milliseconds within the second. - * @return {date} The resuting Date value. + * @return {Date} The resuting Date value. */ function datetime(year, month, date, hours, minutes, seconds, milliseconds) { return !arguments.length @@ -48,7 +48,7 @@ function datetime(year, month, date, hours, minutes, seconds, milliseconds) { * @param {number} [minutes=0] The minute within the hour. * @param {number} [seconds=0] The second within the minute. * @param {number} [milliseconds=0] The milliseconds within the second. - * @return {date} The resuting Date value. + * @return {Date} The resuting Date value. 
 */ function utcdatetime(year, month, date, hours, minutes, seconds, milliseconds) { return !arguments.length @@ -64,6 +64,12 @@ function utcdatetime(year, month, date, hours, minutes, seconds, milliseconds) { )); } +/** + * Return the day of the year for the given date in local time as a number + * between 1 and 366. + * @param {Date|number} date A date or timestamp. + * @return {number} The day of the year in local time. + */ function dayofyear(date) { t1.setTime(+date); t1.setHours(0, 0, 0, 0); @@ -71,16 +77,28 @@ function dayofyear(date) { t0.setMonth(0); t0.setDate(1); const tz = (t1.getTimezoneOffset() - t0.getTimezoneOffset()) * msMinute; - return Math.floor(1 + ((t1 - t0) - tz) / msDay); + return Math.floor(1 + ((+t1 - +t0) - tz) / msDay); } +/** + * Return the day of the year for the given date in UTC time as a number + * between 1 and 366. + * @param {Date|number} date A date or timestamp. + * @return {number} The day of the year in UTC time. + */ function utcdayofyear(date) { t1.setTime(+date); t1.setUTCHours(0, 0, 0, 0); const t0 = Date.UTC(t1.getUTCFullYear(), 0, 1); - return Math.floor(1 + (t1 - t0) / msDay); + return Math.floor(1 + (+t1 - t0) / msDay); } +/** + * Return the week of the year for the given date in local time as a number + * between 0 and 53. + * @param {Date|number} date A date or timestamp. + * @return {number} The week of the year in local time. + */ function week(date, firstday) { const i = firstday || 0; t1.setTime(+date); @@ -92,9 +110,15 @@ function week(date, firstday) { t0.setDate(1 - (t0.getDay() + 7 - i) % 7); t0.setHours(0, 0, 0, 0); const tz = (t1.getTimezoneOffset() - t0.getTimezoneOffset()) * msMinute; - return Math.floor((1 + (t1 - t0) - tz) / msWeek); + return Math.floor((1 + (+t1 - +t0) - tz) / msWeek); } +/** + * Return the week of the year for the given date in UTC time as a number + * between 0 and 53. + * @param {Date|number} date A date or timestamp. + * @return {number} The week of the year in UTC time.
+ */ function utcweek(date, firstday) { const i = firstday || 0; t1.setTime(+date); @@ -105,36 +129,263 @@ function utcweek(date, firstday) { t0.setUTCDate(1); t0.setUTCDate(1 - (t0.getUTCDay() + 7 - i) % 7); t0.setUTCHours(0, 0, 0, 0); - return Math.floor((1 + (t1 - t0)) / msWeek); + return Math.floor((1 + (+t1 - +t0)) / msWeek); } export default { - format_date: (date, shorten) => formatDate(t(date), !shorten), - format_utcdate: (date, shorten) => formatUTCDate(t(date), !shorten), - timestamp: (date) => +t(date), - year: (date) => t(date).getFullYear(), - quarter: (date) => Math.floor(t(date).getMonth() / 3), - month: (date) => t(date).getMonth(), - date: (date) => t(date).getDate(), - dayofweek: (date) => t(date).getDay(), - hours: (date) => t(date).getHours(), - minutes: (date) => t(date).getMinutes(), - seconds: (date) => t(date).getSeconds(), - milliseconds: (date) => t(date).getMilliseconds(), - utcyear: (date) => t(date).getUTCFullYear(), - utcquarter: (date) => Math.floor(t(date).getUTCMonth() / 3), - utcmonth: (date) => t(date).getUTCMonth(), - utcdate: (date) => t(date).getUTCDate(), - utcdayofweek: (date) => t(date).getUTCDay(), - utchours: (date) => t(date).getUTCHours(), - utcminutes: (date) => t(date).getUTCMinutes(), - utcseconds: (date) => t(date).getUTCSeconds(), - utcmilliseconds: (date) => t(date).getUTCMilliseconds(), + /** + * Returns an [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) formatted + * string for the given *date* in local timezone. The resulting string is + * compatible with *parse_date* and JavaScript's built-in *Date.parse*. + * @param {Date | number} date The input Date or timestamp value. + * @param {boolean} [shorten=false] A boolean flag (default `false`) + * indicating if the formatted string should be shortened if possible. + * For example, the local date `2001-01-01` will shorten from + * `"2001-01-01T00:00:00.000"` to `"2001-01-01T00:00"`. + * @return {string} The formatted date string in local time. 
+ */ + format_date: (date, shorten) => formatDate(t(date), !shorten), + + /** + * Returns an [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) formatted + * string for the given *date* in Coordinated Universal Time (UTC). The + * resulting string is compatible with *parse_date* and JavaScript's + * built-in *Date.parse*. + * @param {Date | number} date The input Date or timestamp value. + * @param {boolean} [shorten=false] A boolean flag (default `false`) + * indicating if the formatted string should be shortened if possible. + * For example, the UTC date `2001-01-01` will shorten from + * `"2001-01-01T00:00:00.000Z"` to `"2001-01-01"`. + * @return {string} The formatted date string in UTC time. + */ + format_utcdate: (date, shorten) => formatUTCDate(t(date), !shorten), + + /** + * Returns the number of milliseconds elapsed since midnight, January 1, + * 1970 Coordinated Universal Time (UTC). + * @return {number} The timestamp for now. + */ + now: Date.now, + + /** + * Returns the timestamp for a *date* as the number of milliseconds elapsed + * since January 1, 1970 00:00:00 UTC. + * @param {Date | number} date The input Date or timestamp value. + * @return {number} The timestamp value. + */ + timestamp: (date) => +t(date), + + /** + * Creates and returns a new Date value. If no arguments are provided, + * the current date and time are used. + * @param {number} [year] The year. + * @param {number} [month=0] The (zero-based) month. + * @param {number} [date=1] The date within the month. + * @param {number} [hours=0] The hour within the day. + * @param {number} [minutes=0] The minute within the hour. + * @param {number} [seconds=0] The second within the minute. + * @param {number} [milliseconds=0] The milliseconds within the second. + * @return {Date} The Date value. + */ datetime, - dayofyear, + + /** + * Returns the year of the specified *date* according to local time. + * @param {Date | number} date The input Date or timestamp value.
+ * @return {number} The year value in local time. + */ + year: (date) => t(date).getFullYear(), + + /** + * Returns the zero-based quarter of the specified *date* according to + * local time. + * @param {Date | number} date The input Date or timestamp value. + * @return {number} The quarter value in local time. + */ + quarter: (date) => Math.floor(t(date).getMonth() / 3), + + /** + * Returns the zero-based month of the specified *date* according to local + * time. A value of `0` indicates January, `1` indicates February, and so on. + * @param {Date | number} date The input Date or timestamp value. + * @return {number} The month value in local time. + */ + month: (date) => t(date).getMonth(), + + /** + * Returns the week number of the year (0-53) for the specified *date* + * according to local time. By default, Sunday is used as the first day + * of the week. All days in a new year preceding the first Sunday are + * considered to be in week 0. + * @param {Date | number} date The input Date or timestamp value. + * @param {number} firstday The number of first day of the week (default + * `0` for Sunday, `1` for Monday and so on). + * @return {number} The week of the year in local time. + */ week, + + /** + * Returns the date (day of month) of the specified *date* according + * to local time. + * @param {Date | number} date The input Date or timestamp value. + * @return {number} The date (day of month) value. + */ + date: (date) => t(date).getDate(), + + /** + * Returns the day of the year (1-366) of the specified *date* according + * to local time. + * @param {Date | number} date The input Date or timestamp value. + * @return {number} The day of the year in local time. + */ + dayofyear, + + /** + * Returns the Sunday-based day of the week (0-6) of the specified *date* + * according to local time. A value of `0` indicates Sunday, `1` indicates + * Monday, and so on. + * @param {Date | number} date The input Date or timestamp value. 
+ * @return {number} The day of the week value in local time. + */ + dayofweek: (date) => t(date).getDay(), + + /** + * Returns the hour of the day for the specified *date* according + * to local time. + * @param {Date | number} date The input Date or timestamp value. + * @return {number} The hour value in local time. + */ + hours: (date) => t(date).getHours(), + + /** + * Returns the minute of the hour for the specified *date* according + * to local time. + * @param {Date | number} date The input Date or timestamp value. + * @return {number} The minutes value in local time. + */ + minutes: (date) => t(date).getMinutes(), + + /** + * Returns the seconds of the minute for the specified *date* according + * to local time. + * @param {Date | number} date The input Date or timestamp value. + * @return {number} The seconds value in local time. + */ + seconds: (date) => t(date).getSeconds(), + + /** + * Returns the milliseconds of the second for the specified *date* according + * to local time. + * @param {Date | number} date The input Date or timestamp value. + * @return {number} The milliseconds value in local time. + */ + milliseconds: (date) => t(date).getMilliseconds(), + + /** + * Creates and returns a new Date value using Coordinated Universal Time + * (UTC). If no arguments are provided, the current date and time are used. + * @param {number} [year] The year. + * @param {number} [month=0] The (zero-based) month. + * @param {number} [date=1] The date within the month. + * @param {number} [hours=0] The hour within the day. + * @param {number} [minutes=0] The minute within the hour. + * @param {number} [seconds=0] The second within the minute. + * @param {number} [milliseconds=0] The milliseconds within the second. + * @return {Date} The Date value. + */ utcdatetime, - utcdayofyear, + + /** + * Returns the year of the specified *date* according to Coordinated + * Universal Time (UTC). + * @param {Date | number} date The input Date or timestamp value. 
+ * @return {number} The year value in UTC time. + */ + utcyear: (date) => t(date).getUTCFullYear(), + + /** + * Returns the zero-based quarter of the specified *date* according to + * Coordinated Universal Time (UTC) + * @param {Date | number} date The input Date or timestamp value. + * @return {number} The quarter value in UTC time. + */ + utcquarter: (date) => Math.floor(t(date).getUTCMonth() / 3), + + /** + * Returns the zero-based month of the specified *date* according to + * Coordinated Universal Time (UTC). A value of `0` indicates January, + * `1` indicates February, and so on. + * @param {Date | number} date The input Date or timestamp value. + * @return {number} The month value in UTC time. + */ + utcmonth: (date) => t(date).getUTCMonth(), + + /** + * Returns the week number of the year (0-53) for the specified *date* + * according to Coordinated Universal Time (UTC). By default, Sunday is + * used as the first day of the week. All days in a new year preceding the + * first Sunday are considered to be in week 0. + * @param {Date | number} date The input Date or timestamp value. + * @param {number} firstday The number of first day of the week (default + * `0` for Sunday, `1` for Monday and so on). + * @return {number} The week of the year in UTC time. + */ utcweek, - now: Date.now + + /** + * Returns the date (day of month) of the specified *date* according to + * Coordinated Universal Time (UTC). + * @param {Date | number} date The input Date or timestamp value. + * @return {number} The date (day of month) value in UTC time. + */ + utcdate: (date) => t(date).getUTCDate(), + + /** + * Returns the day of the year (1-366) of the specified *date* according + * to Coordinated Universal Time (UTC). + * @param {Date | number} date The input Date or timestamp value. + * @return {number} The day of the year in UTC time. 
+ */ + utcdayofyear, + + /** + * Returns the Sunday-based day of the week (0-6) of the specified *date* + * according to Coordinated Universal Time (UTC). A value of `0` indicates + * Sunday, `1` indicates Monday, and so on. + * @param {Date | number} date The input Date or timestamp value. + * @return {number} The day of the week in UTC time. + */ + utcdayofweek: (date) => t(date).getUTCDay(), + + /** + * Returns the hour of the day for the specified *date* according to + * Coordinated Universal Time (UTC). + * @param {Date | number} date The input Date or timestamp value. + * @return {number} The hours value in UTC time. + */ + utchours: (date) => t(date).getUTCHours(), + + /** + * Returns the minute of the hour for the specified *date* according to + * Coordinated Universal Time (UTC). + * @param {Date | number} date The input Date or timestamp value. + * @return {number} The minutes value in UTC time. + */ + utcminutes: (date) => t(date).getUTCMinutes(), + + /** + * Returns the seconds of the minute for the specified *date* according to + * Coordinated Universal Time (UTC). + * @param {Date | number} date The input Date or timestamp value. + * @return {number} The seconds value in UTC time. + */ + utcseconds: (date) => t(date).getUTCSeconds(), + + /** + * Returns the milliseconds of the second for the specified *date* according to + * Coordinated Universal Time (UTC). + * @param {Date | number} date The input Date or timestamp value. + * @return {number} The milliseconds value in UTC time. + */ + utcmilliseconds: (date) => t(date).getUTCMilliseconds() }; diff --git a/src/op/functions/json.js b/src/op/functions/json.js index bd980eb6..d0e4e6ef 100644 --- a/src/op/functions/json.js +++ b/src/op/functions/json.js @@ -1,4 +1,16 @@ export default { - parse_json: (str) => JSON.parse(str), - to_json: (val) => JSON.stringify(val) + /** + * Parses a string *value* in JSON format, constructing the JavaScript + * value or object described by the string. 
+ * @param {string} value The input string value. + * @return {any} The parsed JSON. + */ + parse_json: (value) => JSON.parse(value), + + /** + * Converts a JavaScript object or value to a JSON string. + * @param {*} value The value to convert to a JSON string. + * @return {string} The JSON string. + */ + to_json: (value) => JSON.stringify(value) }; diff --git a/src/op/functions/math.js b/src/op/functions/math.js index df891fad..b7064fff 100644 --- a/src/op/functions/math.js +++ b/src/op/functions/math.js @@ -1,43 +1,300 @@ import { random } from '../../util/random.js'; export default { + /** + * Return a random floating point number between 0 (inclusive) and 1 + * (exclusive). By default uses *Math.random*. Use the *seed* method + * to instead use a seeded random number generator. + * @return {number} A pseudorandom number between 0 and 1. + */ random, - is_nan: Number.isNaN, + + /** + * Tests if the input *value* is not a number (`NaN`); equivalent + * to *Number.isNaN*. + * @param {*} value The value to test. + * @return {boolean} True if the value is not a number, false otherwise. + */ + is_nan: Number.isNaN, + + /** + * Tests if the input *value* is finite; equivalent to *Number.isFinite*. + * @param {*} value The value to test. + * @return {boolean} True if the value is finite, false otherwise. + */ is_finite: Number.isFinite, - abs: Math.abs, - cbrt: Math.cbrt, - ceil: Math.ceil, - clz32: Math.clz32, - exp: Math.exp, - expm1: Math.expm1, - floor: Math.floor, - fround: Math.fround, + /** + * Returns the absolute value of the input *value*; equivalent to *Math.abs*. + * @param {number} value The input number value. + * @return {number} The absolute value. + */ + abs: Math.abs, + + /** + * Returns the cube root value of the input *value*; equivalent to + * *Math.cbrt*. + * @param {number} value The input number value. + * @return {number} The cube root value. 
+ */ + cbrt: Math.cbrt, + + /** + * Returns the ceiling of the input *value*, the nearest integer equal to + * or greater than the input; equivalent to *Math.ceil*. + * @param {number} value The input number value. + * @return {number} The ceiling value. + */ + ceil: Math.ceil, + + /** + * Returns the number of leading zero bits in the 32-bit binary + * representation of a number *value*; equivalent to *Math.clz32*. + * @param {number} value The input number value. + * @return {number} The leading zero bits value. + */ + clz32: Math.clz32, + + /** + * Returns *e<sup>value</sup>*, where *e* is Euler's number, the base of the + * natural logarithm; equivalent to *Math.exp*. + * @param {number} value The input number value. + * @return {number} The base-e exponentiated value. + */ + exp: Math.exp, + + /** + * Returns *e<sup>value</sup> - 1*, where *e* is Euler's number, the base of + * the natural logarithm; equivalent to *Math.expm1*. + * @param {number} value The input number value. + * @return {number} The base-e exponentiated value minus 1. + */ + expm1: Math.expm1, + + /** + * Returns the floor of the input *value*, the nearest integer equal to or + * less than the input; equivalent to *Math.floor*. + * @param {number} value The input number value. + * @return {number} The floor value. + */ + floor: Math.floor, + + /** + * Returns the nearest 32-bit single precision float representation of the + * input number *value*; equivalent to *Math.fround*. Useful for translating + * between 64-bit `Number` values and values from a `Float32Array`. + * @param {number} value The input number value. + * @return {number} The rounded value. + */ + fround: Math.fround, + + /** + * Returns the greatest (maximum) value among the input *values*; equivalent + * to *Math.max*. This is _not_ an aggregate function, see *op.max* to + * compute a maximum value across multiple rows. + * @param {...number} values The input number values. + * @return {number} The greatest (maximum) value among the inputs.
+ */ greatest: Math.max, - least: Math.min, - log: Math.log, - log10: Math.log10, - log1p: Math.log1p, - log2: Math.log2, - pow: Math.pow, - round: Math.round, - sign: Math.sign, - sqrt: Math.sqrt, - trunc: Math.trunc, - - degrees: (rad) => 180 * rad / Math.PI, - radians: (deg) => Math.PI * deg / 180, - acos: Math.acos, - acosh: Math.acosh, - asin: Math.asin, - asinh: Math.asinh, - atan: Math.atan, - atan2: Math.atan2, - atanh: Math.atanh, - cos: Math.cos, - cosh: Math.cosh, - sin: Math.sin, - sinh: Math.sinh, - tan: Math.tan, - tanh: Math.tanh + + /** + * Returns the least (minimum) value among the input *values*; equivalent + * to *Math.min*. This is _not_ an aggregate function, see *op.min* to + * compute a minimum value across multiple rows. + * @param {...number} values The input number values. + * @return {number} The least (minimum) value among the inputs. + */ + least: Math.min, + + /** + * Returns the natural logarithm (base *e*) of a number *value*; equivalent + * to *Math.log*. + * @param {number} value The input number value. + * @return {number} The base-e log value. + */ + log: Math.log, + + /** + * Returns the base 10 logarithm of a number *value*; equivalent + * to *Math.log10*. + * @param {number} value The input number value. + * @return {number} The base-10 log value. + */ + log10: Math.log10, + + /** + * Returns the natural logarithm (base *e*) of 1 + a number *value*; + * equivalent to *Math.log1p*. + * @param {number} value The input number value. + * @return {number} The base-e log of value + 1. + */ + log1p: Math.log1p, + + /** + * Returns the base 2 logarithm of a number *value*; equivalent + * to *Math.log2*. + * @param {number} value The input number value. + * @return {number} The base-2 log value. + */ + log2: Math.log2, + + /** + * Returns the *base* raised to the *exponent* power, that is, + * *base**exponent*; equivalent to *Math.pow*. + * @param {number} base The base number value. 
+ * @param {number} exponent The exponent number value. + * @return {number} The exponentiated value. + */ + pow: Math.pow, + + /** + * Returns the value of a number rounded to the nearest integer; + * equivalent to *Math.round*. + * @param {number} value The input number value. + * @return {number} The rounded value. + */ + round: Math.round, + + /** + * Returns either a positive or negative +/- 1, indicating the sign of the + * input *value*; equivalent to *Math.sign*. + * @param {number} value The input number value. + * @return {number} The sign of the value. + */ + sign: Math.sign, + + /** + * Returns the square root of the input *value*; equivalent to *Math.sqrt*. + * @param {number} value The input number value. + * @return {number} The square root value. + */ + sqrt: Math.sqrt, + + /** + * Returns the integer part of a number by removing any fractional digits; + * equivalent to *Math.trunc*. + * @param {number} value The input number value. + * @return {number} The truncated value. + */ + trunc: Math.trunc, + + /** + * Converts the input *radians* value to degrees. + * @param {number} radians The input radians value. + * @return {number} The value in degrees + */ + degrees: (radians) => 180 * radians / Math.PI, + + /** + * Converts the input *degrees* value to radians. + * @param {number} degrees The input degrees value. + * @return {number} The value in radians. + */ + radians: (degrees) => Math.PI * degrees / 180, + + /** + * Returns the arc-cosine (in radians) of a number *value*; + * equivalent to *Math.acos*. + * @param {number} value The input number value. + * @return {number} The arc-cosine value. + */ + acos: Math.acos, + + /** + * Returns the hyperbolic arc-cosine of a number *value*; + * equivalent to *Math.acosh*. + * @param {number} value The input number value. + * @return {number} The hyperbolic arc-cosine value. + */ + acosh: Math.acosh, + + /** + * Returns the arc-sine (in radians) of a number *value*; + * equivalent to *Math.asin*. 
+ * @param {number} value The input number value. + * @return {number} The arc-sine value. + */ + asin: Math.asin, + + /** + * Returns the hyperbolic arc-sine of a number *value*; + * equivalent to *Math.asinh*. + * @param {number} value The input number value. + * @return {number} The hyperbolic arc-sine value. + */ + asinh: Math.asinh, + + /** + * Returns the arc-tangent (in radians) of a number *value*; + * equivalent to *Math.atan*. + * @param {number} value The input number value. + * @return {number} The arc-tangent value. + */ + atan: Math.atan, + + /** + * Returns the angle in the plane (in radians) between the positive x-axis + * and the ray from (0, 0) to the point (*x*, *y*); + * equivalent to *Math.atan2*. + * @param {number} y The y coordinate of the point. + * @param {number} x The x coordinate of the point. + * @return {number} The arc-tangent angle. + */ + atan2: Math.atan2, + + /** + * Returns the hyperbolic arc-tangent of a number *value*; + * equivalent to *Math.atanh*. + * @param {number} value The input number value. + * @return {number} The hyperbolic arc-tangent value. + */ + atanh: Math.atanh, + + /** + * Returns the cosine (in radians) of a number *value*; + * equivalent to *Math.cos*. + * @param {number} value The input number value. + * @return {number} The cosine value. + */ + cos: Math.cos, + + /** + * Returns the hyperbolic cosine (in radians) of a number *value*; + * equivalent to *Math.cosh*. + * @param {number} value The input number value. + * @return {number} The hyperbolic cosine value. + */ + cosh: Math.cosh, + + /** + * Returns the sine (in radians) of a number *value*; + * equivalent to *Math.sin*. + * @param {number} value The input number value. + * @return {number} The sine value. + */ + sin: Math.sin, + + /** + * Returns the hyperbolic sine (in radians) of a number *value*; + * equivalent to *Math.sinh*. + * @param {number} value The input number value. + * @return {number} The hyperbolic sine value. 
+ */ + sinh: Math.sinh, + + /** + * Returns the tangent (in radians) of a number *value*; + * equivalent to *Math.tan*. + * @param {number} value The input number value. + * @return {number} The tangent value. + */ + tan: Math.tan, + + /** + * Returns the hyperbolic tangent (in radians) of a number *value*; + * equivalent to *Math.tanh*. + * @param {number} value The input number value. + * @return {number} The hyperbolic tangent value. + */ + tanh: Math.tanh }; diff --git a/src/op/functions/object.js b/src/op/functions/object.js index e6d570fe..a00f7d7b 100644 --- a/src/op/functions/object.js +++ b/src/op/functions/object.js @@ -8,17 +8,67 @@ function array(iter) { } export default { - has: (obj, key) => isMapOrSet(obj) ? obj.has(key) - : obj != null ? has(obj, key) - : false, - keys: (obj) => isMap(obj) ? array(obj.keys()) - : obj != null ? Object.keys(obj) - : [], - values: (obj) => isMapOrSet(obj) ? array(obj.values()) - : obj != null ? Object.values(obj) - : [], - entries: (obj) => isMapOrSet(obj) ? array(obj.entries()) - : obj != null ? Object.entries(obj) - : [], - object: (entries) => entries ? Object.fromEntries(entries) : NULL + /** + * Returns a boolean indicating whether the *object* has the specified *key* + * as its own property (as opposed to inheriting it). If the *object* is a + * *Map* or *Set* instance, the *has* method will be invoked directly on the + * object, otherwise *Object.hasOwnProperty* is used. + * @template K, V + * @param {Map|Set|Record} object The object, Map, or Set to + * test for property membership. + * @param {K} key The property key to test for. + * @return {boolean} True if the object has the given key, false otherwise. + */ + has: (object, key) => isMapOrSet(object) ? object.has(key) + : object != null ? has(object, `${key}`) + : false, + + /** + * Returns an array of a given *object*'s own enumerable property names. 
If + * the *object* is a *Map* instance, the *keys* method will be invoked + * directly on the object, otherwise *Object.keys* is used. + * @template K, V + * @param {Map|Record} object The input object or Map value. + * @return {K[]} An array of property key name strings. + */ + keys: (object) => isMap(object) ? array(object.keys()) + : object != null ? Object.keys(object) + : [], + + /** + * Returns an array of a given *object*'s own enumerable property values. If + * the *object* is a *Map* or *Set* instance, the *values* method will be + * invoked directly on the object, otherwise *Object.values* is used. + * @template K, V + * @param {Map|Set|Record} object The input + * object, Map, or Set value. + * @return {V[]} An array of property values. + */ + values: (object) => isMapOrSet(object) ? array(object.values()) + : object != null ? Object.values(object) + : [], + + /** + * Returns an array of a given *object*'s own enumerable keyed property + * `[key, value]` pairs. If the *object* is a *Map* or *Set* instance, the + * *entries* method will be invoked directly on the object, otherwise + * *Object.entries* is used. + * @template K, V + * @param {Map|Set|Record} object The input + * object, Map, or Set value. + * @return {[K, V][]} An array of property values. + */ + entries: (object) => isMapOrSet(object) ? array(object.entries()) + : object != null ? Object.entries(object) + : [], + + /** + * Returns a new object given iterable *entries* of `[key, value]` pairs. + * This method is Arquero's version of the *Object.fromEntries* method. + * @template K, V + * @param {Iterable<[K, V]>} entries An iterable collection of `[key, value]` + * pairs, such as an array of two-element arrays or a *Map*. + * @return {Record} An object of consolidated key-value pairs. + */ + object: (entries) => entries ? 
Object.fromEntries(entries) : NULL }; diff --git a/src/op/functions/recode.js b/src/op/functions/recode.js index 5cfc13e5..34b9b5eb 100644 --- a/src/op/functions/recode.js +++ b/src/op/functions/recode.js @@ -5,20 +5,22 @@ import has from '../../util/has.js'; * value map. If a fallback value is specified, it will be returned when * a matching value is not found in the map; otherwise, the input value * is returned unchanged. - * @param {*} value The value to recode. The value must be safely + * @template T + * @param {T} value The value to recode. The value must be safely * coercible to a string for lookup against the value map. - * @param {object|Map} map An object or Map with input values for keys and - * output recoded values as values. If a non-Map object, only the object's - * own properties will be considered. - * @param {*} [fallback] A default fallback value to use if the input + * @param {Map|Record} map An object or Map with input values + * for keys and output recoded values as values. If a non-Map object, only + * the object's own properties will be considered. + * @param {T} [fallback] A default fallback value to use if the input * value is not found in the value map. - * @return {*} The recoded value. + * @return {T} The recoded value. */ export default function(value, map, fallback) { if (map instanceof Map) { if (map.has(value)) return map.get(value); - } else if (has(map, value)) { - return map[value]; + } else { + const key = `${value}`; + if (has(map, key)) return map[key]; } return fallback !== undefined ? fallback : value; } diff --git a/src/op/functions/string.js b/src/op/functions/string.js index 06a7268e..581f795f 100644 --- a/src/op/functions/string.js +++ b/src/op/functions/string.js @@ -1,36 +1,222 @@ export default { - parse_date: (str) => str == null ? str : new Date(str), - parse_float: (str) => str == null ? str : Number.parseFloat(str), - parse_int: (str, radix) => str == null ? 
str : Number.parseInt(str, radix), - endswith: (str, search, length) => str == null ? false - : String(str).endsWith(search, length), - match: (str, regexp, index) => { - const m = str == null ? str : String(str).match(regexp); - return index == null || m == null ? m - : typeof index === 'number' ? m[index] - : m.groups ? m.groups[index] - : null; - }, - normalize: (str, form) => str == null ? str - : String(str).normalize(form), - padend: (str, len, fill) => str == null ? str - : String(str).padEnd(len, fill), - padstart: (str, len, fill) => str == null ? str - : String(str).padStart(len, fill), - upper: (str) => str == null ? str - : String(str).toUpperCase(), - lower: (str) => str == null ? str - : String(str).toLowerCase(), - repeat: (str, num) => str == null ? str - : String(str).repeat(num), - replace: (str, pattern, replacement) => str == null ? str - : String(str).replace(pattern, String(replacement)), - substring: (str, start, end) => str == null ? str - : String(str).substring(start, end), - split: (str, separator, limit) => str == null ? [] - : String(str).split(separator, limit), - startswith: (str, search, length) => str == null ? false - : String(str).startsWith(search, length), - trim: (str) => str == null ? str - : String(str).trim() + /** + * Parses a string *value* and returns a Date instance. Beware: this method + * uses JavaScript's *Date.parse()* functionality, which is inconsistently + * implemented across browsers. That said, + * [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) formatted strings such + * as those produced by *op.format_date* and *op.format_utcdate* should be + * supported across platforms. Note that "bare" ISO date strings such as + * `"2001-01-01"` are interpreted by JavaScript as indicating midnight of + * that day in Coordinated Universal Time (UTC), *not* local time. To + * indicate the local timezone, an ISO string can include additional time + * components and no `Z` suffix: `"2001-01-01T00:00"`. 
+ * @param {*} value The input value. + * @return {Date} The parsed date value. + */ + parse_date: (value) => value == null ? value : new Date(value), + + /** + * Parses a string *value* and returns a floating point number. + * @param {*} value The input value. + * @return {number} The parsed number value. + */ + parse_float: (value) => value == null ? value : Number.parseFloat(value), + + /** + * Parses a string *value* and returns an integer of the specified radix + * (the base in mathematical numeral systems). + * @param {*} value The input value. + * @param {number} [radix] An integer between 2 and 36 that represents the + * radix (the base in mathematical numeral systems) of the string. Be + * careful: this does not default to 10! If *radix* is `undefined`, `0`, + * or unspecified, JavaScript assumes the following: If the input string + * begins with `"0x"` or `"0X"` (a zero, followed by lowercase or + * uppercase X), the radix is assumed to be 16 and the rest of the string + * is parsed as a hexadecimal number. If the input string begins with `"0"` + * (a zero), the radix is assumed to be 8 (octal) or 10 (decimal). Exactly + * which radix is chosen is implementation-dependent. If the input string + * begins with any other value, the radix is 10 (decimal). + * @return {number} The parsed integer value. + */ + parse_int: (value, radix) => value == null ? value + : Number.parseInt(value, radix), + + /** + * Determines whether a string *value* ends with the characters of a + * specified *search* string, returning `true` or `false` as appropriate. + * @param {any} value The input string value. + * @param {string} search The search string to test for. + * @param {number} [length] If provided, used as the length of *value* + * (default `value.length`). + * @return {boolean} True if the value ends with the search string, + * false otherwise. + */ + endswith: (value, search, length) => value == null ? 
false + : String(value).endsWith(search, length), + + /** + * Retrieves the result of matching a string *value* against a regular + * expression *regexp*. If no *index* is specified, returns an array + * whose contents depend on the presence or absence of the regular + * expression global (`g`) flag, or `null` if no matches are found. If the + * `g` flag is used, all results matching the complete regular expression + * will be returned, but capturing groups will not. If the `g` flag is not + * used, only the first complete match and its related capturing groups are + * returned. + * + * If specified, the *index* looks up a value of the resulting match. If + * *index* is a number, the corresponding index of the result array is + * returned. If *index* is a string, the value of the corresponding + * named capture group is returned, or `null` if there is no such group. + * @param {*} value The input string value. + * @param {*} regexp The regular expression to match against. + * @param {number|string} index The index into the match result array + * or capture group. + * @return {string|string[]} The match result. + */ + match: (value, regexp, index) => { + const m = value == null ? value : String(value).match(regexp); + return index == null || m == null ? m + : typeof index === 'number' ? m[index] + : m.groups ? m.groups[index] + : null; + }, + + /** + * Returns the Unicode normalization form of the string *value*. + * @param {*} value The input value to normalize. + * @param {string} form The Unicode normalization form, one of + * `'NFC'` (default, canonical decomposition, followed by canonical + * composition), `'NFD'` (canonical decomposition), `'NFKC'` (compatibility + * decomposition, followed by canonical composition), + * or `'NFKD'` (compatibility decomposition). + * @return {string} The normalized string value. + */ + normalize: (value, form) => value == null ? 
value + : String(value).normalize(form), + + /** + * Pad a string *value* with a given *fill* string (applied from the end of + * *value* and repeated, if needed) so that the resulting string reaches a + * given *length*. + * @param {*} value The input value to pad. + * @param {number} length The length of the resulting string once the + * *value* string has been padded. If the length is lower than + * `value.length`, the *value* string will be returned as-is. + * @param {string} [fill] The string to pad the *value* string with + * (default `''`). If *fill* is too long to stay within the target + * *length*, it will be truncated: for left-to-right languages the + * left-most part and for right-to-left languages the right-most will + * be applied. + * @return {string} The padded string. + */ + padend: (value, length, fill) => value == null ? value + : String(value).padEnd(length, fill), + + /** + * Pad a string *value* with a given *fill* string (applied from the start + * of *value* and repeated, if needed) so that the resulting string reaches + * a given *length*. + * @param {*} value The input value to pad. + * @param {number} length The length of the resulting string once the + * *value* string has been padded. If the length is lower than + * `value.length`, the *value* string will be returned as-is. + * @param {string} [fill] The string to pad the *value* string with + * (default `''`). If *fill* is too long to stay within the target + * *length*, it will be truncated: for left-to-right languages the + * left-most part and for right-to-left languages the right-most will + * be applied. + * @return {string} The padded string. + */ + padstart: (value, length, fill) => value == null ? value + : String(value).padStart(length, fill), + + /** + * Returns the string *value* converted to upper case. + * @param {*} value The input string value. + * @return {string} The upper case string. + */ + upper: (value) => value == null ? 
value : String(value).toUpperCase(), + + /** + * Returns the string *value* converted to lower case. + * @param {*} value The input string value. + * @return {string} The lower case string. + */ + lower: (value) => value == null ? value : String(value).toLowerCase(), + + /** + * Returns a new string which contains the specified *number* of copies of + * the *value* string concatenated together. + * @param {*} value The input string to repeat. + * @param {*} number An integer between `0` and `+Infinity`, indicating the + * number of times to repeat the string. + * @return {string} The repeated string. + */ + repeat: (value, number) => value == null ? value + : String(value).repeat(number), + + /** + * Returns a new string with some or all matches of a *pattern* replaced by + * a *replacement*. The *pattern* can be a string or a regular expression, + * and the *replacement* must be a string. If *pattern* is a string, only + * the first occurrence will be replaced; to make multiple replacements, use + * a regular expression *pattern* with a `g` (global) flag. + * @param {*} value The input string value. + * @param {*} pattern The pattern string or regular expression to replace. + * @param {*} replacement The replacement string to use. + * @return {string} The string with patterns replaced. + */ + replace: (value, pattern, replacement) => value == null ? value + : String(value).replace(pattern, String(replacement)), + + /** + * Divides a string *value* into an ordered list of substrings based on a + * *separator* pattern, puts these substrings into an array, and returns the + * array. + * @param {*} value The input string value. + * @param {*} separator A string or regular expression pattern describing + * where each split should occur. + * @param {number} [limit] An integer specifying a limit on the number of + * substrings to be included in the array. + * @return {string[]} + */ + split: (value, separator, limit) => value == null ? 
[] + : String(value).split(separator, limit), + + /** + * Determines whether a string *value* starts with the characters of a + * specified *search* string, returning `true` or `false` as appropriate. + * @param {*} value The input string value. + * @param {string} search The search string to test for. + * @param {number} [position=0] The position in the *value* string at which + * to begin searching (default `0`). + * @return {boolean} True if the string starts with the search pattern, + * false otherwise. + */ + startswith: (value, search, position) => value == null ? false + : String(value).startsWith(search, position), + + /** + * Returns the part of the string *value* between the *start* and *end* + * indexes, or to the end of the string. + * @param {*} value The input string value. + * @param {number} [start=0] The index of the first character to include in + * the returned substring (default `0`). + * @param {number} [end] The index of the first character to exclude from + * the returned substring (default `value.length`). + * @return {string} The substring. + */ + substring: (value, start, end) => value == null ? value + : String(value).substring(start, end), + + /** + * Returns a new string with whitespace removed from both ends of the input + * *value* string. Whitespace in this context is all the whitespace + * characters (space, tab, no-break space, etc.) and all the line terminator + * characters (LF, CR, etc.). + * @param {*} value The input string value to trim. + * @return {string} The trimmed string. + */ + trim: (value) => value == null ? value : String(value).trim() }; diff --git a/src/op/index.js b/src/op/index.js index b341780b..b4c032b6 100644 --- a/src/op/index.js +++ b/src/op/index.js @@ -37,8 +37,8 @@ export function hasWindow(name) { /** * Get an aggregate function definition. * @param {string} name The name of the aggregate function. - * @return {AggregateDef} The aggregate function definition, - * or undefined if not found. 
+ * @return {import('./aggregate-functions.js').AggregateDef} + * The aggregate function definition, or undefined if not found. */ export function getAggregate(name) { return hasAggregate(name) && aggregateFunctions[name]; @@ -47,8 +47,8 @@ export function getAggregate(name) { /** * Get a window function definition. * @param {string} name The name of the window function. - * @return {WindowDef} The window function definition, - * or undefined if not found. + * @return {import('./window-functions.js').WindowDef} + * The window function definition, or undefined if not found. */ export function getWindow(name) { return hasWindow(name) && windowFunctions[name]; diff --git a/src/op/op-api.js b/src/op/op-api.js index 5d67bcab..534846b6 100644 --- a/src/op/op-api.js +++ b/src/op/op-api.js @@ -1,5 +1,33 @@ import functions from './functions/index.js'; -import op from './op.js'; +import toArray from '../util/to-array.js'; +import toString from '../util/to-string.js'; + +export class Op { + constructor(name, fields, params) { + this.name = name; + this.fields = fields; + this.params = params; + } + toString() { + const args = [ + ...this.fields.map(f => `d[${toString(f)}]`), + ...this.params.map(toString) + ]; + return `d => op.${this.name}(${args})`; + } + toObject() { + return { expr: this.toString(), func: true }; + } +} + +/** + * @param {string} name + * @param {any | any[]} [fields] + * @param {any | any[]} [params] + */ +export function op(name, fields = [], params = []) { + return new Op(name, toArray(fields), toArray(params)); +} export const any = (field) => op('any', field); export const count = () => op('count'); @@ -10,7 +38,7 @@ export const object_agg = (key, value) => op('object_agg', [key, value]); export const entries_agg = (key, value) => op('entries_agg', [key, value]); /** - * @typedef {import('../table/transformable').Struct} Struct + * @typedef {import('../table/types.js').Struct} Struct */ /** @@ -36,47 +64,53 @@ export default { /** * Aggregate 
function returning an arbitrary observed value. - * @param {*} field The data field. - * @return {*} An arbitrary observed value. + * @template T + * @param {T} field The data field. + * @return {T} An arbitrary observed value. */ any, /** * Aggregate function to collect an array of values. - * @param {*} field The data field. - * @return {Array} A list of values. + * @template T + * @param {T} field The data field. + * @return {Array} A list of values. */ array_agg, /** * Aggregate function to collect an array of distinct (unique) values. - * @param {*} field The data field. - * @return {Array} An array of unique values. + * @template T + * @param {T} field The data field. + * @return {Array} An array of unique values. */ array_agg_distinct, /** * Aggregate function to create an object given input key and value fields. - * @param {*} key The object key field. - * @param {*} value The object value field. - * @return {Struct} An object of key-value pairs. + * @template K, V + * @param {K} key The object key field. + * @param {V} value The object value field. + * @return {Record} An object of key-value pairs. */ object_agg, /** * Aggregate function to create a Map given input key and value fields. - * @param {*} key The object key field. - * @param {*} value The object value field. - * @return {Map} A Map of key-value pairs. + * @template K, V + * @param {K} key The object key field. + * @param {V} value The object value field. + * @return {Map} A Map of key-value pairs. */ map_agg, /** * Aggregate function to create an array in the style of Object.entries() * given input key and value fields. - * @param {*} key The object key field. - * @param {*} value The object value field. - * @return {[[any, any]]} An array of [key, value] arrays. + * @template K, V + * @param {K} key The object key field. + * @param {V} value The object value field. + * @return {[K, V][]} An array of [key, value] arrays. 
*/ entries_agg, @@ -86,6 +120,7 @@ export default { * @param {*} field The data field. * @return {number} The count of valid values. */ + // @ts-ignore valid: (field) => op('valid', field), /** @@ -94,6 +129,7 @@ export default { * @param {*} field The data field. * @return {number} The count of invalid values. */ + // @ts-ignore invalid: (field) => op('invalid', field), /** @@ -101,20 +137,24 @@ export default { * @param {*} field The data field. * @return {number} The count of distinct values. */ + // @ts-ignore distinct: (field) => op('distinct', field), /** * Aggregate function to determine the mode (most frequent) value. - * @param {*} field The data field. - * @return {number} The mode value. + * @template T + * @param {T} field The data field. + * @return {T} The mode value. */ + // @ts-ignore mode: (field) => op('mode', field), /** * Aggregate function to sum values. - * @param {string} field The data field. + * @param {*} field The data field. * @return {number} The sum of the values. */ + // @ts-ignore sum: (field) => op('sum', field), /** @@ -122,6 +162,7 @@ export default { * @param {*} field The data field. * @return {number} The product of the values. */ + // @ts-ignore product: (field) => op('product', field), /** @@ -129,6 +170,7 @@ export default { * @param {*} field The data field. * @return {number} The mean (average) of the values. */ + // @ts-ignore mean: (field) => op('mean', field), /** @@ -136,6 +178,7 @@ export default { * @param {*} field The data field. * @return {number} The average (mean) of the values. */ + // @ts-ignore average: (field) => op('average', field), /** @@ -143,6 +186,7 @@ export default { * @param {*} field The data field. * @return {number} The sample variance of the values. */ + // @ts-ignore variance: (field) => op('variance', field), /** @@ -150,6 +194,7 @@ export default { * @param {*} field The data field. * @return {number} The population variance of the values. 
*/ + // @ts-ignore variancep: (field) => op('variancep', field), /** @@ -157,6 +202,7 @@ export default { * @param {*} field The data field. * @return {number} The sample standard deviation of the values. */ + // @ts-ignore stdev: (field) => op('stdev', field), /** @@ -164,20 +210,25 @@ export default { * @param {*} field The data field. * @return {number} The population standard deviation of the values. */ + // @ts-ignore stdevp: (field) => op('stdevp', field), /** * Aggregate function for the minimum value. - * @param {*} field The data field. - * @return {number} The minimum value. + * @template T + * @param {T} field The data field. + * @return {T} The minimum value. */ + // @ts-ignore min: (field) => op('min', field), /** * Aggregate function for the maximum value. - * @param {*} field The data field. - * @return {number} The maximum value. + * @template T + * @param {T} field The data field. + * @return {T} The maximum value. */ + // @ts-ignore max: (field) => op('max', field), /** @@ -187,6 +238,7 @@ export default { * @param {number} p The probability threshold. * @return {number} The quantile value. */ + // @ts-ignore quantile: (field, p) => op('quantile', field, p), /** @@ -195,6 +247,7 @@ export default { * @param {*} field The data field. * @return {number} The median value. */ + // @ts-ignore median: (field) => op('median', field), /** @@ -203,6 +256,7 @@ export default { * @param {*} field2 The second data field. * @return {number} The sample covariance of the values. */ + // @ts-ignore covariance: (field1, field2) => op('covariance', [field1, field2]), /** @@ -211,6 +265,7 @@ export default { * @param {*} field2 The second data field. * @return {number} The population covariance of the values. */ + // @ts-ignore covariancep: (field1, field2) => op('covariancep', [field1, field2]), /** @@ -221,6 +276,7 @@ export default { * @param {*} field2 The second data field. * @return {number} The correlation between the field values. 
*/ + // @ts-ignore corr: (field1, field2) => op('corr', [field1, field2]), /** @@ -235,13 +291,18 @@ export default { * If specified, the maxbins and minstep arguments are ignored. * @return {[number, number, number]} The bin [min, max, and step] values. */ - bins: (field, maxbins, nice, minstep) => - op('bins', field, [maxbins, nice, minstep]), + // @ts-ignore + bins: (field, maxbins, nice, minstep, step) => op( + 'bins', + field, + [maxbins, nice, minstep, step] + ), /** * Window function to assign consecutive row numbers, starting from 1. * @return {number} The row number value. */ + // @ts-ignore row_number: () => op('row_number'), /** @@ -251,6 +312,7 @@ export default { * rank 1, the third value is assigned rank 3. * @return {number} The rank value. */ + // @ts-ignore rank: () => op('rank'), /** @@ -259,6 +321,7 @@ export default { * indices: if the first two values tie, both will be assigned rank 1.5. * @return {number} The peer-averaged rank value. */ + // @ts-ignore avg_rank: () => op('avg_rank'), /** @@ -268,6 +331,7 @@ export default { * values tie for rank 1, the third value is assigned rank 2. * @return {number} The dense rank value. */ + // @ts-ignore dense_rank: () => op('dense_rank'), /** @@ -275,6 +339,7 @@ export default { * The percent is calculated as (rank - 1) / (group_size - 1). * @return {number} The percentage rank value. */ + // @ts-ignore percent_rank: () => op('percent_rank'), /** @@ -282,6 +347,7 @@ export default { * to each value in a group. * @return {number} The cumulative distribution value. */ + // @ts-ignore cume_dist: () => op('cume_dist'), /** @@ -291,68 +357,83 @@ export default { * @param {number} num The number of buckets for ntile calculation. * @return {number} The quantile value. */ + // @ts-ignore ntile: (num) => op('ntile', null, num), /** * Window function to assign a value that precedes the current value by * a specified number of positions. If no such value exists, returns a * default value instead. 
- * @param {*} field The data field. + * @template T + * @param {T} field The data field. * @param {number} [offset=1] The lag offset from the current value. - * @param {*} [defaultValue=undefined] The default value. - * @return {*} The lagging value. + * @param {T} [defaultValue=undefined] The default value. + * @return {T} The lagging value. */ + // @ts-ignore lag: (field, offset, defaultValue) => op('lag', field, [offset, defaultValue]), /** * Window function to assign a value that follows the current value by * a specified number of positions. If no such value exists, returns a * default value instead. - * @param {*} field The data field. + * @template T + * @param {T} field The data field. * @param {number} [offset=1] The lead offset from the current value. - * @param {*} [defaultValue=undefined] The default value. - * @return {*} The leading value. + * @param {T} [defaultValue=undefined] The default value. + * @return {T} The leading value. */ + // @ts-ignore lead: (field, offset, defaultValue) => op('lead', field, [offset, defaultValue]), /** * Window function to assign the first value in a sliding window frame. - * @param {*} field The data field. - * @return {*} The first value in the current frame. + * @template T + * @param {T} field The data field. + * @return {T} The first value in the current frame. */ + // @ts-ignore first_value: (field) => op('first_value', field), /** * Window function to assign the last value in a sliding window frame. - * @param {*} field The data field. - * @return {*} The last value in the current frame. + * @template T + * @param {T} field The data field. + * @return {T} The last value in the current frame. */ + // @ts-ignore last_value: (field) => op('last_value', field), /** * Window function to assign the nth value in a sliding window frame * (counting from 1), or undefined if no such value exists. - * @param {*} field The data field. + * @template T + * @param {T} field The data field. 
* @param {number} nth The nth position, starting from 1. - * @return {*} The nth value in the current frame. + * @return {T} The nth value in the current frame. */ + // @ts-ignore nth_value: (field, nth) => op('nth_value', field, nth), /** * Window function to fill in missing values with preceding values. - * @param {*} field The data field. - * @param {*} [defaultValue=undefined] The default value. - * @return {*} The current value if valid, otherwise the first preceding + * @template T + * @param {T} field The data field. + * @param {T} [defaultValue=undefined] The default value. + * @return {T} The current value if valid, otherwise the first preceding * valid value. If no such value exists, returns the default value. */ + // @ts-ignore fill_down: (field, defaultValue) => op('fill_down', field, defaultValue), /** * Window function to fill in missing values with subsequent values. - * @param {*} field The data field. - * @param {*} [defaultValue=undefined] The default value. - * @return {*} The current value if valid, otherwise the first subsequent + * @template T + * @param {T} field The data field. + * @param {T} [defaultValue=undefined] The default value. + * @return {T} The current value if valid, otherwise the first subsequent * valid value. If no such value exists, returns the default value. 
*/ + // @ts-ignore fill_up: (field, defaultValue) => op('fill_up', field, defaultValue) }; diff --git a/src/op/op.js b/src/op/op.js deleted file mode 100644 index efad98b5..00000000 --- a/src/op/op.js +++ /dev/null @@ -1,24 +0,0 @@ -import toArray from '../util/to-array.js'; -import toString from '../util/to-string.js'; - -export default function(name, fields = [], params = []) { - return new Op(name, toArray(fields), toArray(params)); -} - -export class Op { - constructor(name, fields, params) { - this.name = name; - this.fields = fields; - this.params = params; - } - toString() { - const args = [ - ...this.fields.map(f => `d[${toString(f)}]`), - ...this.params.map(toString) - ]; - return `d => op.${this.name}(${args})`; - } - toObject() { - return { expr: this.toString(), func: true }; - } -} diff --git a/src/op/register.js b/src/op/register.js new file mode 100644 index 00000000..560ed116 --- /dev/null +++ b/src/op/register.js @@ -0,0 +1,112 @@ +import aggregateFunctions from './aggregate-functions.js'; +import windowFunctions from './window-functions.js'; +import functions from './functions/index.js'; +import ops, { op } from './op-api.js'; +import { ROW_OBJECT } from '../expression/row-object.js'; +import error from '../util/error.js'; +import has from '../util/has.js'; +import toString from '../util/to-string.js'; + +const onIllegal = (name, type) => + error(`Illegal ${type} name: ${toString(name)}`); + +const onDefined = (name, type) => + error(`The ${type} ${toString(name)} is already defined. 
Use override option?`); + +const onReserve = (name, type) => + error(`The ${type} name ${toString(name)} is reserved and can not be overridden.`); + +function check(name, options, obj = ops, type = 'function') { + if (!name) onIllegal(name, type); + if (!options.override && has(obj, name)) onDefined(name, type); +} + +function verifyFunction(name, def, object, options) { + return object[name] === def || check(name, options); +} + +/** + * Register an aggregate or window operation. + * @param {string} name The name of the operation + * @param {AggregateDef|WindowDef} def The operation definition. + * @param {object} object The registry object to add the definition to. + * @param {RegisterOptions} [options] Registration options. + */ +function addOp(name, def, object, options = {}) { + if (verifyFunction(name, def, object, options)) return; + const [nf = 0, np = 0] = def.param; // num fields, num params + object[name] = def; + ops[name] = (...params) => op( + name, + params.slice(0, nf), + params.slice(nf, nf + np) + ); +} + +/** + * Register a custom aggregate function. + * @param {string} name The name to use for the aggregate function. + * @param {AggregateDef} def The aggregate operator definition. + * @param {RegisterOptions} [options] Function registration options. + * @throws If a function with the same name is already registered and + * the override option is not specified. + */ +export function addAggregateFunction(name, def, options) { + addOp(name, def, aggregateFunctions, options); +} + +/** + * Register a custom window function. + * @param {string} name The name to use for the window function. + * @param {WindowDef} def The window operator definition. + * @param {RegisterOptions} [options] Function registration options. + * @throws If a function with the same name is already registered and + * the override option is not specified. 
+ */ +export function addWindowFunction(name, def, options) { + addOp(name, def, windowFunctions, options); +} + +/** + * Register a function for use within table expressions. + * If only a single argument is provided, it will be assumed to be a + * function and the system will try to extract its name. + * @param {string} name The name to use for the function. + * @param {Function} fn A standard JavaScript function. + * @param {RegisterOptions} [options] Function registration options. + * @throws If a function with the same name is already registered and + * the override option is not specified, or if no name is provided + * and the input function is anonymous. + */ +export function addFunction(name, fn, options = {}) { + if (arguments.length === 1) { + // @ts-ignore + fn = name; + name = fn.name; + if (name === '' || name === 'anonymous') { + error('Anonymous function provided, please include a name argument.'); + } else if (name === ROW_OBJECT) { + onReserve(ROW_OBJECT, 'function'); + } + } + if (verifyFunction(name, fn, functions, options)) return; + functions[name] = fn; + ops[name] = fn; +} + +/** + * Aggregate function definition. + * @typedef {import('./aggregate-functions.js').AggregateDef} AggregateDef + */ + +/** + * Window function definition. + * @typedef {import('./window-functions.js').WindowDef} WindowDef + */ + +/** + * Options for registering new functions. + * @typedef {object} RegisterOptions + * @property {boolean} [override=false] Flag indicating if the added + * function can override an existing function with the same name. + */ diff --git a/src/op/window-functions.js b/src/op/window-functions.js index ae01bdd6..60710ba4 100644 --- a/src/op/window-functions.js +++ b/src/op/window-functions.js @@ -11,7 +11,7 @@ import NULL from '../util/null.js'; /** * A storage object for the state of the window. 
- * @typedef {import('../engine/window/window-state').default} WindowState + * @typedef {import('../verbs/window/window-state.js').default} WindowState */ /** @@ -23,12 +23,12 @@ import NULL from '../util/null.js'; /** * Initialize an aggregate operator. - * @typedef {import('./aggregate-functions').AggregateInit} AggregateInit + * @typedef {import('./aggregate-functions.js').AggregateInit} AggregateInit */ /** * Retrieve an output value from an aggregate operator. - * @typedef {import('./aggregate-functions').AggregateValue} AggregateValue + * @typedef {import('./aggregate-functions.js').AggregateValue} AggregateValue */ /** @@ -47,7 +47,7 @@ import NULL from '../util/null.js'; /** * Create a new aggregate operator instance. - * @typedef {import('./aggregate-functions').AggregateCreate} AggregateCreate + * @typedef {import('./aggregate-functions.js').AggregateCreate} AggregateCreate */ /** diff --git a/src/query/constants.js b/src/query/constants.js deleted file mode 100644 index df524125..00000000 --- a/src/query/constants.js +++ /dev/null @@ -1,17 +0,0 @@ -export const Expr = 'Expr'; -export const ExprList = 'ExprList'; -export const ExprNumber = 'ExprNumber'; -export const ExprObject = 'ExprObject'; -export const JoinKeys = 'JoinKeys'; -export const JoinValues = 'JoinValues'; -export const Options = 'Options'; -export const OrderbyKeys = 'OrderKeys'; -export const SelectionList = 'SelectionList'; -export const TableRef = 'TableRef'; -export const TableRefList = 'TableRefList'; - -export const Descending = 'Descending'; -export const Query = 'Query'; -export const Selection = 'Selection'; -export const Verb = 'Verb'; -export const Window = 'Window'; diff --git a/src/query/query.js b/src/query/query.js deleted file mode 100644 index 61c8d0a0..00000000 --- a/src/query/query.js +++ /dev/null @@ -1,190 +0,0 @@ -import Transformable from '../table/transformable.js'; -import { Query as QueryType } from './constants.js'; -import { Verb, Verbs } from './verb.js'; - -/**
- * Create a new query instance. The query interface provides - * a table-like verb API to construct a query that can be - * serialized or evaluated against Arquero tables. - * @param {string} [tableName] The name of the table to query. If - * provided, will be used as the default input table to pull from - * a provided catalog to run the query against. - * @return {Query} A new builder instance. - */ -export function query(tableName) { - return new Query(null, null, tableName); -} - -/** - * Create a new query instance from a serialized object. - * @param {object} object A serialized query representation, such as - * those generated by query(...).toObject(). - * @returns {Query} The instantiated query instance. - */ -export function queryFrom(object) { - return Query.from(object); -} - -/** - * Model a query as a collection of serializble verbs. - * Provides a table-like interface for constructing queries. - */ -export default class Query extends Transformable { - - /** - * Construct a new query instance. - * @param {Verb[]} verbs An array of verb instances. - * @param {object} [params] Optional query parameters, corresponding - * to parameter references in table expressions. - * @param {string} [table] Optional name of the table to query. - */ - constructor(verbs, params, table) { - super(params); - this._verbs = verbs || []; - this._table = table; - } - - /** - * Create a new query instance from the given serialized object. - * @param {QueryObject} object A serialized query representation, such as - * those generated by Query.toObject. - * @returns {Query} The instantiated query. - */ - static from({ verbs, table, params }) { - return new Query(verbs.map(Verb.from), params, table); - } - - /** - * Provide an informative object string tag. - */ - get [Symbol.toStringTag]() { - if (!this._verbs) return 'Object'; // bail if called on prototype - const ns = this._verbs.length; - return `Query: ${ns} verbs` + (this._table ? 
` on '${this._table}'` : ''); - } - - /** - * Return the number of verbs in this query. - */ - get length() { - return this._verbs.length; - } - - /** - * Return the name of the table this query applies to. - * @return {string} The name of the source table, or undefined. - */ - get tableName() { - return this._table; - } - - /** - * Get or set table expression parameter values. - * If called with no arguments, returns the current parameter values - * as an object. Otherwise, adds the provided parameters to this - * query's parameter set and returns the table. Any prior parameters - * with names matching the input parameters are overridden. - * @param {object} values The parameter values. - * @return {Query|object} The current parameter values (if called - * with no arguments) or this query. - */ - params(values) { - if (arguments.length) { - this._params = { ...this._params, ...values }; - return this; - } else { - return this._params; - } - } - - /** - * Evaluate this query against a given table and catalog. - * @param {Table} table The Arquero table to process. - * @param {Function} catalog A table lookup function that accepts a table - * name string as input and returns a corresponding Arquero table. - * @returns {Table} The resulting Arquero table. - */ - evaluate(table, catalog) { - table = table || catalog(this._table); - for (const verb of this._verbs) { - table = verb.evaluate(table.params(this._params), catalog); - } - return table; - } - - /** - * Serialize this query as a JSON-compatible object. The resulting - * object can be passed to Query.from to re-instantiate this query. - * @returns {object} A JSON-compatible object representing this query. - */ - toObject() { - return serialize(this, 'toObject'); - } - - /** - * Serialize this query as a JSON-compatible object. The resulting - * object can be passed to Query.from to re-instantiate this query. 
- * This method simply returns the result of toObject, but is provided - * as a separate method to allow later customization of JSON export. - * @returns {object} A JSON-compatible object representing this query. - */ - toJSON() { - return this.toObject(); - } - - /** - * Serialize this query to a JSON-compatible abstract syntax tree. - * All table expressions will be parsed and represented as AST instances - * using a modified form of the Mozilla JavaScript AST format. - * This method can be used to output parsed and serialized representations - * to translate Arquero queries to alternative data processing platforms. - * @returns {object} A JSON-compatible abstract syntax tree object. - */ - toAST() { - return serialize(this, 'toAST', { type: QueryType }); - } -} - -/** - * Abstract class representing a data table. - * @typedef {import('../table/table').default} Table - */ - -/** - * Serialized object representation of a query. - * @typedef {object} QueryObject - * @property {object[]} verbs An array of verb definitions. - * @property {object} [params] An object of parameter values. - * @property {string} [table] The name of the table to query. - */ - -function serialize(query, method, props) { - return { - ...props, - verbs: query._verbs.map(verb => verb[method]()), - ...(query._params ? { params: query._params } : null), - ...(query._table ? 
{ table: query._table } : null) - }; -} - -function append(qb, verb) { - return new Query( - qb._verbs.concat(verb), - qb._params, - qb._table - ); -} - -export function addQueryVerb(name, verb) { - Query.prototype[name] = function(...args) { - return append(this, verb(...args)); - }; -} - -// Internal verb handlers -for (const name in Verbs) { - const verb = Verbs[name]; - Query.prototype['__' + name] = function(qb, ...args) { - return append(qb, verb(...args)); - }; -} diff --git a/src/query/to-ast.js b/src/query/to-ast.js deleted file mode 100644 index 6e0460f6..00000000 --- a/src/query/to-ast.js +++ /dev/null @@ -1,160 +0,0 @@ -import error from '../util/error.js'; -import isArray from '../util/is-array.js'; -import isFunction from '../util/is-function.js'; -import isNumber from '../util/is-number.js'; -import isObject from '../util/is-object.js'; -import isString from '../util/is-string.js'; -import toArray from '../util/to-array.js'; -import parse from '../expression/parse.js'; -import { isSelection, toObject } from './util.js'; - -import { Column } from '../expression/ast/constants.js'; -import { - Descending, - Expr, - ExprList, - ExprNumber, - ExprObject, - JoinKeys, - JoinValues, - Options, - OrderbyKeys, - Selection, - SelectionList, - TableRef, - TableRefList, - Window -} from './constants.js'; - -const Methods = { - [Expr]: astExpr, - [ExprList]: astExprList, - [ExprNumber]: astExprNumber, - [ExprObject]: astExprObject, - [JoinKeys]: astJoinKeys, - [JoinValues]: astJoinValues, - [OrderbyKeys]: astExprList, - [SelectionList]: astSelectionList -}; - -export default function(value, type, propTypes) { - return type === TableRef ? astTableRef(value) - : type === TableRefList ? value.map(astTableRef) - : ast(toObject(value), type, propTypes); -} - -function ast(value, type, propTypes) { - return type === Options - ? (value ? 
astOptions(value, propTypes) : value) - : Methods[type](value); -} - -function astOptions(value, types = {}) { - const output = {}; - for (const key in value) { - const prop = value[key]; - output[key] = types[key] ? ast(prop, types[key]) : prop; - } - return output; -} - -function astParse(expr, opt) { - return parse({ expr }, { ...opt, ast: true }).exprs[0]; -} - -function astColumn(name) { - return { type: Column, name }; -} - -function astColumnIndex(index) { - return { type: Column, index }; -} - -function astExprObject(obj, opt) { - if (isString(obj)) { - return astParse(obj, opt); - } - - if (obj.expr) { - let ast; - if (obj.field === true) { - ast = astColumn(obj.expr); - } else if (obj.func === true) { - ast = astExprObject(obj.expr, opt); - } - if (ast) { - if (obj.desc) { - ast = { type: Descending, expr: ast }; - } - if (obj.window) { - ast = { type: Window, expr: ast, ...obj.window }; - } - return ast; - } - } - - return Object.keys(obj) - .map(key => ({ - ...astExprObject(obj[key], opt), - as: key - })); -} - -function astSelection(sel) { - const type = Selection; - return sel.all ? { type, operator: 'all' } - : sel.not ? { type, operator: 'not', arguments: astExprList(sel.not) } - : sel.range ? { type, operator: 'range', arguments: astExprList(sel.range) } - : sel.matches ? { type, operator: 'matches', arguments: sel.matches } - : error('Invalid input'); -} - -function astSelectionList(arr) { - return toArray(arr).map(astSelectionItem).flat(); -} - -function astSelectionItem(val) { - return isSelection(val) ? astSelection(val) - : isNumber(val) ? astColumnIndex(val) - : isString(val) ? astColumn(val) - : isObject(val) ? Object.keys(val) - .map(name => ({ type: Column, name, as: val[name] })) - : error('Invalid input'); -} - -function astExpr(val) { - return isSelection(val) ? astSelection(val) - : isNumber(val) ? astColumnIndex(val) - : isString(val) ? astColumn(val) - : isObject(val) ? 
astExprObject(val) - : error('Invalid input'); -} - -function astExprList(arr) { - return toArray(arr).map(astExpr).flat(); -} - -function astExprNumber(val) { - return isNumber(val) ? val : astExprObject(val); -} - -function astJoinKeys(val) { - return isArray(val) - ? val.map(astExprList) - : astExprObject(val, { join: true }); -} - -function astJoinValues(val) { - return isArray(val) - ? val.map((v, i) => i < 2 - ? astExprList(v) - : astExprObject(v, { join: true }) - ) - : astExprObject(val, { join: true }); -} - -function astTableRef(value) { - return value && isFunction(value.toAST) - ? value.toAST() - : value; -} diff --git a/src/query/util.js b/src/query/util.js deleted file mode 100644 index be8f9878..00000000 --- a/src/query/util.js +++ /dev/null @@ -1,130 +0,0 @@ -import desc from '../helpers/desc.js'; -import field from '../helpers/field.js'; -import rolling from '../helpers/rolling.js'; -import { all, matches, not, range } from '../helpers/selection.js'; -import Query from './query.js'; -import error from '../util/error.js'; -import isArray from '../util/is-array.js'; -import isFunction from '../util/is-function.js'; -import isNumber from '../util/is-number.js'; -import isObject from '../util/is-object.js'; -import isString from '../util/is-string.js'; -import map from '../util/map-object.js'; -import toArray from '../util/to-array.js'; - -function func(expr) { - const f = d => d; - f.toString = () => expr; - return f; -} - -export function getTable(catalog, ref) { - ref = ref && isFunction(ref.query) ? ref.query() : ref; - return ref && isFunction(ref.evaluate) - ? ref.evaluate(null, catalog) - : catalog(ref); -} - -export function isSelection(value) { - return isObject(value) && ( - isArray(value.all) || - isArray(value.matches) || - isArray(value.not) || - isArray(value.range) - ); -} - -export function toObject(value) { - return value && isFunction(value.toObject) ? value.toObject() - : isFunction(value) ? 
{ expr: String(value), func: true } - : isArray(value) ? value.map(toObject) - : isObject(value) ? map(value, _ => toObject(_)) - : value; -} - -export function fromObject(value) { - return isArray(value) ? value.map(fromObject) - : !isObject(value) ? value - : isArray(value.verbs) ? Query.from(value) - : isArray(value.all) ? all() - : isArray(value.range) ? range(...value.range) - : isArray(value.match) ? matches(RegExp(...value.match)) - : isArray(value.not) ? not(value.not.map(toObject)) - : fromExprObject(value); -} - -function fromExprObject(value) { - let output = value; - let expr = value.expr; - - if (expr != null) { - if (value.field === true) { - output = expr = field(expr); - } else if (value.func === true) { - output = expr = func(expr); - } - - if (isObject(value.window)) { - const { frame, peers } = value.window; - output = expr = rolling(expr, frame, peers); - } - - if (value.desc === true) { - output = desc(expr); - } - } - - return value === output - ? map(value, _ => fromObject(_)) - : output; -} - -export function joinKeys(keys) { - return isArray(keys) ? keys.map(parseJoinKeys) - : keys; -} - -function parseJoinKeys(keys) { - const list = []; - - toArray(keys).forEach(param => { - isNumber(param) ? list.push(param) - : isString(param) ? list.push(field(param, null)) - : isObject(param) && param.expr ? list.push(param) - : isFunction(param) ? list.push(param) - : error(`Invalid key value: ${param+''}`); - }); - - return list; -} - -export function joinValues(values) { - return isArray(values) - ? values.map(parseJoinValues) - : values; -} - -function parseJoinValues(values, index) { - return index < 2 ? toArray(values) : values; -} - -export function orderbyKeys(keys) { - const list = []; - - keys.forEach(param => { - const expr = param.expr != null ? param.expr : param; - if (isObject(expr) && !isFunction(expr)) { - for (const key in expr) { - list.push(expr[key]); - } - } else { - param = isNumber(expr) ? expr - : isString(expr) ? 
field(param) - : isFunction(expr) ? param - : error(`Invalid orderby field: ${param+''}`); - list.push(param); - } - }); - - return list; -} diff --git a/src/query/verb.js b/src/query/verb.js deleted file mode 100644 index 8551c0a3..00000000 --- a/src/query/verb.js +++ /dev/null @@ -1,248 +0,0 @@ -import { Verb as VerbType } from './constants.js'; - -import { - fromObject, - getTable, - joinKeys, - joinValues, - orderbyKeys, - toObject -} from './util.js'; - -import { - Expr, - ExprList, - ExprNumber, - ExprObject, - JoinKeys, - JoinValues, - Options, - OrderbyKeys, - SelectionList, - TableRef, - TableRefList -} from './constants.js'; - -import toAST from './to-ast.js'; - -/** - * Model an Arquero verb as a serializable object. - */ -export class Verb { - - /** - * Construct a new verb instance. - * @param {string} verb The verb name. - * @param {object[]} schema Schema describing verb parameters. - * @param {any[]} params Array of parameter values. - */ - constructor(verb, schema = [], params = []) { - this.verb = verb; - this.schema = schema; - schema.forEach((s, index) => { - const type = s.type; - const param = params[index]; - const value = type === JoinKeys ? joinKeys(param) - : type === JoinValues ? joinValues(param) - : type === OrderbyKeys ? orderbyKeys(param) - : param; - this[s.name] = value !== undefined ? value : s.default; - }); - } - - /** - * Create new verb instance from the given serialized object. - * @param {object} object A serialized verb representation, such as - * those generated by Verb.toObject. - * @returns {Verb} The instantiated verb. - */ - static from(object) { - const verb = Verbs[object.verb]; - const params = (verb.schema || []) - .map(({ name }) => fromObject(object[name])); - return verb(...params); - } - - /** - * Evaluate this verb against a given table and catalog. - * @param {Table} table The Arquero table to process. 
- * @param {Function} catalog A table lookup function that accepts a table - * name string as input and returns a corresponding Arquero table. - * @returns {Table} The resulting Arquero table. - */ - evaluate(table, catalog) { - const params = this.schema.map(({ name, type }) => { - const value = this[name]; - return type === TableRef ? getTable(catalog, value) - : type === TableRefList ? value.map(t => getTable(catalog, t)) - : value; - }); - return table[this.verb](...params); - } - - /** - * Serialize this verb as a JSON-compatible object. The resulting - * object can be passed to Verb.from to re-instantiate this verb. - * @returns {object} A JSON-compatible object representing this verb. - */ - toObject() { - const obj = { verb: this.verb }; - this.schema.forEach(({ name }) => { - obj[name] = toObject(this[name]); - }); - return obj; - } - - /** - * Serialize this verb to a JSON-compatible abstract syntax tree. - * All table expressions will be parsed and represented as AST instances - * using a modified form of the Mozilla JavaScript AST format. - * This method can be used to output parsed and serialized representations - * to translate Arquero verbs to alternative data processing platforms. - * @returns {object} A JSON-compatible abstract syntax tree object. - */ - toAST() { - const obj = { type: VerbType, verb: this.verb }; - this.schema.forEach(({ name, type, props }) => { - obj[name] = toAST(this[name], type, props); - }); - return obj; - } -} - -/** - * Verb parameter type. - * @typedef {Expr|ExprList|ExprNumber|ExprObject|JoinKeys|JoinValues|Options|OrderbyKeys|SelectionList|TableRef|TableRefList} ParamType - */ - -/** - * Verb parameter schema. - * @typedef {object} ParamDef - * @property {string} name The name of the parameter. - * @property {ParamType} type The type of the parameter. - * @property {{ [key: string]: ParamType }} [props] Types for non-literal properties. - */ - -/** - * Create a new constructors. 
- * @param {string} name The name of the verb. - * @param {ParamDef[]} schema The verb parameter schema. - * @return {Function} A verb constructor function. - */ -export function createVerb(name, schema) { - return Object.assign( - (...params) => new Verb(name, schema, params), - { schema } - ); -} - -/** - * A lookup table of verb classes. - */ -export const Verbs = { - count: createVerb('count', [ - { name: 'options', type: Options } - ]), - derive: createVerb('derive', [ - { name: 'values', type: ExprObject }, - { name: 'options', type: Options, - props: { before: SelectionList, after: SelectionList } - } - ]), - filter: createVerb('filter', [ - { name: 'criteria', type: ExprObject } - ]), - groupby: createVerb('groupby', [ - { name: 'keys', type: ExprList } - ]), - orderby: createVerb('orderby', [ - { name: 'keys', type: OrderbyKeys } - ]), - relocate: createVerb('relocate', [ - { name: 'columns', type: SelectionList }, - { name: 'options', type: Options, - props: { before: SelectionList, after: SelectionList } - } - ]), - rename: createVerb('rename', [ - { name: 'columns', type: SelectionList } - ]), - rollup: createVerb('rollup', [ - { name: 'values', type: ExprObject } - ]), - sample: createVerb('sample', [ - { name: 'size', type: ExprNumber }, - { name: 'options', type: Options, props: { weight: Expr } } - ]), - select: createVerb('select', [ - { name: 'columns', type: SelectionList } - ]), - ungroup: createVerb('ungroup'), - unorder: createVerb('unorder'), - reify: createVerb('reify'), - dedupe: createVerb('dedupe', [ - { name: 'keys', type: ExprList, default: [] } - ]), - impute: createVerb('impute', [ - { name: 'values', type: ExprObject }, - { name: 'options', type: Options, props: { expand: ExprList } } - ]), - fold: createVerb('fold', [ - { name: 'values', type: ExprList }, - { name: 'options', type: Options } - ]), - pivot: createVerb('pivot', [ - { name: 'keys', type: ExprList }, - { name: 'values', type: ExprList }, - { name: 'options', type: 
Options } - ]), - spread: createVerb('spread', [ - { name: 'values', type: ExprList }, - { name: 'options', type: Options } - ]), - unroll: createVerb('unroll', [ - { name: 'values', type: ExprList }, - { name: 'options', type: Options, props: { drop: ExprList } } - ]), - lookup: createVerb('lookup', [ - { name: 'table', type: TableRef }, - { name: 'on', type: JoinKeys }, - { name: 'values', type: ExprList } - ]), - join: createVerb('join', [ - { name: 'table', type: TableRef }, - { name: 'on', type: JoinKeys }, - { name: 'values', type: JoinValues }, - { name: 'options', type: Options } - ]), - cross: createVerb('cross', [ - { name: 'table', type: TableRef }, - { name: 'values', type: JoinValues }, - { name: 'options', type: Options } - ]), - semijoin: createVerb('semijoin', [ - { name: 'table', type: TableRef }, - { name: 'on', type: JoinKeys } - ]), - antijoin: createVerb('antijoin', [ - { name: 'table', type: TableRef }, - { name: 'on', type: JoinKeys } - ]), - concat: createVerb('concat', [ - { name: 'tables', type: TableRefList } - ]), - union: createVerb('union', [ - { name: 'tables', type: TableRefList } - ]), - intersect: createVerb('intersect', [ - { name: 'tables', type: TableRefList } - ]), - except: createVerb('except', [ - { name: 'tables', type: TableRefList } - ]) -}; - -/** - * Abstract class representing a data table. 
- * @typedef {import('../table/table').default} Table - */ diff --git a/src/register.js b/src/register.js deleted file mode 100644 index 90e3b93a..00000000 --- a/src/register.js +++ /dev/null @@ -1,260 +0,0 @@ -import ColumnTable from './table/column-table.js'; -import aggregateFunctions from './op/aggregate-functions.js'; -import windowFunctions from './op/window-functions.js'; -import functions from './op/functions/index.js'; -import op from './op/op.js'; -import ops from './op/op-api.js'; -import Query, { addQueryVerb } from './query/query.js'; -import { Verbs, createVerb } from './query/verb.js'; -import { ROW_OBJECT } from './expression/row-object.js'; -import error from './util/error.js'; -import has from './util/has.js'; -import toString from './util/to-string.js'; - -const onIllegal = (name, type) => - error(`Illegal ${type} name: ${toString(name)}`); - -const onDefined = (name, type) => - error(`The ${type} ${toString(name)} is already defined. Use override option?`); - -const onReserve = (name, type) => - error(`The ${type} name ${toString(name)} is reserved and can not be overridden.`); - -function check(name, options, obj = ops, type = 'function') { - if (!name) onIllegal(name, type); - if (!options.override && has(obj, name)) onDefined(name, type); -} - -// -- Op Functions -------------------------------------------------- - -function verifyFunction(name, def, object, options) { - return object[name] === def || check(name, options); -} - -/** - * Register an aggregate or window operation. - * @param {string} name The name of the operation - * @param {AggregateDef|WindowDef} def The operation definition. - * @param {object} object The registry object to add the definition to. - * @param {RegisterOptions} [options] Registration options. 
- */ -function addOp(name, def, object, options = {}) { - if (verifyFunction(name, def, object, options)) return; - const [nf = 0, np = 0] = def.param; - object[name] = def; - ops[name] = (...params) => op( - name, - params.slice(0, nf), - params.slice(nf, nf + np) - ); -} - -/** - * Register a custom aggregate function. - * @param {string} name The name to use for the aggregate function. - * @param {AggregateDef} def The aggregate operator definition. - * @param {RegisterOptions} [options] Function registration options. - * @throws If a function with the same name is already registered and - * the override option is not specified. - */ -export function addAggregateFunction(name, def, options) { - addOp(name, def, aggregateFunctions, options); -} - -/** - * Register a custom window function. - * @param {string} name The name to use for the window function. - * @param {WindowDef} def The window operator definition. - * @param {RegisterOptions} [options] Function registration options. - * @throws If a function with the same name is already registered and - * the override option is not specified. - */ -export function addWindowFunction(name, def, options) { - addOp(name, def, windowFunctions, options); -} - -/** - * Register a function for use within table expressions. - * If only a single argument is provided, it will be assumed to be a - * function and the system will try to extract its name. - * @param {string} name The name to use for the function. - * @param {Function} fn A standard JavaScript function. - * @param {RegisterOptions} [options] Function registration options. - * @throws If a function with the same name is already registered and - * the override option is not specified, or if no name is provided - * and the input function is anonymous. 
- */ -export function addFunction(name, fn, options = {}) { - if (arguments.length === 1) { - fn = name; - name = fn.name; - if (name === '' || name === 'anonymous') { - error('Anonymous function provided, please include a name argument.'); - } else if (name === ROW_OBJECT) { - onReserve(ROW_OBJECT, 'function'); - } - } - if (verifyFunction(name, fn, functions, options)) return; - functions[name] = fn; - ops[name] = fn; -} - -// -- Table Methods and Verbs --------------------------------------- - -const proto = ColumnTable.prototype; - -/** - * Reserved table/query methods that must not be overwritten. - */ -let RESERVED; - -function addReserved(obj) { - for (; obj; obj = Object.getPrototypeOf(obj)) { - Object.getOwnPropertyNames(obj).forEach(name => RESERVED[name] = 1); - } -} - -function verifyTableMethod(name, fn, options) { - const type = 'method'; - - // exit early if duplicate re-assignment - if (proto[name] && proto[name].fn === fn) return true; - - // initialize reserved properties to avoid overriding internals - if (!RESERVED) { - RESERVED = {}; - addReserved(proto); - addReserved(Query.prototype); - } - - // perform name checks - if (RESERVED[name]) onReserve(name, type); - if ((name + '')[0] === '_') onIllegal(name, type); - check(name, options, proto, type); -} - -/** - * Register a new table method. A new method will be added to the column - * table prototype. When invoked from a table, the registered method will - * be invoked with the table as the first argument, followed by all the - * provided arguments. - * @param {string} name The name of the table method. - * @param {Function} method The table method. - * @param {RegisterOptions} options - */ -export function addTableMethod(name, method, options = {}) { - if (verifyTableMethod(name, method, options)) return; - proto[name] = function(...args) { return method(this, ...args); }; - proto[name].fn = method; -} - -/** - * Register a new transformation verb. 
- * @param {string} name The name of the verb. - * @param {Function} method The verb implementation. - * @param {ParamDef[]} params The verb parameter schema. - * @param {RegisterOptions} options Function registration options. - */ -export function addVerb(name, method, params, options = {}) { - // register table method first - // if that doesn't throw, add serializable verb entry - addTableMethod(name, method, options); - addQueryVerb(name, Verbs[name] = createVerb(name, params)); -} - -// -- Package Bundles ----------------------------------------------- - -const PACKAGE = 'arquero_package'; - -/** - * Add an extension package of functions, table methods, and/or verbs. - * @param {Package|PackageBundle} bundle The package of extensions. - * @throws If package validation fails. - */ -export function addPackage(bundle, options = {}) { - const pkg = bundle && bundle[PACKAGE] || bundle; - const parts = { - functions: [ - (name, def, opt) => verifyFunction(name, def, functions, opt), - addFunction - ], - aggregateFunctions: [ - (name, def, opt) => verifyFunction(name, def, aggregateFunctions, opt), - addAggregateFunction - ], - windowFunctions: [ - (name, def, opt) => verifyFunction(name, def, windowFunctions, opt), - addWindowFunction - ], - tableMethods: [ - verifyTableMethod, - addTableMethod - ], - verbs: [ - (name, obj, opt) => verifyTableMethod(name, obj.method, opt), - (name, obj, opt) => addVerb(name, obj.method, obj.params, opt) - ] - }; - - function scan(index) { - for (const key in parts) { - const part = parts[key]; - const p = pkg[key]; - for (const name in p) part[index](name, p[name], options); - } - } - scan(0); // first validate package, throw if validation fails - scan(1); // then add package content -} - -/** - * Aggregate function definition. - * @typedef {import('./op/aggregate-functions').AggregateDef} AggregateDef - */ - -/** - * Window function definition. 
- * @typedef {import('./op/window-functions').WindowDef} WindowDef - */ - -/** - * Verb parameter definition. - * @typedef {import('./query/verb').ParamDef} ParamDef - */ - -/** - * Verb definition. - * @typedef {object} VerbDef - * @property {Function} method A function implementing the verb. - * @property {ParamDef[]} params The verb parameter schema. - */ - -/** - * Verb parameter definition. - * @typedef {object} ParamDef - * @property {string} name The verb parameter name. - * @property {ParamType} type The verb parameter type. - */ - -/** - * A package of op function and table method definitions. - * @typedef {object} Package - * @property {{[name: string]: Function}} [functions] Standard function entries. - * @property {{[name: string]: AggregateDef}} [aggregateFunctions] Aggregate function entries. - * @property {{[name: string]: WindowDef}} [windowFunctions] Window function entries. - * @property {{[name: string]: Function}} [tableMethods] Table method entries. - * @property {{[name: string]: VerbDef}} [verbs] Verb entries. - */ - -/** - * An object containing an extension package. - * @typedef {object} PackageBundle - * @property {Package} arquero.package The package bundle. - */ - -/** - * Options for registering new functions. - * @typedef {object} RegisterOptions - * @property {boolean} [override=false] Flag indicating if the added - * function can override an existing function with the same name. - */ \ No newline at end of file diff --git a/src/table/bit-set.js b/src/table/BitSet.js similarity index 99% rename from src/table/bit-set.js rename to src/table/BitSet.js index bc8e49bf..7cf546f7 100644 --- a/src/table/bit-set.js +++ b/src/table/BitSet.js @@ -4,7 +4,7 @@ const ALL = 0xFFFFFFFF; /** * Represent an indexable set of bits. */ -export default class BitSet { +export class BitSet { /** * Instantiate a new BitSet instance. * @param {number} size The number of bits. 
diff --git a/src/table/ColumnSet.js b/src/table/ColumnSet.js new file mode 100644 index 00000000..1b1d49c2 --- /dev/null +++ b/src/table/ColumnSet.js @@ -0,0 +1,83 @@ +import has from '../util/has.js'; + +/** + * Return a new column set instance. + * @param {import('./Table.js').Table} [table] A base table whose columns + * should populate the returned set. If unspecified, create an empty set. + * @return {ColumnSet} The column set. + */ +export function columnSet(table) { + return table + ? new ColumnSet({ ...table.data() }, table.columnNames()) + : new ColumnSet(); +} + +/** An editable collection of named columns. */ +export class ColumnSet { + /** + * Create a new column set instance. + * @param {import('./types.js').ColumnData} [data] Initial column data. + * @param {string[]} [names] Initial column names. + */ + constructor(data, names) { + this.data = data || {}; + this.names = names || []; + } + + /** + * Add a new column to this set and return the column values. + * @template {import('./types.js').ColumnType} T + * @param {string} name The column name. + * @param {T} values The column values. + * @return {T} The provided column values. + */ + add(name, values) { + if (!this.has(name)) this.names.push(name + ''); + return this.data[name] = values; + } + + /** + * Test if this column set has a column with the given name. + * @param {string} name A column name + * @return {boolean} True if this set contains a column with the given name, + * false otherwise. + */ + has(name) { + return has(this.data, name); + } + + /** + * Add a groupby specification to this column set. + * @param {import('./types.js').GroupBySpec} groups A groupby specification. + * @return {this} This column set. + */ + groupby(groups) { + this.groups = groups; + return this; + } + + /** + * Create a new table with the contents of this column set, using the same + * type as a given prototype table. The new table does not inherit the + * filter, groupby, or orderby state of the prototype. 
+ * @template {import('./Table.js').Table} T + * @param {T} proto A prototype table + * @return {T} The new table. + */ + new(proto) { + const { data, names, groups = null } = this; + return proto.create({ data, names, groups, filter: null, order: null }); + } + + /** + * Create a derived table with the contents of this column set, using the same + * type as a given prototype table. The new table will inherit the filter, + * groupby, and orderby state of the prototype. + * @template {import('./Table.js').Table} T + * @param {T} proto A prototype table + * @return {T} The new table. + */ + derive(proto) { + return proto.create(this); + } +} diff --git a/src/table/ColumnTable.js b/src/table/ColumnTable.js new file mode 100644 index 00000000..43320650 --- /dev/null +++ b/src/table/ColumnTable.js @@ -0,0 +1,848 @@ +import { Table } from './Table.js'; +import { + antijoin, + assign, + concat, + cross, + dedupe, + derive, + except, + filter, + fold, + groupby, + impute, + intersect, + join, + lookup, + orderby, + pivot, + reduce, + relocate, + rename, + rollup, + sample, + select, + semijoin, + slice, + spread, + ungroup, + union, + unorder, + unroll +} from '../verbs/index.js'; +import { count } from '../op/op-api.js'; +import toArrow from '../arrow/to-arrow.js'; +import toArrowIPC from '../arrow/to-arrow-ipc.js'; +import toCSV from '../format/to-csv.js'; +import toHTML from '../format/to-html.js'; +import toJSON from '../format/to-json.js'; +import toMarkdown from '../format/to-markdown.js'; +import toArray from '../util/to-array.js'; + +/** + * A data table with transformation verbs. + */ +export class ColumnTable extends Table { + /** + * Create a new table with additional columns drawn from one or more input + * tables. All tables must have the same number of rows and are reified + * prior to assignment. In the case of repeated column names, input table + * columns overwrite existing columns. 
+ * @param {...(Table|import('./types.js').ColumnData)} tables + * The tables to merge with this table. + * @return {this} A new table with merged columns. + * @example table.assign(table1, table2) + */ + assign(...tables) { + return assign(this, ...tables); + } + + /** + * Count the number of values in a group. This method is a shorthand + * for *rollup* with a count aggregate function. + * @param {import('./types.js').CountOptions} [options] + * Options for the count. + * @return {this} A new table with groupby and count columns. + * @example table.groupby('colA').count() + * @example table.groupby('colA').count({ as: 'num' }) + */ + count(options = {}) { + const { as = 'count' } = options; + return rollup(this, { [as]: count() }); + } + + /** + * Derive new column values based on the provided expressions. By default, + * new columns are added after (higher indices than) existing columns. Use + * the before or after options to place new columns elsewhere. + * @param {import('./types.js').ExprObject} values + * Object of name-value pairs defining the columns to derive. The input + * object should have output column names for keys and table expressions + * for values. + * @param {import('./types.js').DeriveOptions} [options] + * Options for dropping or relocating derived columns. Use either a before + * or after property to indicate where to place derived columns. Specifying + * both before and after is an error. Unlike the *relocate* verb, this + * option affects only new columns; updated columns with existing names + * are excluded from relocation. + * @return {this} A new table with derived columns added. + * @example table.derive({ sumXY: d => d.x + d.y }) + * @example table.derive({ z: d => d.x * d.y }, { before: 'x' }) + */ + derive(values, options) { + return derive(this, values, options); + } + + /** + * Filter a table to a subset of rows based on the input criteria. 
+ * The resulting table provides a filtered view over the original data; no + * data copy is made. To create a table that copies only filtered data to + * new data structures, call *reify* on the output table. + * @param {import('./types.js').TableExpr} criteria + * Filter criteria as a table expression. Both aggregate and window + * functions are permitted, taking into account *groupby* or *orderby* + * settings. + * @return {this} A new table with filtered rows. + * @example table.filter(d => abs(d.value) < 5) + */ + filter(criteria) { + return filter(this, criteria); + } + + /** + * Extract rows with indices from start to end (end not included), where + * start and end represent per-group ordered row numbers in the table. + * @param {number} [start] Zero-based index at which to start extraction. + * A negative index indicates an offset from the end of the group. + * If start is undefined, slice starts from the index 0. + * @param {number} [end] Zero-based index before which to end extraction. + * A negative index indicates an offset from the end of the group. + * If end is omitted, slice extracts through the end of the group. + * @return {this} A new table with sliced rows. + * @example table.slice(1, -1) + */ + slice(start, end) { + return slice(this, start, end); + } + + /** + * Group table rows based on a set of column values. + * Subsequent operations that are sensitive to grouping (such as + * aggregate functions) will operate over the grouped rows. + * To undo grouping, use *ungroup*. + * @param {...import('./types.js').ExprList} keys + * Key column values to group by. The keys may be specified using column + * name strings, column index numbers, value objects with output column + * names for keys and table expressions for values, or selection helper + * functions. + * @return {this} A new table with grouped rows. 
+ * @example table.groupby('colA', 'colB') + * @example table.groupby({ key: d => d.colA + d.colB }) + */ + groupby(...keys) { + return groupby(this, ...keys); + } + + /** + * Order table rows based on a set of column values. Subsequent operations + * sensitive to ordering (such as window functions) will operate over sorted + * values. The resulting table provides a view over the original data, + * without any copying. To create a table with sorted data copied to new + * data structures, call *reify* on the result of this method. To undo + * ordering, use *unorder*. + * @param {...import('./types.js').OrderKeys} keys + * Key values to sort by, in precedence order. + * By default, sorting is done in ascending order. + * To sort in descending order, wrap values using *desc*. + * If a string, order by the column with that name. + * If a number, order by the column with that index. + * If a function, must be a valid table expression; aggregate functions + * are permitted, but window functions are not. + * If an object, object values must be valid values parameters + * with output column names for keys and table expressions + * for values (the output names will be ignored). + * If an array, array values must be valid key parameters. + * @return {this} A new ordered table. + * @example table.orderby('a', desc('b')) + * @example table.orderby({ a: 'a', b: desc('b') }) + * @example table.orderby(desc(d => d.a)) + */ + orderby(...keys) { + return orderby(this, ...keys); + } + + /** + * Relocate a subset of columns to change their positions, also + * potentially renaming them. + * @param {import('./types.js').Select} columns + * An ordered selection of columns to relocate. 
+ * The input may consist of column name strings, column integer indices, + * rename objects with current column names as keys and new column names + * as values, or functions that take a table as input and return a valid + * selection parameter (typically the output of selection helper functions + * such as *all*, *not*, or *range*). + * @param {import('./types.js').RelocateOptions} options + * Options for relocating. Must include either the before or after property + * to indicate where to place the relocated columns. Specifying both before + * and after is an error. + * @return {this} A new table with relocated columns. + * @example table.relocate(['colY', 'colZ'], { after: 'colX' }) + * @example table.relocate(not('colB', 'colC'), { before: 'colA' }) + * @example table.relocate({ colA: 'newA', colB: 'newB' }, { after: 'colC' }) + */ + relocate(columns, options) { + return relocate(this, toArray(columns), options); + } + + /** + * Rename one or more columns, preserving column order. + * @param {...import('./types.js').Select} columns + * One or more rename objects with current column names as keys and new + * column names as values. + * @return {this} A new table with renamed columns. + * @example table.rename({ oldName: 'newName' }) + * @example table.rename({ a: 'a2', b: 'b2' }) + */ + rename(...columns) { + return rename(this, ...columns); + } + + /** + * Reduce a table, processing all rows to produce a new table. + * To produce standard aggregate summaries, use the rollup verb. + * This method allows the use of custom reducer implementations, + * for example to produce multiple rows for an aggregate. + * @param {import('../verbs/reduce/reducer.js').default} reducer + * The reducer to apply. + * @return {this} A new table of reducer outputs. + */ + reduce(reducer) { + return reduce(this, reducer); + } + + /** + * Rollup a table to produce an aggregate summary. + * Often used in conjunction with *groupby*. + * To produce counts only, *count* is a shortcut. 
+ * @param {import('./types.js').ExprObject} [values] + * Object of name-value pairs defining aggregate output columns. The input + * object should have output column names for keys and table expressions + * for values. The expressions must be valid aggregate expressions: window + * functions are not allowed and column references must be arguments to + * aggregate functions. + * @return {this} A new table of aggregate summary values. + * @example table.groupby('colA').rollup({ mean: d => mean(d.colB) }) + * @example table.groupby('colA').rollup({ mean: op.median('colB') }) + */ + rollup(values) { + return rollup(this, values); + } + + /** + * Generate a table from a random sample of rows. + * If the table is grouped, performs a stratified sample by + * sampling from each group separately. + * @param {number | import('./types.js').TableExpr} size + * The number of samples to draw per group. + * If number-valued, the same sample size is used for each group. + * If function-valued, the input should be an aggregate table + * expression compatible with *rollup*. + * @param {import('./types.js').SampleOptions} [options] + * Options for sampling. + * @return {this} A new table with sampled rows. + * @example table.sample(50) + * @example table.sample(100, { replace: true }) + * @example table.groupby('colA').sample(() => op.floor(0.5 * op.count())) + */ + sample(size, options) { + return sample(this, size, options); + } + + /** + * Select a subset of columns into a new table, potentially renaming them. + * @param {...import('./types.js').Select} columns + * An ordered selection of columns. + * The input may consist of column name strings, column integer indices, + * rename objects with current column names as keys and new column names + * as values, or functions that take a table as input and return a valid + * selection parameter (typically the output of selection helper functions + * such as *all*, *not*, or *range*). 
+ * @return {this} A new table of selected columns. + * @example table.select('colA', 'colB') + * @example table.select(not('colB', 'colC')) + * @example table.select({ colA: 'newA', colB: 'newB' }) + */ + select(...columns) { + return select(this, ...columns); + } + + /** + * Ungroup a table, removing any grouping criteria. + * Undoes the effects of *groupby*. + * @return {this} A new ungrouped table, or this table if not grouped. + * @example table.ungroup() + */ + ungroup() { + return ungroup(this); + } + + /** + * Unorder a table, removing any sorting criteria. + * Undoes the effects of *orderby*. + * @return {this} A new unordered table, or this table if not ordered. + * @example table.unorder() + */ + unorder() { + return unorder(this); + } + + // -- Cleaning Verbs ------------------------------------------------------ + + /** + * De-duplicate table rows by removing repeated row values. + * @param {...import('./types.js').ExprList} keys + * Key columns to check for duplicates. + * Two rows are considered duplicates if they have matching values for + * all keys. If keys are unspecified, all columns are used. + * The keys may be specified using column name strings, column index + * numbers, value objects with output column names for keys and table + * expressions for values, or selection helper functions. + * @return {this} A new de-duplicated table. + * @example table.dedupe() + * @example table.dedupe('a', 'b') + * @example table.dedupe({ abs: d => op.abs(d.a) }) + */ + dedupe(...keys) { + return dedupe(this, ...keys); + } + + /** + * Impute missing values or rows. Accepts a set of column-expression pairs + * and evaluates the expressions to replace any missing (null, undefined, + * or NaN) values in the original column. + * If the expand option is specified, imputes new rows for missing + * combinations of values. All combinations of key values (a full cross + * product) are considered for each level of grouping (specified by + * *groupby*). 
New rows will be added for any combination + * of key and groupby values not already contained in the table. For all + * non-key and non-group columns the new rows are populated with imputation + * values (first argument) if specified, otherwise undefined. + * If the expand option is specified, any filter or orderby settings are + * removed from the output table, but groupby settings persist. + * @param {import('./types.js').ExprObject} values + * Object of name-value pairs for the column values to impute. The input + * object should have existing column names for keys and table expressions + * for values. The expressions will be evaluated to determine replacements + * for any missing values. + * @param {import('./types.js').ImputeOptions} [options] Imputation options. + * The expand property specifies a set of column values to consider for + * imputing missing rows. All combinations of expanded values are + * considered, and new rows are added for each combination that does not + * appear in the input table. + * @return {this} A new table with imputed values and/or rows. + * @example table.impute({ v: () => 0 }) + * @example table.impute({ v: d => op.mean(d.v) }) + * @example table.impute({ v: () => 0 }, { expand: ['x', 'y'] }) + */ + impute(values, options) { + return impute(this, values, options); + } + + // -- Reshaping Verbs ----------------------------------------------------- + + /** + * Fold one or more columns into two key-value pair columns. + * The fold transform is an inverse of the *pivot* transform. + * The resulting table has two new columns, one containing the column + * names (named "key") and the other the column values (named "value"). + * The number of output rows equals the original row count multiplied + * by the number of folded columns. + * @param {import('./types.js').ExprList} values The columns to fold. 
+ * The columns may be specified using column name strings, column index + * numbers, value objects with output column names for keys and table + * expressions for values, or selection helper functions. + * @param {import('./types.js').FoldOptions} [options] Options for folding. + * @return {this} A new folded table. + * @example table.fold('colA') + * @example table.fold(['colA', 'colB']) + * @example table.fold(range(5, 8)) + */ + fold(values, options) { + return fold(this, values, options); + } + + /** + * Pivot columns into a cross-tabulation. + * The pivot transform is an inverse of the *fold* transform. + * The resulting table has new columns for each unique combination + * of the provided *keys*, populated with the provided *values*. + * The provided *values* must be aggregates, as a single set of keys may + * include more than one row. If string-valued, the *any* aggregate is used. + * If only one *values* column is defined, the new pivoted columns will + * be named using key values directly. Otherwise, input value column names + * will be included as a component of the output column names. + * @param {import('./types.js').ExprList} keys + * Key values to map to new column names. The keys may be specified using + * column name strings, column index numbers, value objects with output + * column names for keys and table expressions for values, or selection + * helper functions. + * @param {import('./types.js').ExprList} values Output values for pivoted + * columns. Column references will be wrapped in an *any* aggregate. If + * object-valued, the input object should have output value names for keys + * and aggregate table expressions for values. + * @param {import('./types.js').PivotOptions} [options] + * Options for pivoting. + * @return {this} A new pivoted table. 
+ * @example table.pivot('key', 'value') + * @example table.pivot(['keyA', 'keyB'], ['valueA', 'valueB']) + * @example table.pivot({ key: d => d.key }, { value: d => op.sum(d.value) }) + */ + pivot(keys, values, options) { + return pivot(this, keys, values, options); + } + + /** + * Spread array elements into a set of new columns. + * Output columns are named based on the value key and array index. + * @param {import('./types.js').ExprList} values + * The column values to spread. The values may be specified using column + * name strings, column index numbers, value objects with output column + * names for keys and table expressions for values, or selection helper + * functions. + * @param {import('./types.js').SpreadOptions } [options] + * Options for spreading. + * @return {this} A new table with the spread columns added. + * @example table.spread({ a: d => op.split(d.text, '') }) + * @example table.spread('arrayCol', { limit: 100 }) + */ + spread(values, options) { + return spread(this, values, options); + } + + /** + * Unroll one or more array-valued columns into new rows. + * If more than one array value is used, the number of new rows + * is the smaller of the limit and the largest length. + * Values for all other columns are copied over. + * @param {import('./types.js').ExprList} values + * The column values to unroll. The values may be specified using column + * name strings, column index numbers, value objects with output column + * names for keys and table expressions for values, or selection helper + * functions. + * @param {import('./types.js').UnrollOptions} [options] + * Options for unrolling. + * @return {this} A new unrolled table. + * @example table.unroll('colA', { limit: 1000 }) + */ + unroll(values, options) { + return unroll(this, values, options); + } + + // -- Joins --------------------------------------------------------------- + + /** + * Lookup values from a secondary table and add them as new columns. 
+ * A lookup occurs upon matching key values for rows in both tables. + * If the secondary table has multiple rows with the same key, only + * the last observed instance will be considered in the lookup. + * Lookup is similar to *join_left*, but with a simpler + * syntax and the added constraint of allowing at most one match only. + * @param {import('./types.js').TableRef} other + * The secondary table to look up values from. + * @param {import('./types.js').JoinKeys} [on] + * Lookup keys (column name strings or table expressions) for this table + * and the secondary table, respectively. + * @param {...import('./types.js').ExprList} values + * The column values to add from the secondary table. Can be column name + * strings or objects with column names as keys and table expressions as + * values. + * @return {this} A new table with lookup values added. + * @example table.lookup(other, ['key1', 'key2'], 'value1', 'value2') + */ + lookup(other, on, ...values) { + return lookup(this, other, on, ...values); + } + + /** + * Join two tables, extending the columns of one table with + * values from the other table. The current table is considered + * the "left" table in the join, and the new table input is + * considered the "right" table in the join. By default an inner + * join is performed, removing all rows that do not match the + * join criteria. To perform left, right, or full outer joins, use + * the *join_left*, *join_right*, or *join_full* methods, or provide + * an options argument. + * @param {import('./types.js').TableRef} other + * The other (right) table to join with. + * @param {import('./types.js').JoinPredicate} [on] + * The join criteria for matching table rows. If unspecified, the values of + * all columns with matching names are compared. + * If array-valued, a two-element array should be provided, containing + * the columns to compare for the left and right tables, respectively. 
+ * If a one-element array or a string value is provided, the same + * column names will be drawn from both tables. + * If function-valued, should be a two-table table expression that + * returns a boolean value. When providing a custom predicate, note that + * join key values can be arrays or objects, and that normal join + * semantics do not consider null or undefined values to be equal (that is, + * null !== null). Use the op.equal function to handle these cases. + * @param {import('./types.js').JoinValues} [values] + * The columns to include in the join output. + * If unspecified, all columns from both tables are included; paired + * join keys sharing the same column name are included only once. + * If array-valued, a two element array should be provided, containing + * the columns to include for the left and right tables, respectively. + * Array input may consist of column name strings, objects with output + * names as keys and single-table table expressions as values, or the + * selection helper functions *all*, *not*, or *range*. + * If object-valued, specifies the key-value pairs for each output, + * defined using two-table table expressions. + * @param {import('./types.js').JoinOptions} [options] + * Options for the join. + * @return {this} A new joined table. + * @example table.join(other, ['keyL', 'keyR']) + * @example table.join(other, (a, b) => op.equal(a.keyL, b.keyR)) + */ + join(other, on, values, options) { + return join(this, other, on, values, options); + } + + /** + * Perform a left outer join on two tables. Rows in the left table + * that do not match a row in the right table will be preserved. + * This is a convenience method with fixed options for *join*. + * @param {import('./types.js').TableRef} other + * The other (right) table to join with. + * @param {import('./types.js').JoinPredicate} [on] + * The join criteria for matching table rows. + * If unspecified, the values of all columns with matching names + * are compared. 
+ * If array-valued, a two-element array should be provided, containing + * the columns to compare for the left and right tables, respectively. + * If a one-element array or a string value is provided, the same + * column names will be drawn from both tables. + * If function-valued, should be a two-table table expression that + * returns a boolean value. When providing a custom predicate, note that + * join key values can be arrays or objects, and that normal join + * semantics do not consider null or undefined values to be equal (that is, + * null !== null). Use the op.equal function to handle these cases. + * @param {import('./types.js').JoinValues} [values] + * The columns to include in the join output. + * If unspecified, all columns from both tables are included; paired + * join keys sharing the same column name are included only once. + * If array-valued, a two element array should be provided, containing + * the columns to include for the left and right tables, respectively. + * Array input may consist of column name strings, objects with output + * names as keys and single-table table expressions as values, or the + * selection helper functions *all*, *not*, or *range*. + * If object-valued, specifies the key-value pairs for each output, + * defined using two-table table expressions. + * @param {import('./types.js').JoinOptions} [options] + * Options for the join. With this method, any options will be + * overridden with `{left: true, right: false}`. + * @return {this} A new joined table. + * @example table.join_left(other, ['keyL', 'keyR']) + * @example table.join_left(other, (a, b) => op.equal(a.keyL, b.keyR)) + */ + join_left(other, on, values, options) { + const opt = { ...options, left: true, right: false }; + return join(this, other, on, values, opt); + } + + /** + * Perform a right outer join on two tables. Rows in the right table + * that do not match a row in the left table will be preserved. 
+ * This is a convenience method with fixed options for *join*. + * @param {import('./types.js').TableRef} other + * The other (right) table to join with. + * @param {import('./types.js').JoinPredicate} [on] + * The join criteria for matching table rows. + * If unspecified, the values of all columns with matching names + * are compared. + * If array-valued, a two-element array should be provided, containing + * the columns to compare for the left and right tables, respectively. + * If a one-element array or a string value is provided, the same + * column names will be drawn from both tables. + * If function-valued, should be a two-table table expression that + * returns a boolean value. When providing a custom predicate, note that + * join key values can be arrays or objects, and that normal join + * semantics do not consider null or undefined values to be equal (that is, + * null !== null). Use the op.equal function to handle these cases. + * @param {import('./types.js').JoinValues} [values] + * The columns to include in the join output. + * If unspecified, all columns from both tables are included; paired + * join keys sharing the same column name are included only once. + * If array-valued, a two element array should be provided, containing + * the columns to include for the left and right tables, respectively. + * Array input may consist of column name strings, objects with output + * names as keys and single-table table expressions as values, or the + * selection helper functions *all*, *not*, or *range*. + * If object-valued, specifies the key-value pairs for each output, + * defined using two-table table expressions. + * @param {import('./types.js').JoinOptions} [options] + * Options for the join. With this method, any options will be overridden + * with `{left: false, right: true}`. + * @return {this} A new joined table. 
+ * @example table.join_right(other, ['keyL', 'keyR']) + * @example table.join_right(other, (a, b) => op.equal(a.keyL, b.keyR)) + */ + join_right(other, on, values, options) { + const opt = { ...options, left: false, right: true }; + return join(this, other, on, values, opt); + } + + /** + * Perform a full outer join on two tables. Rows in either the left or + * right table that do not match a row in the other will be preserved. + * This is a convenience method with fixed options for *join*. + * @param {import('./types.js').TableRef} other + * The other (right) table to join with. + * @param {import('./types.js').JoinPredicate} [on] + * The join criteria for matching table rows. + * If unspecified, the values of all columns with matching names + * are compared. + * If array-valued, a two-element array should be provided, containing + * the columns to compare for the left and right tables, respectively. + * If a one-element array or a string value is provided, the same + * column names will be drawn from both tables. + * If function-valued, should be a two-table table expression that + * returns a boolean value. When providing a custom predicate, note that + * join key values can be arrays or objects, and that normal join + * semantics do not consider null or undefined values to be equal (that is, + * null !== null). Use the op.equal function to handle these cases. + * @param {import('./types.js').JoinValues} [values] + * The columns to include in the join output. + * If unspecified, all columns from both tables are included; paired + * join keys sharing the same column name are included only once. + * If array-valued, a two element array should be provided, containing + * the columns to include for the left and right tables, respectively. + * Array input may consist of column name strings, objects with output + * names as keys and single-table table expressions as values, or the + * selection helper functions *all*, *not*, or *range*. 
+ * If object-valued, specifies the key-value pairs for each output, + * defined using two-table table expressions. + * @param {import('./types.js').JoinOptions} [options] + * Options for the join. With this method, any options will be overridden + * with `{left: true, right: true}`. + * @return {this} A new joined table. + * @example table.join_full(other, ['keyL', 'keyR']) + * @example table.join_full(other, (a, b) => op.equal(a.keyL, b.keyR)) + */ + join_full(other, on, values, options) { + const opt = { ...options, left: true, right: true }; + return join(this, other, on, values, opt); + } + + /** + * Produce the Cartesian cross product of two tables. The output table + * has one row for every pair of input table rows. Beware that outputs + * may be quite large, as the number of output rows is the product of + * the input row counts. + * This is a convenience method for *join* in which the + * join criteria is always true. + * @param {import('./types.js').TableRef} other + * The other (right) table to join with. + * @param {import('./types.js').JoinValues} [values] + * The columns to include in the output. + * If unspecified, all columns from both tables are included. + * If array-valued, a two element array should be provided, containing + * the columns to include for the left and right tables, respectively. + * Array input may consist of column name strings, objects with output + * names as keys and single-table table expressions as values, or the + * selection helper functions *all*, *not*, or *range*. + * If object-valued, specifies the key-value pairs for each output, + * defined using two-table table expressions. + * @param {import('./types.js').JoinOptions} [options] + * Options for the join. + * @return {this} A new joined table. 
+ * @example table.cross(other) + * @example table.cross(other, [['leftKey', 'leftVal'], ['rightVal']]) + */ + cross(other, values, options) { + return cross(this, other, values, options); + } + + /** + * Perform a semi-join, filtering the left table to only rows that + * match a row in the right table. + * @param {import('./types.js').TableRef} other + * The other (right) table to join with. + * @param {import('./types.js').JoinPredicate} [on] + * The join criteria for matching table rows. + * If unspecified, the values of all columns with matching names + * are compared. + * If array-valued, a two-element array should be provided, containing + * the columns to compare for the left and right tables, respectively. + * If a one-element array or a string value is provided, the same + * column names will be drawn from both tables. + * If function-valued, should be a two-table table expression that + * returns a boolean value. When providing a custom predicate, note that + * join key values can be arrays or objects, and that normal join + * semantics do not consider null or undefined values to be equal (that is, + * null !== null). Use the op.equal function to handle these cases. + * @return {this} A new filtered table. + * @example table.semijoin(other) + * @example table.semijoin(other, ['keyL', 'keyR']) + * @example table.semijoin(other, (a, b) => op.equal(a.keyL, b.keyR)) + */ + semijoin(other, on) { + return semijoin(this, other, on); + } + + /** + * Perform an anti-join, filtering the left table to only rows that + * do *not* match a row in the right table. + * @param {import('./types.js').TableRef} other + * The other (right) table to join with. + * @param {import('./types.js').JoinPredicate} [on] + * The join criteria for matching table rows. + * If unspecified, the values of all columns with matching names + * are compared. 
+ * If array-valued, a two-element array should be provided, containing + * the columns to compare for the left and right tables, respectively. + * If a one-element array or a string value is provided, the same + * column names will be drawn from both tables. + * If function-valued, should be a two-table table expression that + * returns a boolean value. When providing a custom predicate, note that + * join key values can be arrays or objects, and that normal join + * semantics do not consider null or undefined values to be equal (that is, + * null !== null). Use the op.equal function to handle these cases. + * @return {this} A new filtered table. + * @example table.antijoin(other) + * @example table.antijoin(other, ['keyL', 'keyR']) + * @example table.antijoin(other, (a, b) => op.equal(a.keyL, b.keyR)) + */ + antijoin(other, on) { + return antijoin(this, other, on); + } + + // -- Set Operations ------------------------------------------------------ + + /** + * Concatenate multiple tables into a single table, preserving all rows. + * This transformation mirrors the UNION_ALL operation in SQL. + * Only named columns in this table are included in the output. + * @param {...import('./types.js').TableRefList} tables + * A list of tables to concatenate. + * @return {this} A new concatenated table. + * @example table.concat(other) + * @example table.concat(other1, other2) + * @example table.concat([other1, other2]) + */ + concat(...tables) { + return concat(this, ...tables); + } + + /** + * Union multiple tables into a single table, deduplicating all rows. + * This transformation mirrors the UNION operation in SQL. It is + * similar to *concat* but suppresses duplicate rows with + * values identical to another row. + * Only named columns in this table are included in the output. + * @param {...import('./types.js').TableRefList} tables + * A list of tables to union. + * @return {this} A new unioned table. 
+ * @example table.union(other) + * @example table.union(other1, other2) + * @example table.union([other1, other2]) + */ + union(...tables) { + return union(this, ...tables); + } + + /** + * Intersect multiple tables, keeping only rows with identical + * values for all columns in all tables, and deduplicating the rows. + * This transformation is similar to a series of *semijoin* + * calls, but additionally suppresses duplicate rows. + * @param {...import('./types.js').TableRefList} tables + * A list of tables to intersect. + * @return {this} A new filtered table. + * @example table.intersect(other) + * @example table.intersect(other1, other2) + * @example table.intersect([other1, other2]) + */ + intersect(...tables) { + return intersect(this, ...tables); + } + + /** + * Compute the set difference with multiple tables, keeping only rows in + * this table whose values do not occur in the other tables. + * This transformation is similar to a series of *antijoin* + * calls, but additionally suppresses duplicate rows. + * @param {...import('./types.js').TableRefList} tables + * A list of tables to difference. + * @return {this} A new filtered table. + * @example table.except(other) + * @example table.except(other1, other2) + * @example table.except([other1, other2]) + */ + except(...tables) { + return except(this, ...tables); + } + + // -- Table Output Formats ------------------------------------------------ + + /** + * Format this table as an Apache Arrow table. + * @param {import('../arrow/types.js').ArrowFormatOptions} [options] + * The Arrow formatting options. + * @return {import('apache-arrow').Table} An Apache Arrow table. + */ + toArrow(options) { + return toArrow(this, options); + } + + /** + * Format this table as binary data in the Apache Arrow IPC format. + * @param {import('../arrow/types.js').ArrowIPCFormatOptions} [options] + * The Arrow IPC formatting options. + * @return {Uint8Array} A new Uint8Array of Arrow-encoded binary data. 
+ */ + toArrowIPC(options) { + return toArrowIPC(this, options); + } + + /** + * Format this table as a comma-separated values (CSV) string. Other + * delimiters, such as tabs or pipes ('|'), can be specified using + * the options argument. + * @param {import('../format/to-csv.js').CSVFormatOptions} [options] + * The CSV formatting options. + * @return {string} A delimited value string. + */ + toCSV(options) { + return toCSV(this, options); + } + + /** + * Format this table as an HTML table string. + * @param {import('../format/to-html.js').HTMLFormatOptions} [options] + * The HTML formatting options. + * @return {string} An HTML table string. + */ + toHTML(options) { + return toHTML(this, options); + } + + /** + * Format this table as a JavaScript Object Notation (JSON) string. + * @param {import('../format/to-json.js').JSONFormatOptions} [options] + * The JSON formatting options. + * @return {string} A JSON string. + */ + toJSON(options) { + return toJSON(this, options); + } + + /** + * Format this table as a GitHub-Flavored Markdown table string. + * @param {import('../format/to-markdown.js').MarkdownFormatOptions} [options] + * The Markdown formatting options. + * @return {string} A GitHub-Flavored Markdown table string. + */ + toMarkdown(options) { + return toMarkdown(this, options); + } +} diff --git a/src/table/Table.js b/src/table/Table.js new file mode 100644 index 00000000..734f8e15 --- /dev/null +++ b/src/table/Table.js @@ -0,0 +1,656 @@ +import { nest, regroup, reindex } from './regroup.js'; +import { rowObjectBuilder } from '../expression/row-object.js'; +import resolve, { all } from '../helpers/selection.js'; +import arrayType from '../util/array-type.js'; +import error from '../util/error.js'; +import isArrayType from '../util/is-array-type.js'; +import isNumber from '../util/is-number.js'; +import repeat from '../util/repeat.js'; + +/** + * Base class representing a column-oriented data table. 
+ */ +export class Table { + /** + * Instantiate a Table instance. + * @param {import('./types.js').ColumnData} columns + * An object mapping column names to values. + * @param {string[]} [names] + * An ordered list of column names. + * @param {import('./BitSet.js').BitSet} [filter] + * A filtering BitSet. + * @param {import('./types.js').GroupBySpec} [group] + * A groupby specification. + * @param {import('./types.js').RowComparator} [order] + * A row comparator function. + * @param {import('./types.js').Params} [params] + * An object mapping parameter names to values. + */ + constructor(columns, names, filter, group, order, params) { + const data = Object.freeze({ ...columns }); + names = names?.slice() ?? Object.keys(data); + const nrows = names.length ? data[names[0]].length : 0; + /** + * @private + * @type {readonly string[]} + */ + this._names = Object.freeze(names); + /** + * @private + * @type {import('./types.js').ColumnData} + */ + this._data = data; + /** + * @private + * @type {number} + */ + this._total = nrows; + /** + * @private + * @type {number} + */ + this._nrows = filter?.count() ?? nrows; + /** + * @private + * @type {import('./BitSet.js').BitSet} + */ + this._mask = filter ?? null; + /** + * @private + * @type {import('./types.js').GroupBySpec} + */ + this._group = group ?? null; + /** + * @private + * @type {import('./types.js').RowComparator} + */ + this._order = order ?? null; + /** + * @private + * @type {import('./types.js').Params} + */ + this._params = params; + /** + * @private + * @type {Uint32Array} + */ + this._index = null; + /** + * @private + * @type {number[][] | Uint32Array[]} + */ + this._partitions = null; + } + + /** + * Create a new table with the same type as this table. + * The new table may have different data, filter, grouping, or ordering + * based on the values of the optional configuration argument. If a + * setting is not specified, it is inherited from the current table. 
+ * @param {import('./types.js').CreateOptions} [options] + * Creation options for the new table. + * @return {this} A newly created table. + */ + create({ + data = undefined, + names = undefined, + filter = undefined, + groups = undefined, + order = undefined + } = {}) { + const f = filter !== undefined ? filter : this.mask(); + // @ts-ignore + return new this.constructor( + data || this._data, + names || (!data ? this._names : null), + f, + groups !== undefined ? groups : regroup(this._group, filter && f), + order !== undefined ? order : this._order, + this._params + ); + } + + /** + * Get or set table expression parameter values. + * If called with no arguments, returns the current parameter values + * as an object. Otherwise, adds the provided parameters to this + * table's parameter set and returns the table. Any prior parameters + * with names matching the input parameters are overridden. + * @param {import('./types.js').Params} [values] + * The parameter values. + * @return {this|import('./types.js').Params} + * The current parameter values (if called with no arguments) or this table. + */ + params(values) { + if (arguments.length) { + if (values) { + this._params = { ...this._params, ...values }; + } + return this; + } else { + return this._params; + } + } + + /** + * Provide an informative object string tag. + */ + get [Symbol.toStringTag]() { + if (!this._names) return 'Object'; // bail if called on prototype + const nr = this.numRows(); + const nc = this.numCols(); + const plural = v => v !== 1 ? 's' : ''; + return `Table: ${nc} col${plural(nc)} x ${nr} row${plural(nr)}` + + (this.isFiltered() ? ` (${this.totalRows()} backing)` : '') + + (this.isGrouped() ? `, ${this._group.size} groups` : '') + + (this.isOrdered() ? ', ordered' : ''); + } + + /** + * Indicates if the table has a filter applied. + * @return {boolean} True if filtered, false otherwise. 
+ */ + isFiltered() { + return !!this._mask; + } + + /** + * Indicates if the table has a groupby specification. + * @return {boolean} True if grouped, false otherwise. + */ + isGrouped() { + return !!this._group; + } + + /** + * Indicates if the table has a row order comparator. + * @return {boolean} True if ordered, false otherwise. + */ + isOrdered() { + return !!this._order; + } + + /** + * Get the backing column data for this table. + * @return {import('./types.js').ColumnData} + * Object of named column instances. + */ + data() { + return this._data; + } + + /** + * Returns the filter bitset mask, if defined. + * @return {import('./BitSet.js').BitSet} The filter bitset mask. + */ + mask() { + return this._mask; + } + + /** + * Returns the groupby specification, if defined. + * @return {import('./types.js').GroupBySpec} The groupby specification. + */ + groups() { + return this._group; + } + + /** + * Returns the row order comparator function, if specified. + * @return {import('./types.js').RowComparator} + * The row order comparator function. + */ + comparator() { + return this._order; + } + + /** + * The total number of rows in this table, counting both + * filtered and unfiltered rows. + * @return {number} The number of total rows. + */ + totalRows() { + return this._total; + } + + /** + * The number of active rows in this table. This number may be + * less than the *totalRows* if the table has been filtered. + * @return {number} The number of rows. + */ + numRows() { + return this._nrows; + } + + /** + * The number of active rows in this table. This number may be + * less than the *totalRows* if the table has been filtered. + * @return {number} The number of rows. + */ + get size() { + return this._nrows; + } + + /** + * The number of columns in this table. + * @return {number} The number of columns. + */ + numCols() { + return this._names.length; + } + + /** + * Filter function invoked for each column name. 
+ * @callback NameFilter + * @param {string} name The column name. + * @param {number} index The column index. + * @param {string[]} array The array of names. + * @return {boolean} Returns true to retain the column name. + */ + + /** + * The table column names, optionally filtered. + * @param {NameFilter} [filter] An optional filter function. + * If unspecified, all column names are returned. + * @return {string[]} An array of matching column names. + */ + columnNames(filter) { + return filter ? this._names.filter(filter) : this._names.slice(); + } + + /** + * The column name at the given index. + * @param {number} index The column index. + * @return {string} The column name, + * or undefined if the index is out of range. + */ + columnName(index) { + return this._names[index]; + } + + /** + * The column index for the given name. + * @param {string} name The column name. + * @return {number} The column index, or -1 if the name is not found. + */ + columnIndex(name) { + return this._names.indexOf(name); + } + + /** + * Get the column instance with the given name. + * @param {string} name The column name. + * @return {import('./types.js').ColumnType | undefined} + * The named column, or undefined if it does not exist. + */ + column(name) { + return this._data[name]; + } + + /** + * Get the column instance at the given index position. + * @param {number} index The zero-based column index. + * @return {import('./types.js').ColumnType | undefined} + * The column, or undefined if it does not exist. + */ + columnAt(index) { + return this._data[this._names[index]]; + } + + /** + * Get an array of values contained in a column. The resulting array + * respects any table filter or orderby criteria. + * @param {string} name The column name. + * @param {ArrayConstructor | import('./types.js').TypedArrayConstructor} [constructor=Array] + * The array constructor for instantiating the output array. 
+ * @return {import('./types.js').DataValue[] | import('./types.js').TypedArray} + * The array of column values. + */ + array(name, constructor = Array) { + const column = this.column(name); + const array = new constructor(this.numRows()); + let idx = -1; + this.scan(row => array[++idx] = column.at(row), true); + return array; + } + + /** + * Get the value for the given column and row. + * @param {string} name The column name. + * @param {number} [row=0] The row index, defaults to zero if not specified. + * @return {import('./types.js').DataValue} The table value at (column, row). + */ + get(name, row = 0) { + const column = this.column(name); + return this.isFiltered() || this.isOrdered() + ? column.at(this.indices()[row]) + : column.at(row); + } + + /** + * Returns an accessor ("getter") function for a column. The returned + * function takes a row index as its single argument and returns the + * corresponding column value. + * @param {string} name The column name. + * @return {import('./types.js').ColumnGetter} The column getter function. + */ + getter(name) { + const column = this.column(name); + const indices = this.isFiltered() || this.isOrdered() ? this.indices() : null; + if (indices) { + return row => column.at(indices[row]); + } else if (column) { + return row => column.at(row); + } else { + error(`Unrecognized column: ${name}`); + } + } + + /** + * Returns an object representing a table row. + * @param {number} [row=0] The row index, defaults to zero if not specified. + * @return {object} A row object with named properties for each column. + */ + object(row = 0) { + return objectBuilder(this)(row); + } + + /** + * Returns an array of objects representing table rows. + * @param {import('./types.js').ObjectsOptions} [options] + * The options for row object generation. + * @return {object[]} An array of row objects. 
+ */ + objects(options = {}) { + const { grouped, limit, offset } = options; + + // generate array of row objects + const names = resolve(this, options.columns || all()); + const createRow = rowObjectBuilder(this, names); + const obj = []; + this.scan( + (row, data) => obj.push(createRow(row, data)), + true, limit, offset + ); + + // produce nested output as requested + if (grouped && this.isGrouped()) { + const idx = []; + this.scan(row => idx.push(row), true, limit, offset); + return nest(this, idx, obj, grouped); + } + + return obj; + } + + /** + * Returns an iterator over objects representing table rows. + * @return {Iterator} An iterator over row objects. + */ + *[Symbol.iterator]() { + const createRow = objectBuilder(this); + const n = this.numRows(); + for (let i = 0; i < n; ++i) { + yield createRow(i); + } + } + + /** + * Returns an iterator over column values. + * @param {string} name The column name. + * @return {Iterator} An iterator over column values. + */ + *values(name) { + const get = this.getter(name); + const n = this.numRows(); + for (let i = 0; i < n; ++i) { + yield get(i); + } + } + + /** + * Print the contents of this table using the console.table() method. + * @param {import('./types.js').PrintOptions|number} options + * The options for row object generation, determining which rows and + * columns are printed. If number-valued, specifies the row limit. + * @return {this} The table instance. + */ + print(options = {}) { + const opt = isNumber(options) + ? { limit: +options } + // @ts-ignore + : { ...options, limit: 10 }; + + const obj = this.objects({ ...opt, grouped: false }); + const msg = `${this[Symbol.toStringTag]}. Showing ${obj.length} rows.`; + + console.log(msg); // eslint-disable-line no-console + console.table(obj); // eslint-disable-line no-console + return this; + } + + /** + * Returns an array of indices for all rows passing the table filter. + * @param {boolean} [order=true] A flag indicating if the returned + * indices should be sorted if this table is ordered.
If false, the + * returned indices may or may not be sorted. + * @return {Uint32Array} An array of row indices. + */ + indices(order = true) { + if (this._index) return this._index; + + const n = this.numRows(); + const index = new Uint32Array(n); + const ordered = this.isOrdered(); + const bits = this.mask(); + let row = -1; + + // inline the following for performance: + // this.scan(row => index[++i] = row); + if (bits) { + for (let i = bits.next(0); i >= 0; i = bits.next(i + 1)) { + index[++row] = i; + } + } else { + for (let i = 0; i < n; ++i) { + index[++row] = i; + } + } + + // sort index vector + if (order && ordered) { + const { _order, _data } = this; + index.sort((a, b) => _order(a, b, _data)); + } + + // save indices if they reflect table metadata + if (order || !ordered) { + this._index = index; + } + + return index; + } + + /** + * Returns an array of indices for each group in the table. + * If the table is not grouped, the result is the same as + * the *indices* method, but wrapped within an array. + * @param {boolean} [order=true] A flag indicating if the returned + * indices should be sorted if this table is ordered. If false, the + * returned indices may or may not be sorted. + * @return {number[][] | Uint32Array[]} An array of row index arrays, one + * per group. The indices will be filtered if the table is filtered. 
+ */ + partitions(order = true) { + // return partitions if already generated + if (this._partitions) { + return this._partitions; + } + + // if not grouped, return a single partition + if (!this.isGrouped()) { + return [ this.indices(order) ]; + } + + // generate partitions + const { keys, size } = this._group; + const part = repeat(size, () => []); + + // populate partitions, don't sort if indices don't exist + // inline the following for performance: + // this.scan(row => part[keys[row]].push(row), sort); + const sort = this._index; + const bits = this.mask(); + const n = this.numRows(); + if (sort && this.isOrdered()) { + for (let i = 0, r; i < n; ++i) { + r = sort[i]; + part[keys[r]].push(r); + } + } else if (bits) { + for (let i = bits.next(0); i >= 0; i = bits.next(i + 1)) { + part[keys[i]].push(i); + } + } else { + for (let i = 0; i < n; ++i) { + part[keys[i]].push(i); + } + } + + // if ordered but not yet sorted, sort partitions directly + if (order && !sort && this.isOrdered()) { + const compare = this._order; + const data = this._data; + for (let i = 0; i < size; ++i) { + part[i].sort((a, b) => compare(a, b, data)); + } + } + + // save partitions if they reflect table metadata + if (order || !this.isOrdered()) { + this._partitions = part; + } + + return part; + } + + /** + * Create a new fully-materialized instance of this table. + * All filter and orderby settings are removed from the new table. + * Instead, the backing data itself is filtered and ordered as needed. + * @param {number[]} [indices] Ordered row indices to materialize. + * If unspecified, all rows passing the table filter are used. + * @return {this} A reified table. + */ + reify(indices) { + const nrows = indices ? 
indices.length : this.numRows(); + const names = this._names; + let data, groups; + + if (!indices && !this.isOrdered()) { + if (!this.isFiltered()) { + return this; // data already reified + } else if (nrows === this.totalRows()) { + data = this.data(); // all rows pass filter, skip copy + } + } + + if (!data) { + const scan = indices ? f => indices.forEach(f) : f => this.scan(f, true); + const ncols = names.length; + data = {}; + + for (let i = 0; i < ncols; ++i) { + const name = names[i]; + const prev = this.column(name); + const curr = data[name] = new (arrayType(prev))(nrows); + let r = -1; + // optimize array access + isArrayType(prev) + ? scan(row => curr[++r] = prev[row]) + : scan(row => curr[++r] = prev.at(row)); + } + + if (this.isGrouped()) { + groups = reindex(this.groups(), scan, !!indices, nrows); + } + } + + return this.create({ data, names, groups, filter: null, order: null }); + } + + /** + * Callback function to cancel a table scan. + * @callback ScanStop + * @return {void} + */ + + /** + * Callback function invoked for each row of a table scan. + * @callback ScanVisitor + * @param {number} [row] The table row index. + * @param {import('./types.js').ColumnData} [data] + * The backing table data store. + * @param {ScanStop} [stop] Function to stop the scan early. + * Callees can invoke this function to prevent future calls. + * @return {void} + */ + + /** + * Perform a table scan, visiting each row of the table. + * If this table is filtered, only rows passing the filter are visited. + * @param {ScanVisitor} fn Callback invoked for each row of the table. + * @param {boolean} [order=false] Indicates if the table should be + * scanned in the order determined by *orderby*. This + * argument has no effect if the table is unordered. + * @param {number} [limit=Infinity] The maximum number of + * rows to visit. + * @param {number} [offset=0] The row offset indicating how many + * initial rows to skip.
+ */ + scan(fn, order, limit = Infinity, offset = 0) { + const filter = this._mask; + const nrows = this._nrows; + const data = this._data; + + let i = offset || 0; + if (i > nrows) return; + + const n = Math.min(nrows, i + limit); + const stop = () => i = this._total; + + if (order && this.isOrdered() || filter && this._index) { + const index = this.indices(); + const data = this._data; + for (; i < n; ++i) { + fn(index[i], data, stop); + } + } else if (filter) { + let c = n - i + 1; + for (i = filter.nth(i); --c && i > -1; i = filter.next(i + 1)) { + fn(i, data, stop); + } + } else { + for (; i < n; ++i) { + fn(i, data, stop); + } + } + } +} + +function objectBuilder(table) { + let b = table._builder; + + if (!b) { + const createRow = rowObjectBuilder(table); + const data = table.data(); + if (table.isOrdered() || table.isFiltered()) { + const indices = table.indices(); + b = row => createRow(indices[row], data); + } else { + b = row => createRow(row, data); + } + table._builder = b; + } + + return b; +} diff --git a/src/table/column-set.js b/src/table/column-set.js deleted file mode 100644 index 9c507726..00000000 --- a/src/table/column-set.js +++ /dev/null @@ -1,35 +0,0 @@ -import has from '../util/has.js'; - -export default function(table) { - return table - ? 
new ColumnSet({ ...table.data() }, table.columnNames()) - : new ColumnSet(); -} - -class ColumnSet { - constructor(data, names) { - this.data = data || {}; - this.names = names || []; - } - - add(name, values) { - if (!this.has(name)) this.names.push(name + ''); - return this.data[name] = values; - } - - has(name) { - return has(this.data, name); - } - - new() { - this.filter = null; - this.groups = this.groups || null; - this.order = null; - return this; - } - - groupby(groups) { - this.groups = groups; - return this; - } -} diff --git a/src/table/column-table.js b/src/table/column-table.js deleted file mode 100644 index 99b76ede..00000000 --- a/src/table/column-table.js +++ /dev/null @@ -1,456 +0,0 @@ -import { defaultColumnFactory } from './column.js'; -import columnsFrom from './columns-from.js'; -import columnSet from './column-set.js'; -import Table from './table.js'; -import { nest, regroup, reindex } from './regroup.js'; -import { rowObjectBuilder } from '../expression/row-object.js'; -import { default as toArrow, toArrowIPC } from '../format/to-arrow.js'; -import toCSV from '../format/to-csv.js'; -import toHTML from '../format/to-html.js'; -import toJSON from '../format/to-json.js'; -import toMarkdown from '../format/to-markdown.js'; -import resolve, { all } from '../helpers/selection.js'; -import arrayType from '../util/array-type.js'; -import entries from '../util/entries.js'; -import error from '../util/error.js'; -import mapObject from '../util/map-object.js'; - -/** - * Class representing a table backed by a named set of columns. - */ -export default class ColumnTable extends Table { - - /** - * Create a new ColumnTable from existing input data. - * @param {object[]|Iterable|object|Map} values The backing table data values. - * If array-valued, should be a list of JavaScript objects with - * key-value properties for each column value. - * If object- or Map-valued, a table with two columns (one for keys, - * one for values) will be created. 
- * @param {string[]} [names] The named columns to include. - * @return {ColumnTable} A new ColumnTable instance. - */ - static from(values, names) { - return new ColumnTable(columnsFrom(values, names), names); - } - - /** - * Create a new table for a set of named columns. - * @param {object|Map} columns - * The set of named column arrays. Keys are column names. - * The enumeration order of the keys determines the column indices, - * unless the names parameter is specified. - * Values must be arrays (or array-like values) of identical length. - * @param {string[]} [names] Ordered list of column names. If specified, - * this array determines the column indices. If not specified, the - * key enumeration order of the columns object is used. - * @return {ColumnTable} the instantiated ColumnTable instance. - */ - static new(columns, names) { - if (columns instanceof ColumnTable) return columns; - const data = {}; - const keys = []; - for (const [key, value] of entries(columns)) { - data[key] = value; - keys.push(key); - } - return new ColumnTable(data, names || keys); - } - - /** - * Instantiate a new ColumnTable instance. - * @param {object} columns An object mapping column names to values. - * @param {string[]} [names] An ordered list of column names. - * @param {BitSet} [filter] A filtering BitSet. - * @param {GroupBySpec} [group] A groupby specification. - * @param {RowComparator} [order] A row comparator function. - * @param {Params} [params] An object mapping parameter names to values. - */ - constructor(columns, names, filter, group, order, params) { - mapObject(columns, defaultColumnFactory, columns); - names = names || Object.keys(columns); - const nrows = names.length ? columns[names[0]].length : 0; - super(names, nrows, columns, filter, group, order, params); - } - - /** - * Create a new table with the same type as this table. 
- * The new table may have different data, filter, grouping, or ordering - * based on the values of the optional configuration argument. If a - * setting is not specified, it is inherited from the current table. - * @param {CreateOptions} [options] Creation options for the new table. - * @return {this} A newly created table. - */ - create({ data, names, filter, groups, order }) { - const f = filter !== undefined ? filter : this.mask(); - - return new ColumnTable( - data || this._data, - names || (!data ? this._names : null), - f, - groups !== undefined ? groups : regroup(this._group, filter && f), - order !== undefined ? order : this._order, - this._params - ); - } - - /** - * Create a new table with additional columns drawn from one or more input - * tables. All tables must have the same numer of rows and are reified - * prior to assignment. In the case of repeated column names, input table - * columns overwrite existing columns. - * @param {...ColumnTable} tables The tables to merge with this table. - * @return {ColumnTable} A new table with merged columns. - * @example table.assign(table1, table2) - */ - assign(...tables) { - const nrows = this.numRows(); - const base = this.reify(); - const cset = columnSet(base).groupby(base.groups()); - tables.forEach(input => { - input = ColumnTable.new(input); - if (input.numRows() !== nrows) error('Assign row counts do not match'); - input = input.reify(); - input.columnNames(name => cset.add(name, input.column(name))); - }); - return this.create(cset.new()); - } - - /** - * Get the backing set of columns for this table. - * @return {ColumnData} Object of named column instances. - */ - columns() { - return this._data; - } - - /** - * Get the column instance with the given name. - * @param {string} name The column name. - * @return {ColumnType | undefined} The named column, or undefined if it does not exist. - */ - column(name) { - return this._data[name]; - } - - /** - * Get the column instance at the given index position. 
- * @param {number} index The zero-based column index. - * @return {ColumnType | undefined} The column, or undefined if it does not exist. - */ - columnAt(index) { - return this._data[this._names[index]]; - } - - /** - * Get an array of values contained in a column. The resulting array - * respects any table filter or orderby criteria. - * @param {string} name The column name. - * @param {ArrayConstructor|TypedArrayConstructor} [constructor=Array] - * The array constructor for instantiating the output array. - * @return {DataValue[]|TypedArray} The array of column values. - */ - array(name, constructor = Array) { - const column = this.column(name); - const array = new constructor(this.numRows()); - let idx = -1; - this.scan(row => array[++idx] = column.get(row), true); - return array; - } - - /** - * Get the value for the given column and row. - * @param {string} name The column name. - * @param {number} [row=0] The row index, defaults to zero if not specified. - * @return {DataValue} The table value at (column, row). - */ - get(name, row = 0) { - const column = this.column(name); - return this.isFiltered() || this.isOrdered() - ? column.get(this.indices()[row]) - : column.get(row); - } - - /** - * Returns an accessor ("getter") function for a column. The returned - * function takes a row index as its single argument and returns the - * corresponding column value. - * @param {string} name The column name. - * @return {ColumnGetter} The column getter function. - */ - getter(name) { - const column = this.column(name); - const indices = this.isFiltered() || this.isOrdered() ? this.indices() : null; - return indices ? row => column.get(indices[row]) - : column ? row => column.get(row) - : error(`Unrecognized column: ${name}`); - } - - /** - * Returns an object representing a table row. - * @param {number} [row=0] The row index, defaults to zero if not specified. - * @return {object} A row object with named properties for each column. 
- */ - object(row = 0) { - return objectBuilder(this)(row); - } - - /** - * Returns an array of objects representing table rows. - * @param {ObjectsOptions} [options] The options for row object generation. - * @return {object[]} An array of row objects. - */ - objects(options = {}) { - const { grouped, limit, offset } = options; - - // generate array of row objects - const names = resolve(this, options.columns || all()); - const create = rowObjectBuilder(names); - const obj = []; - this.scan( - (row, data) => obj.push(create(row, data)), - true, limit, offset - ); - - // produce nested output as requested - if (grouped && this.isGrouped()) { - const idx = []; - this.scan(row => idx.push(row), true, limit, offset); - return nest(this, idx, obj, grouped); - } - - return obj; - } - - /** - * Returns an iterator over objects representing table rows. - * @return {Iterator} An iterator over row objects. - */ - *[Symbol.iterator]() { - const create = objectBuilder(this); - const n = this.numRows(); - for (let i = 0; i < n; ++i) { - yield create(i); - } - } - - /** - * Create a new fully-materialized instance of this table. - * All filter and orderby settings are removed from the new table. - * Instead, the backing data itself is filtered and ordered as needed. - * @param {number[]} [indices] Ordered row indices to materialize. - * If unspecified, all rows passing the table filter are used. - * @return {this} A reified table. - */ - reify(indices) { - const nrows = indices ? indices.length : this.numRows(); - const names = this._names; - let data, groups; - - if (!indices && !this.isOrdered()) { - if (!this.isFiltered()) { - return this; // data already reified - } else if (nrows === this.totalRows()) { - data = this.data(); // all rows pass filter, skip copy - } - } - - if (!data) { - const scan = indices ? 
f => indices.forEach(f) : f => this.scan(f, true); - const ncols = names.length; - data = {}; - - for (let i = 0; i < ncols; ++i) { - const name = names[i]; - const prev = this.column(name); - const curr = data[name] = new (arrayType(prev))(nrows); - let r = -1; - scan(row => curr[++r] = prev.get(row)); - } - - if (this.isGrouped()) { - groups = reindex(this.groups(), scan, !!indices, nrows); - } - } - - return this.create({ data, names, groups, filter: null, order: null }); - } - - /** - * Apply a sequence of transformations to this table. The output - * of each transform is passed as input to the next transform, and - * the output of the last transform is then returned. - * @param {...(Transform|Transform[])} transforms Transformation - * functions to apply to the table in sequence. Each function should - * take a single table as input and return a table as output. - * @return {ColumnTable} The output of the last transform. - */ - transform(...transforms) { - return transforms.flat().reduce((t, f) => f(t), this); - } - - /** - * Format this table as an Apache Arrow table. - * @param {ArrowFormatOptions} [options] The formatting options. - * @return {import('apache-arrow').Table} An Apache Arrow table. - */ - toArrow(options) { - return toArrow(this, options); - } - - /** - * Format this table as binary data in the Apache Arrow IPC format. - * @param {ArrowFormatOptions} [options] The formatting options. Set {format: 'stream'} - * or {format:"file"} for specific IPC format - * @return {Uint8Array} A new Uint8Array of Arrow-encoded binary data. - */ - toArrowBuffer(options) { - return toArrowIPC(this, options); - } - - /** - * Format this table as a comma-separated values (CSV) string. Other - * delimiters, such as tabs or pipes ('|'), can be specified using - * the options argument. - * @param {CSVFormatOptions} [options] The formatting options. - * @return {string} A delimited value string. 
- */ - toCSV(options) { - return toCSV(this, options); - } - - /** - * Format this table as an HTML table string. - * @param {HTMLFormatOptions} [options] The formatting options. - * @return {string} An HTML table string. - */ - toHTML(options) { - return toHTML(this, options); - } - - /** - * Format this table as a JavaScript Object Notation (JSON) string. - * @param {JSONFormatOptions} [options] The formatting options. - * @return {string} A JSON string. - */ - toJSON(options) { - return toJSON(this, options); - } - - /** - * Format this table as a GitHub-Flavored Markdown table string. - * @param {MarkdownFormatOptions} [options] The formatting options. - * @return {string} A GitHub-Flavored Markdown table string. - */ - toMarkdown(options) { - return toMarkdown(this, options); - } -} - -function objectBuilder(table) { - let b = table._builder; - - if (!b) { - const create = rowObjectBuilder(table.columnNames()); - const data = table.data(); - if (table.isOrdered() || table.isFiltered()) { - const indices = table.indices(); - b = row => create(indices[row], data); - } else { - b = row => create(row, data); - } - table._builder = b; - } - - return b; -} - -/** - * Options for derived table creation. - * @typedef {import('./table').CreateOptions} CreateOptions - */ - -/** - * A typed array constructor. - * @typedef {import('./table').TypedArrayConstructor} TypedArrayConstructor - */ - -/** - * A typed array instance. - * @typedef {import('./table').TypedArray} TypedArray - */ - -/** - * Table value. - * @typedef {import('./table').DataValue} DataValue - */ - -/** - * Column value accessor. - * @typedef {import('./table').ColumnGetter} ColumnGetter - */ - -/** - * Options for generating row objects. - * @typedef {import('./table').ObjectsOptions} ObjectsOptions - */ - -/** - * A table transformation. - * @typedef {(table: ColumnTable) => ColumnTable} Transform - */ - -/** - * Proxy type for BitSet class. 
- * @typedef {import('./table').BitSet} BitSet - */ - -/** - * Proxy type for ColumnType interface. - * @typedef {import('./column').ColumnType} ColumnType - */ - -/** - * A named collection of columns. - * @typedef {{[key: string]: ColumnType}} ColumnData - */ - -/** - * Proxy type for GroupBySpec. - * @typedef {import('./table').GroupBySpec} GroupBySpec - */ - -/** - * Proxy type for RowComparator. - * @typedef {import('./table').RowComparator} RowComparator - */ - -/** - * Proxy type for Params. - * @typedef {import('./table').Params} Params - */ - -/** - * Options for Arrow formatting. - * @typedef {import('../arrow/encode').ArrowFormatOptions} ArrowFormatOptions - */ - -/** - * Options for CSV formatting. - * @typedef {import('../format/to-csv').CSVFormatOptions} CSVFormatOptions - */ - -/** - * Options for HTML formatting. - * @typedef {import('../format/to-html').HTMLFormatOptions} HTMLFormatOptions - */ - -/** - * Options for JSON formatting. - * @typedef {import('../format/to-json').JSONFormatOptions} JSONFormatOptions - */ - -/** - * Options for Markdown formatting. - * @typedef {import('../format/to-markdown').MarkdownFormatOptions} MarkdownFormatOptions - */ diff --git a/src/table/column.js b/src/table/column.js deleted file mode 100644 index fac03d3b..00000000 --- a/src/table/column.js +++ /dev/null @@ -1,79 +0,0 @@ -import isFunction from '../util/is-function.js'; - -/** - * Class representing an array-backed data column. - */ -export default class Column { - /** - * Create a new column instance. - * @param {Array} data The backing array (or array-like object) - * containing the column data. - */ - constructor(data) { - this.data = data; - } - - /** - * Get the length (number of rows) of the column. - * @return {number} The length of the column array. - */ - get length() { - return this.data.length; - } - - /** - * Get the column value at the given row index. - * @param {number} row The row index of the value to retrieve. 
- * @return {import('./table').DataValue} The column value. - */ - get(row) { - return this.data[row]; - } - - /** - * Returns an iterator over the column values. - * @return {Iterator} An iterator over column values. - */ - [Symbol.iterator]() { - return this.data[Symbol.iterator](); - } -} - -/** - * Column interface. Any object that adheres to this interface - * can be used as a data column within a {@link ColumnTable}. - * @typedef {object} ColumnType - * @property {number} length - * The length (number of rows) of the column. - * @property {import('./table').ColumnGetter} get - * Column value getter. - */ - -/** - * Column factory function interface. - * @callback ColumnFactory - * @param {*} data The input column data. - * @return {ColumnType} A column instance. - */ - -/** - * Create a new column from the given input data. - * @param {any} data The backing column data. If the value conforms to - * the Column interface it is returned directly. If the value is an - * array, it will be wrapped in a new Column instance. - * @return {ColumnType} A compatible column instance. - */ -export let defaultColumnFactory = function(data) { - return data && isFunction(data.get) ? data : new Column(data); -}; - -/** - * Get or set the default factory function for instantiating table columns. - * @param {ColumnFactory} [factory] The new default factory. - * @return {ColumnFactory} The current default column factory. - */ -export function columnFactory(factory) { - return arguments.length - ? 
(defaultColumnFactory = factory) - : defaultColumnFactory; -} diff --git a/src/table/columns-from.js b/src/table/columns-from.js index 592f5d6b..8fe317dd 100644 --- a/src/table/columns-from.js +++ b/src/table/columns-from.js @@ -6,8 +6,12 @@ import isObject from '../util/is-object.js'; import isRegExp from '../util/is-regexp.js'; import isString from '../util/is-string.js'; -export default function(values, names) { +/** + * @return {import('./types.js').ColumnData} + */ +export function columnsFrom(values, names) { const raise = type => error(`Illegal argument type: ${type || typeof values}`); + // @ts-ignore return values instanceof Map ? fromKeyValuePairs(values.entries(), names) : isDate(values) ? raise('Date') : isRegExp(values) ? raise('RegExp') diff --git a/src/table/index.js b/src/table/index.js index 6468ce46..ff558c29 100644 --- a/src/table/index.js +++ b/src/table/index.js @@ -1,8 +1,6 @@ -import ColumnTable from './column-table.js'; -import verbs from '../verbs/index.js'; - -// Add verb implementations to ColumnTable prototype -Object.assign(ColumnTable.prototype, verbs); +import entries from '../util/entries.js'; +import { ColumnTable } from './ColumnTable.js'; +import { columnsFrom } from './columns-from.js'; /** * Create a new table for a set of named columns. 
@@ -18,7 +16,15 @@ Object.assign(ColumnTable.prototype, verbs); * @example table({ colA: ['a', 'b', 'c'], colB: [3, 4, 5] }) */ export function table(columns, names) { - return ColumnTable.new(columns, names); + if (columns instanceof ColumnTable) return columns; + /** @type {import('./types.js').ColumnData} */ + const data = {}; + const keys = []; + for (const [key, value] of entries(columns)) { + data[key] = value; + keys.push(key); + } + return new ColumnTable(data, names || keys); } /** @@ -36,5 +42,5 @@ export function table(columns, names) { * @example from([ { colA: 1, colB: 2 }, { colA: 3, colB: 4 } ]) */ export function from(values, names) { - return ColumnTable.from(values, names); + return new ColumnTable(columnsFrom(values, names), names); } diff --git a/src/table/regroup.js b/src/table/regroup.js index efeaa1b0..f540152e 100644 --- a/src/table/regroup.js +++ b/src/table/regroup.js @@ -1,18 +1,21 @@ import { array_agg, entries_agg, map_agg, object_agg } from '../op/op-api.js'; import error from '../util/error.js'; import uniqueName from '../util/unique-name.js'; +import { groupby } from '../verbs/groupby.js'; +import { rollup } from '../verbs/rollup.js'; +import { select } from '../verbs/select.js'; /** * Regroup table rows in response to a BitSet filter. - * @param {GroupBySpec} groups The current groupby specification. - * @param {BitSet} filter The filter to apply. + * @param {import('./types.js').GroupBySpec} groups The current groupby specification. + * @param {import('./BitSet.js').BitSet} filter The filter to apply. */ export function regroup(groups, filter) { if (!groups || !filter) return groups; // check for presence of rows for each group const { keys, rows, size } = groups; - const map = new Int32Array(size); + const map = new Uint32Array(size); filter.scan(row => map[keys[row]] = 1); // check sum, exit early if all groups occur @@ -36,7 +39,8 @@ export function regroup(groups, filter) { /** * Regroup table rows in response to a re-indexing. 
* This operation may or may not involve filtering of rows. - * @param {GroupBySpec} groups The current groupby specification. + * @param {import('./types.js').GroupBySpec} groups + * The current groupby specification. * @param {Function} scan Function to scan new row indices. * @param {boolean} filter Flag indicating if filtering may occur. * @param {number} nrows The number of rows in the new table. @@ -86,17 +90,16 @@ export function nest(table, idx, obj, type) { // create table with one column of row objects // then aggregate into per-group arrays - let t = table - .select() - .reify(idx) - .create({ data: { [col]: obj } }) - .rollup({ [col]: array_agg(col) }); + let t = select(table, {}).reify(idx).create({ data: { [col]: obj } }); + t = rollup(t, { [col]: array_agg(col) }); // create nested structures for each level of grouping for (let i = names.length; --i >= 0;) { - t = t - .groupby(names.slice(0, i)) - .rollup({ [col]: agg(names[i], col) }); + t = rollup( + groupby(t, names.slice(0, i)), + // @ts-ignore + { [col]: agg(names[i], col) } + ); } // return the final aggregated structure diff --git a/src/table/table.js b/src/table/table.js deleted file mode 100644 index d2097bc7..00000000 --- a/src/table/table.js +++ /dev/null @@ -1,607 +0,0 @@ -import Transformable from './transformable.js'; -import error from '../util/error.js'; -import isNumber from '../util/is-number.js'; -import repeat from '../util/repeat.js'; - -/** - * Abstract class representing a data table. - */ -export default class Table extends Transformable { - - /** - * Instantiate a new Table instance. - * @param {string[]} names An ordered list of column names. - * @param {number} nrows The number of rows. - * @param {TableData} data The backing data, which can vary by implementation. - * @param {BitSet} [filter] A bit mask for which rows to include. - * @param {GroupBySpec} [groups] A groupby specification for grouping ows. 
- * @param {RowComparator} [order] A comparator function for sorting rows. - * @param {Params} [params] Parameter values for table expressions. - */ - constructor(names, nrows, data, filter, groups, order, params) { - super(params); - this._names = Object.freeze(names); - this._data = data; - this._total = nrows; - this._nrows = filter ? filter.count() : nrows; - this._mask = (nrows !== this._nrows && filter) || null; - this._group = groups || null; - this._order = order || null; - } - - /** - * Create a new table with the same type as this table. - * The new table may have different data, filter, grouping, or ordering - * based on the values of the optional configuration argument. If a - * setting is not specified, it is inherited from the current table. - * @param {CreateOptions} [options] Creation options for the new table. - * @return {this} A newly created table. - */ - create(options) { // eslint-disable-line no-unused-vars - error('Not implemented'); - } - - /** - * Provide an informative object string tag. - */ - get [Symbol.toStringTag]() { - if (!this._names) return 'Object'; // bail if called on prototype - const nr = this.numRows() + ' row' + (this.numRows() !== 1 ? 's' : ''); - const nc = this.numCols() + ' col' + (this.numCols() !== 1 ? 's' : ''); - return `Table: ${nc} x ${nr}` - + (this.isFiltered() ? ` (${this.totalRows()} backing)` : '') - + (this.isGrouped() ? `, ${this._group.size} groups` : '') - + (this.isOrdered() ? ', ordered' : ''); - } - - /** - * Indicates if the table has a filter applied. - * @return {boolean} True if filtered, false otherwise. - */ - isFiltered() { - return !!this._mask; - } - - /** - * Indicates if the table has a groupby specification. - * @return {boolean} True if grouped, false otherwise. - */ - isGrouped() { - return !!this._group; - } - - /** - * Indicates if the table has a row order comparator. - * @return {boolean} True if ordered, false otherwise. 
- */ - isOrdered() { - return !!this._order; - } - - /** - * Returns the internal table storage data structure. - * @return {TableData} The backing table storage data structure. - */ - data() { - return this._data; - } - - /** - * Returns the filter bitset mask, if defined. - * @return {BitSet} The filter bitset mask. - */ - mask() { - return this._mask; - } - - /** - * Returns the groupby specification, if defined. - * @return {GroupBySpec} The groupby specification. - */ - groups() { - return this._group; - } - - /** - * Returns the row order comparator function, if specified. - * @return {RowComparator} The row order comparator function. - */ - comparator() { - return this._order; - } - - /** - * The total number of rows in this table, counting both - * filtered and unfiltered rows. - * @return {number} The number of total rows. - */ - totalRows() { - return this._total; - } - - /** - * The number of active rows in this table. This number may be - * less than the total rows if the table has been filtered. - * @see Table.totalRows - * @return {number} The number of rows. - */ - numRows() { - return this._nrows; - } - - /** - * The number of active rows in this table. This number may be - * less than the total rows if the table has been filtered. - * @see Table.totalRows - * @return {number} The number of rows. - */ - get size() { - return this._nrows; - } - - /** - * The number of columns in this table. - * @return {number} The number of columns. - */ - numCols() { - return this._names.length; - } - - /** - * Filter function invoked for each column name. - * @callback NameFilter - * @param {string} name The column name. - * @param {number} index The column index. - * @param {string[]} array The array of names. - * @return {boolean} Returns true to retain the column name. - */ - - /** - * The table column names, optionally filtered. - * @param {NameFilter} [filter] An optional filter function. - * If unspecified, all column names are returned. 
- * @return {string[]} An array of matching column names. - */ - columnNames(filter) { - return filter ? this._names.filter(filter) : this._names.slice(); - } - - /** - * The column name at the given index. - * @param {number} index The column index. - * @return {string} The column name, - * or undefined if the index is out of range. - */ - columnName(index) { - return this._names[index]; - } - - /** - * The column index for the given name. - * @param {string} name The column name. - * @return {number} The column index, or -1 if the name is not found. - */ - columnIndex(name) { - return this._names.indexOf(name); - } - - /** - * Deprecated alias for the table array() method: use table.array() - * instead. Get an array of values contained in a column. The resulting - * array respects any table filter or orderby criteria. - * @param {string} name The column name. - * @param {ArrayConstructor|TypedArrayConstructor} [constructor=Array] - * The array constructor for instantiating the output array. - * @return {DataValue[]|TypedArray} The array of column values. - */ - columnArray(name, constructor) { - return this.array(name, constructor); - } - - /** - * Get an array of values contained in a column. The resulting array - * respects any table filter or orderby criteria. - * @param {string} name The column name. - * @param {ArrayConstructor|TypedArrayConstructor} [constructor=Array] - * The array constructor for instantiating the output array. - * @return {DataValue[]|TypedArray} The array of column values. - */ - array(name, constructor) { // eslint-disable-line no-unused-vars - error('Not implemented'); - } - - /** - * Returns an iterator over column values. - * @return {Iterator} An iterator over row objects. - */ - *values(name) { - const get = this.getter(name); - const n = this.numRows(); - for (let i = 0; i < n; ++i) { - yield get(i); - } - } - - /** - * Get the value for the given column and row. - * @param {string} name The column name. 
- * @param {number} [row=0] The row index, defaults to zero if not specified. - * @return {DataValue} The data value at (column, row). - */ - get(name, row = 0) { // eslint-disable-line no-unused-vars - error('Not implemented'); - } - - /** - * Returns an accessor ("getter") function for a column. The returned - * function takes a row index as its single argument and returns the - * corresponding column value. - * @param {string} name The column name. - * @return {ColumnGetter} The column getter function. - */ - getter(name) { // eslint-disable-line no-unused-vars - error('Not implemented'); - } - - /** - * Returns an array of objects representing table rows. - * @param {ObjectsOptions} [options] The options for row object generation. - * @return {RowObject[]} An array of row objects. - */ - objects(options) { // eslint-disable-line no-unused-vars - error('Not implemented'); - } - - /** - * Returns an object representing a table row. - * @param {number} [row=0] The row index, defaults to zero if not specified. - * @return {object} A row object with named properties for each column. - */ - object(row) { // eslint-disable-line no-unused-vars - error('Not implemented'); - } - - /** - * Returns an iterator over objects representing table rows. - * @return {Iterator} An iterator over row objects. - */ - [Symbol.iterator]() { - error('Not implemented'); - } - - /** - * Print the contents of this table using the console.table() method. - * @param {PrintOptions|number} options The options for row object - * generation, determining which rows and columns are printed. If - * number-valued, specifies the row limit. - * @return {this} The table instance. - */ - print(options = {}) { - if (isNumber(options)) { - options = { limit: options }; - } else if (options.limit == null) { - options.limit = 10; - } - - const obj = this.objects({ ...options, grouped: false }); - const msg = `${this[Symbol.toStringTag]}. 
Showing ${obj.length} rows.`; - - console.log(msg); // eslint-disable-line no-console - console.table(obj); // eslint-disable-line no-console - return this; - } - - /** - * Returns an array of indices for all rows passing the table filter. - * @param {boolean} [order=true] A flag indicating if the returned - * indices should be sorted if this table is ordered. If false, the - * returned indices may or may not be sorted. - * @return {Uint32Array} An array of row indices. - */ - indices(order = true) { - if (this._index) return this._index; - - const n = this.numRows(); - const index = new Uint32Array(n); - const ordered = this.isOrdered(); - const bits = this.mask(); - let row = -1; - - // inline the following for performance: - // this.scan(row => index[++i] = row); - if (bits) { - for (let i = bits.next(0); i >= 0; i = bits.next(i + 1)) { - index[++row] = i; - } - } else { - for (let i = 0; i < n; ++i) { - index[++row] = i; - } - } - - // sort index vector - if (order && ordered) { - const compare = this._order; - const data = this._data; - index.sort((a, b) => compare(a, b, data)); - } - - // save indices if they reflect table metadata - if (order || !ordered) { - this._index = index; - } - - return index; - } - - /** - * Returns an array of indices for each group in the table. - * If the table is not grouped, the result is the same as - * {@link indices}, but wrapped within an array. - * @param {boolean} [order=true] A flag indicating if the returned - * indices should be sorted if this table is ordered. If false, the - * returned indices may or may not be sorted. - * @return {number[][]} An array of row index arrays, one per group. - * The indices will be filtered if the table is filtered. 
- */ - partitions(order = true) { - // return partitions if already generated - if (this._partitions) { - return this._partitions; - } - - // if not grouped, return a single partition - if (!this.isGrouped()) { - return [ this.indices(order) ]; - } - - // generate partitions - const { keys, size } = this._group; - const part = repeat(size, () => []); - - // populate partitions, don't sort if indices don't exist - // inline the following for performance: - // this.scan(row => part[keys[row]].push(row), sort); - const sort = this._index; - const bits = this.mask(); - const n = this.numRows(); - if (sort && this.isOrdered()) { - for (let i = 0, r; i < n; ++i) { - r = sort[i]; - part[keys[r]].push(r); - } - } else if (bits) { - for (let i = bits.next(0); i >= 0; i = bits.next(i + 1)) { - part[keys[i]].push(i); - } - } else { - for (let i = 0; i < n; ++i) { - part[keys[i]].push(i); - } - } - - // if ordered but not yet sorted, sort partitions directly - if (order && !sort && this.isOrdered()) { - const compare = this._order; - const data = this._data; - for (let i = 0; i < size; ++i) { - part[i].sort((a, b) => compare(a, b, data)); - } - } - - // save partitions if they reflect table metadata - if (order || !this.isOrdered()) { - this._partitions = part; - } - - return part; - } - - /** - * Callback function to cancel a table scan. - * @callback ScanStop - * @return {void} - */ - - /** - * Callback function invoked for each row of a table scan. - * @callback ScanVisitor - * @param {number} [row] The table row index. - * @param {TableData} [data] The backing table data store. - * @param {ScanStop} [stop] Function to stop the scan early. - * Callees can invoke this function to prevent future calls. - * @return {void} - */ - - /** - * Perform a table scan, visiting each row of the table. - * If this table is filtered, only rows passing the filter are visited. - * @param {ScanVisitor} fn Callback invoked for each row of the table. 
- * @param {boolean} [order=false] Indicates if the table should be - * scanned in the order determined by {@link Table#orderby}. This - * argument has no effect if the table is unordered. - * @property {number} [limit=Infinity] The maximum number of objects to create. - * @property {number} [offset=0] The row offset indicating how many initial rows to skip. - */ - scan(fn, order, limit = Infinity, offset = 0) { - const filter = this._mask; - const nrows = this._nrows; - const data = this._data; - - let i = offset || 0; - if (i > nrows) return; - - const n = Math.min(nrows, i + limit); - const stop = () => i = this._total; - - if (order && this.isOrdered() || filter && this._index) { - const index = this.indices(); - const data = this._data; - for (; i < n; ++i) { - fn(index[i], data, stop); - } - } else if (filter) { - let c = n - i + 1; - for (i = filter.nth(i); --c && i > -1; i = filter.next(i + 1)) { - fn(i, data, stop); - } - } else { - for (; i < n; ++i) { - fn(i, data, stop); - } - } - } - - /** - * Extract rows with indices from start to end (end not included), where - * start and end represent per-group ordered row numbers in the table. - * @param {number} [start] Zero-based index at which to start extraction. - * A negative index indicates an offset from the end of the group. - * If start is undefined, slice starts from the index 0. - * @param {number} [end] Zero-based index before which to end extraction. - * A negative index indicates an offset from the end of the group. - * If end is omitted, slice extracts through the end of the group. - * @return {this} A new table with sliced rows. - * @example table.slice(1, -1) - */ - slice(start = 0, end = Infinity) { - if (this.isGrouped()) return super.slice(start, end); - - // if not grouped, scan table directly - const indices = []; - const nrows = this.numRows(); - start = Math.max(0, start + (start < 0 ? nrows : 0)); - end = Math.min(nrows, Math.max(0, end + (end < 0 ? 
nrows : 0))); - this.scan(row => indices.push(row), true, end - start, start); - return this.reify(indices); - } - - /** - * Reduce a table, processing all rows to produce a new table. - * To produce standard aggregate summaries, use {@link rollup}. - * This method allows the use of custom reducer implementations, - * for example to produce multiple rows for an aggregate. - * @param {Reducer} reducer The reducer to apply. - * @return {Table} A new table of reducer outputs. - */ - reduce(reducer) { - return this.__reduce(this, reducer); - } -} - -/** - * A typed array constructor. - * @typedef {Uint8ArrayConstructor|Uint16ArrayConstructor|Uint32ArrayConstructor|BigUint64ArrayConstructor|Int8ArrayConstructor|Int16ArrayConstructor|Int32ArrayConstructor|BigInt64ArrayConstructor|Float32ArrayConstructor|Float64ArrayConstructor} TypedArrayConstructor - */ - -/** - * A typed array instance. - * @typedef {Uint8Array|Uint16Array|Uint32Array|BigUint64Array|Int8Array|Int16Array|Int32Array|BigInt64Array|Float32Array|Float64Array} TypedArray - */ - -/** - * Backing table data. - * @typedef {object|Array} TableData - */ - -/** - * Table value. - * @typedef {*} DataValue - */ - -/** - * Table row object. - * @typedef {Object.} RowObject - */ - -/** - * Table expression parameters. - * @typedef {import('./transformable').Params} Params - */ - -/** - * Proxy type for BitSet class. - * @typedef {import('./bit-set').default} BitSet - */ - -/** - * Abstract class for custom aggregation operations. - * @typedef {import('../engine/reduce/reducer').default} Reducer - */ - -/** - * A table groupby specification. - * @typedef {object} GroupBySpec - * @property {number} size The number of groups. - * @property {string[]} names Column names for each group. - * @property {RowExpression[]} get Value accessor functions for each group. - * @property {number[]} rows Indices of an example table row for each group. - * @property {number[]} keys Per-row group indices, length is total rows of table. 
- */ - -/** - * Column value accessor. - * @callback ColumnGetter - * @param {number} [row] The table row. - * @return {DataValue} - */ - -/** - * An expression evaluated over a table row. - * @callback RowExpression - * @param {number} [row] The table row. - * @param {TableData} [data] The backing table data store. - * @return {DataValue} - */ - -/** - * Comparator function for sorting table rows. - * @callback RowComparator - * @param {number} rowA The table row index for the first row. - * @param {number} rowB The table row index for the second row. - * @param {TableData} data The backing table data store. - * @return {number} Negative if rowA < rowB, positive if - * rowA > rowB, otherwise zero. - */ - -/** - * Options for derived table creation. - * @typedef {object} CreateOptions - * @property {TableData} [data] The backing column data. - * @property {string[]} [names] An ordered list of column names. - * @property {BitSet} [filter] An additional filter BitSet to apply. - * @property {GroupBySpec} [groups] The groupby specification to use, or null for no groups. - * @property {RowComparator} [order] The orderby comparator function to use, or null for no order. - */ - -/** - * Options for generating row objects. - * @typedef {object} PrintOptions - * @property {number} [limit=Infinity] The maximum number of objects to create. - * @property {number} [offset=0] The row offset indicating how many initial rows to skip. - * @property {import('../table/transformable').Select} [columns] - * An ordered set of columns to include. The input may consist of column name - * strings, column integer indices, objects with current column names as keys - * and new column names as values (for renaming), or selection helper - * functions such as {@link all}, {@link not}, or {@link range}. - */ - -/** - * Options for generating row objects. - * @typedef {object} ObjectsOptions - * @property {number} [limit=Infinity] The maximum number of objects to create. 
- * @property {number} [offset=0] The row offset indicating how many initial rows to skip. - * @property {import('../table/transformable').Select} [columns] - * An ordered set of columns to include. The input may consist of column name - * strings, column integer indices, objects with current column names as keys - * and new column names as values (for renaming), or selection helper - * functions such as {@link all}, {@link not}, or {@link range}. - * @property {'map'|'entries'|'object'|boolean} [grouped=false] - * The export format for groups of rows. The default (false) is to ignore - * groups, returning a flat array of objects. The valid values are 'map' or - * true (for Map instances), 'object' (for standard objects), or 'entries' - * (for arrays in the style of Object.entries). For the 'object' format, - * groupby keys are coerced to strings to use as object property names; note - * that this can lead to undesirable behavior if the groupby keys are object - * values. The 'map' and 'entries' options preserve the groupby key values. - */ diff --git a/src/table/transformable.js b/src/table/transformable.js deleted file mode 100644 index 97a11cbe..00000000 --- a/src/table/transformable.js +++ /dev/null @@ -1,978 +0,0 @@ -import toArray from '../util/to-array.js'; -import slice from '../helpers/slice.js'; - -/** - * Abstract base class for transforming data. - */ -export default class Transformable { - - /** - * Instantiate a new Transformable instance. - * @param {Params} [params] The parameter values. - */ - constructor(params) { - if (params) this._params = params; - } - - /** - * Get or set table expression parameter values. - * If called with no arguments, returns the current parameter values - * as an object. Otherwise, adds the provided parameters to this - * table's parameter set and returns the table. Any prior parameters - * with names matching the input parameters are overridden. - * @param {Params} [values] The parameter values. 
- * @return {this|Params} The current parameters values (if called with - * no arguments) or this table. - */ - params(values) { - if (arguments.length) { - if (values) { - this._params = { ...this._params, ...values }; - } - return this; - } else { - return this._params; - } - } - - /** - * Create a new fully-materialized instance of this table. - * All filter and orderby settings are removed from the new table. - * Instead, the backing data itself is filtered and ordered as needed. - * @param {number[]} [indices] Ordered row indices to materialize. - * If unspecified, all rows passing the table filter are used. - * @return {this} A reified table. - */ - reify(indices) { - return this.__reify(this, indices); - } - - // -- Transformation Verbs ------------------------------------------------ - - /** - * Count the number of values in a group. This method is a shorthand - * for {@link Transformable#rollup} with a count aggregate function. - * @param {CountOptions} [options] Options for the count. - * @return {this} A new table with groupby and count columns. - * @example table.groupby('colA').count() - * @example table.groupby('colA').count({ as: 'num' }) - */ - count(options) { - return this.__count(this, options); - } - - /** - * Derive new column values based on the provided expressions. By default, - * new columns are added after (higher indices than) existing columns. Use - * the before or after options to place new columns elsewhere. - * @param {ExprObject} values Object of name-value pairs defining the - * columns to derive. The input object should have output column - * names for keys and table expressions for values. - * @param {DeriveOptions} [options] Options for dropping or relocating - * derived columns. Use either a before or after property to indicate - * where to place derived columns. Specifying both before and after is an - * error. 
Unlike the relocate verb, this option affects only new columns; - * updated columns with existing names are excluded from relocation. - * @return {this} A new table with derived columns added. - * @example table.derive({ sumXY: d => d.x + d.y }) - * @example table.derive({ z: d => d.x * d.y }, { before: 'x' }) - */ - derive(values, options) { - return this.__derive(this, values, options); - } - - /** - * Filter a table to a subset of rows based on the input criteria. - * The resulting table provides a filtered view over the original data; no - * data copy is made. To create a table that copies only filtered data to - * new data structures, call {@link Transformable#reify} on the output table. - * @param {TableExpr} criteria Filter criteria as a table expression. - * Both aggregate and window functions are permitted, taking into account - * {@link Transformable#groupby} or {@link Transformable#orderby} settings. - * @return {this} A new table with filtered rows. - * @example table.filter(d => abs(d.value) < 5) - */ - filter(criteria) { - return this.__filter(this, criteria); - } - - /** - * Extract rows with indices from start to end (end not included), where - * start and end represent per-group ordered row numbers in the table. - * @param {number} [start] Zero-based index at which to start extraction. - * A negative index indicates an offset from the end of the group. - * If start is undefined, slice starts from the index 0. - * @param {number} [end] Zero-based index before which to end extraction. - * A negative index indicates an offset from the end of the group. - * If end is omitted, slice extracts through the end of the group. - * @return {this} A new table with sliced rows. - * @example table.slice(1, -1) - */ - slice(start, end) { - return this.filter(slice(start, end)).reify(); - } - - /** - * Group table rows based on a set of column values. 
- * Subsequent operations that are sensitive to grouping (such as - * aggregate functions) will operate over the grouped rows. - * To undo grouping, use {@link Transformable#ungroup}. - * @param {...ExprList} keys Key column values to group by. - * The keys may be specified using column name strings, column index - * numbers, value objects with output column names for keys and table - * expressions for values, or selection helper functions. - * @return {this} A new table with grouped rows. - * @example table.groupby('colA', 'colB') - * @example table.groupby({ key: d => d.colA + d.colB }) - */ - groupby(...keys) { - return this.__groupby(this, keys.flat()); - } - - /** - * Order table rows based on a set of column values. - * Subsequent operations sensitive to ordering (such as window functions) - * will operate over sorted values. - * The resulting table provides an view over the original data, without - * any copying. To create a table with sorted data copied to new data - * strucures, call {@link Transformable#reify} on the result of this method. - * To undo ordering, use {@link Transformable#unorder}. - * @param {...OrderKeys} keys Key values to sort by, in precedence order. - * By default, sorting is done in ascending order. - * To sort in descending order, wrap values using {@link desc}. - * If a string, order by the column with that name. - * If a number, order by the column with that index. - * If a function, must be a valid table expression; aggregate functions - * are permitted, but window functions are not. - * If an object, object values must be valid values parameters - * with output column names for keys and table expressions - * for values (the output names will be ignored). - * If an array, array values must be valid key parameters. - * @return {this} A new ordered table. 
- * @example table.orderby('a', desc('b')) - * @example table.orderby({ a: 'a', b: desc('b') )}) - * @example table.orderby(desc(d => d.a)) - */ - orderby(...keys) { - return this.__orderby(this, keys.flat()); - } - - /** - * Relocate a subset of columns to change their positions, also - * potentially renaming them. - * @param {Selection} columns An ordered selection of columns to relocate. - * The input may consist of column name strings, column integer indices, - * rename objects with current column names as keys and new column names - * as values, or functions that take a table as input and returns a valid - * selection parameter (typically the output of selection helper functions - * such as {@link all}, {@link not}, or {@link range}). - * @param {RelocateOptions} options Options for relocating. Must include - * either the before or after property to indicate where to place the - * relocated columns. Specifying both before and after is an error. - * @return {this} A new table with relocated columns. - * @example table.relocate(['colY', 'colZ'], { after: 'colX' }) - * @example table.relocate(not('colB', 'colC'), { before: 'colA' }) - * @example table.relocate({ colA: 'newA', colB: 'newB' }, { after: 'colC' }) - */ - relocate(columns, options) { - return this.__relocate(this, toArray(columns), options); - } - - /** - * Rename one or more columns, preserving column order. - * @param {...Select} columns One or more rename objects with current - * column names as keys and new column names as values. - * @return {this} A new table with renamed columns. - * @example table.rename({ oldName: 'newName' }) - * @example table.rename({ a: 'a2', b: 'b2' }) - */ - rename(...columns) { - return this.__rename(this, columns.flat()); - } - - /** - * Rollup a table to produce an aggregate summary. - * Often used in conjunction with {@link Transformable#groupby}. - * To produce counts only, {@link Transformable#count} is a shortcut. 
- * @param {ExprObject} [values] Object of name-value pairs defining aggregate - * output columns. The input object should have output column names for - * keys and table expressions for values. The expressions must be valid - * aggregate expressions: window functions are not allowed and column - * references must be arguments to aggregate functions. - * @return {this} A new table of aggregate summary values. - * @example table.groupby('colA').rollup({ mean: d => mean(d.colB) }) - * @example table.groupby('colA').rollup({ mean: op.median('colB') }) - */ - rollup(values) { - return this.__rollup(this, values); - } - - /** - * Generate a table from a random sample of rows. - * If the table is grouped, performs a stratified sample by - * sampling from each group separately. - * @param {number|TableExpr} size The number of samples to draw per group. - * If number-valued, the same sample size is used for each group. - * If function-valued, the input should be an aggregate table - * expression compatible with {@link Transformable#rollup}. - * @param {SampleOptions} [options] Options for sampling. - * @return {this} A new table with sampled rows. - * @example table.sample(50) - * @example table.sample(100, { replace: true }) - * @example table.groupby('colA').sample(() => op.floor(0.5 * op.count())) - */ - sample(size, options) { - return this.__sample(this, size, options); - } - - /** - * Select a subset of columns into a new table, potentially renaming them. - * @param {...Select} columns An ordered selection of columns. - * The input may consist of column name strings, column integer indices, - * rename objects with current column names as keys and new column names - * as values, or functions that take a table as input and returns a valid - * selection parameter (typically the output of selection helper functions - * such as {@link all}, {@link not}, or {@link range}). - * @return {this} A new table of selected columns. 
- * @example table.select('colA', 'colB') - * @example table.select(not('colB', 'colC')) - * @example table.select({ colA: 'newA', colB: 'newB' }) - */ - select(...columns) { - return this.__select(this, columns.flat()); - } - - /** - * Ungroup a table, removing any grouping criteria. - * Undoes the effects of {@link Transformable#groupby}. - * @return {this} A new ungrouped table, or this table if not grouped. - * @example table.ungroup() - */ - ungroup() { - return this.__ungroup(this); - } - - /** - * Unorder a table, removing any sorting criteria. - * Undoes the effects of {@link Transformable#orderby}. - * @return {this} A new unordered table, or this table if not ordered. - * @example table.unorder() - */ - unorder() { - return this.__unorder(this); - } - - // -- Cleaning Verbs ------------------------------------------------------ - - /** - * De-duplicate table rows by removing repeated row values. - * @param {...ExprList} keys Key columns to check for duplicates. - * Two rows are considered duplicates if they have matching values for - * all keys. If keys are unspecified, all columns are used. - * The keys may be specified using column name strings, column index - * numbers, value objects with output column names for keys and table - * expressions for values, or selection helper functions. - * @return {this} A new de-duplicated table. - * @example table.dedupe() - * @example table.dedupe('a', 'b') - * @example table.dedupe({ abs: d => op.abs(d.a) }) - */ - dedupe(...keys) { - return this.__dedupe(this, keys.flat()); - } - - /** - * Impute missing values or rows. Accepts a set of column-expression pairs - * and evaluates the expressions to replace any missing (null, undefined, - * or NaN) values in the original column. - * If the expand option is specified, imputes new rows for missing - * combinations of values. 
All combinations of key values (a full cross - * product) are considered for each level of grouping (specified by - * {@link Transformable#groupby}). New rows will be added for any combination - * of key and groupby values not already contained in the table. For all - * non-key and non-group columns the new rows are populated with imputation - * values (first argument) if specified, otherwise undefined. - * If the expand option is specified, any filter or orderby settings are - * removed from the output table, but groupby settings persist. - * @param {ExprObject} values Object of name-value pairs for the column values - * to impute. The input object should have existing column names for keys - * and table expressions for values. The expressions will be evaluated to - * determine replacements for any missing values. - * @param {ImputeOptions} [options] Imputation options. The expand - * property specifies a set of column values to consider for imputing - * missing rows. All combinations of expanded values are considered, and - * new rows are added for each combination that does not appear in the - * input table. - * @return {this} A new table with imputed values and/or rows. - * @example table.impute({ v: () => 0 }) - * @example table.impute({ v: d => op.mean(d.v) }) - * @example table.impute({ v: () => 0 }, { expand: ['x', 'y'] }) - */ - impute(values, options) { - return this.__impute(this, values, options); - } - - // -- Reshaping Verbs ----------------------------------------------------- - - /** - * Fold one or more columns into two key-value pair columns. - * The fold transform is an inverse of the {@link Transformable#pivot} transform. - * The resulting table has two new columns, one containing the column - * names (named "key") and the other the column values (named "value"). - * The number of output rows equals the original row count multiplied - * by the number of folded columns. - * @param {ExprList} values The columns to fold. 
- * The columns may be specified using column name strings, column index - * numbers, value objects with output column names for keys and table - * expressions for values, or selection helper functions. - * @param {FoldOptions} [options] Options for folding. - * @return {this} A new folded table. - * @example table.fold('colA') - * @example table.fold(['colA', 'colB']) - * @example table.fold(range(5, 8)) - */ - fold(values, options) { - return this.__fold(this, values, options); - } - - /** - * Pivot columns into a cross-tabulation. - * The pivot transform is an inverse of the {@link Transformable#fold} transform. - * The resulting table has new columns for each unique combination - * of the provided *keys*, populated with the provided *values*. - * The provided *values* must be aggregates, as a single set of keys may - * include more than one row. If string-valued, the *any* aggregate is used. - * If only one *values* column is defined, the new pivoted columns will - * be named using key values directly. Otherwise, input value column names - * will be included as a component of the output column names. - * @param {ExprList} keys Key values to map to new column names. - * The keys may be specified using column name strings, column index - * numbers, value objects with output column names for keys and table - * expressions for values, or selection helper functions. - * @param {ExprList} values Output values for pivoted columns. - * Column references will be wrapped in an *any* aggregate. - * If object-valued, the input object should have output value - * names for keys and aggregate table expressions for values. - * @param {PivotOptions} [options] Options for pivoting. - * @return {this} A new pivoted table. 
- * @example table.pivot('key', 'value') - * @example table.pivot(['keyA', 'keyB'], ['valueA', 'valueB']) - * @example table.pivot({ key: d => d.key }, { value: d => sum(d.value) }) - */ - pivot(keys, values, options) { - return this.__pivot(this, keys, values, options); - } - - /** - * Spread array elements into a set of new columns. - * Output columns are named based on the value key and array index. - * @param {ExprList} values The column values to spread. - * The values may be specified using column name strings, column index - * numbers, value objects with output column names for keys and table - * expressions for values, or selection helper functions. - * @param {SpreadOptions} [options] Options for spreading. - * @return {this} A new table with the spread columns added. - * @example table.spread({ a: split(d.text, '') }) - * @example table.spread('arrayCol', { limit: 100 }) - */ - spread(values, options) { - return this.__spread(this, values, options); - } - - /** - * Unroll one or more array-valued columns into new rows. - * If more than one array value is used, the number of new rows - * is the smaller of the limit and the largest length. - * Values for all other columns are copied over. - * @param {ExprList} values The column values to unroll. - * The values may be specified using column name strings, column index - * numbers, value objects with output column names for keys and table - * expressions for values, or selection helper functions. - * @param {UnrollOptions} [options] Options for unrolling. - * @return {this} A new unrolled table. - * @example table.unroll('colA', { limit: 1000 }) - */ - unroll(values, options) { - return this.__unroll(this, values, options); - } - - // -- Joins --------------------------------------------------------------- - - /** - * Lookup values from a secondary table and add them as new columns. - * A lookup occurs upon matching key values for rows in both tables. 
- * If the secondary table has multiple rows with the same key, only - * the last observed instance will be considered in the lookup. - * Lookup is similar to {@link Transformable#join_left}, but with a simpler - * syntax and the added constraint of allowing at most one match only. - * @param {TableRef} other The secondary table to look up values from. - * @param {JoinKeys} [on] Lookup keys (column name strings or table - * expressions) for this table and the secondary table, respectively. - * @param {...ExprList} values The column values to add from the - * secondary table. Can be column name strings or objects with column - * names as keys and table expressions as values. - * @return {this} A new table with lookup values added. - * @example table.lookup(other, ['key1', 'key2'], 'value1', 'value2') - */ - lookup(other, on, ...values) { - return this.__lookup(this, other, on, values.flat()); - } - - /** - * Join two tables, extending the columns of one table with - * values from the other table. The current table is considered - * the "left" table in the join, and the new table input is - * considered the "right" table in the join. By default an inner - * join is performed, removing all rows that do not match the - * join criteria. To perform left, right, or full outer joins, use - * the {@link Transformable#join_left}, {@link Transformable#join_right}, or - * {@link Transformable#join_full} methods, or provide an options argument. - * @param {TableRef} other The other (right) table to join with. - * @param {JoinPredicate} [on] The join criteria for matching table rows. - * If unspecified, the values of all columns with matching names - * are compared. - * If array-valued, a two-element array should be provided, containing - * the columns to compare for the left and right tables, respectively. - * If a one-element array or a string value is provided, the same - * column names will be drawn from both tables. 
- * If function-valued, should be a two-table table expression that - * returns a boolean value. When providing a custom predicate, note that - * join key values can be arrays or objects, and that normal join - * semantics do not consider null or undefined values to be equal (that is, - * null !== null). Use the op.equal function to handle these cases. - * @param {JoinValues} [values] The columns to include in the join output. - * If unspecified, all columns from both tables are included; paired - * join keys sharing the same column name are included only once. - * If array-valued, a two element array should be provided, containing - * the columns to include for the left and right tables, respectively. - * Array input may consist of column name strings, objects with output - * names as keys and single-table table expressions as values, or the - * selection helper functions {@link all}, {@link not}, or {@link range}. - * If object-valued, specifies the key-value pairs for each output, - * defined using two-table table expressions. - * @param {JoinOptions} [options] Options for the join. - * @return {this} A new joined table. - * @example table.join(other, ['keyL', 'keyR']) - * @example table.join(other, (a, b) => equal(a.keyL, b.keyR)) - */ - join(other, on, values, options) { - return this.__join(this, other, on, values, options); - } - - /** - * Perform a left outer join on two tables. Rows in the left table - * that do not match a row in the right table will be preserved. - * This is a convenience method with fixed options for {@link Transformable#join}. - * @param {TableRef} other The other (right) table to join with. - * @param {JoinPredicate} [on] The join criteria for matching table rows. - * If unspecified, the values of all columns with matching names - * are compared. - * If array-valued, a two-element array should be provided, containing - * the columns to compare for the left and right tables, respectively. 
- * If a one-element array or a string value is provided, the same - * column names will be drawn from both tables. - * If function-valued, should be a two-table table expression that - * returns a boolean value. When providing a custom predicate, note that - * join key values can be arrays or objects, and that normal join - * semantics do not consider null or undefined values to be equal (that is, - * null !== null). Use the op.equal function to handle these cases. - * @param {JoinValues} [values] The columns to include in the join output. - * If unspecified, all columns from both tables are included; paired - * join keys sharing the same column name are included only once. - * If array-valued, a two element array should be provided, containing - * the columns to include for the left and right tables, respectively. - * Array input may consist of column name strings, objects with output - * names as keys and single-table table expressions as values, or the - * selection helper functions {@link all}, {@link not}, or {@link range}. - * If object-valued, specifies the key-value pairs for each output, - * defined using two-table table expressions. - * @param {JoinOptions} [options] Options for the join. With this method, - * any options will be overridden with {left: true, right: false}. - * @return {this} A new joined table. - * @example table.join_left(other, ['keyL', 'keyR']) - * @example table.join_left(other, (a, b) => equal(a.keyL, b.keyR)) - */ - join_left(other, on, values, options) { - const opt = { ...options, left: true, right: false }; - return this.__join(this, other, on, values, opt); - } - - /** - * Perform a right outer join on two tables. Rows in the right table - * that do not match a row in the left table will be preserved. - * This is a convenience method with fixed options for {@link Transformable#join}. - * @param {TableRef} other The other (right) table to join with. - * @param {JoinPredicate} [on] The join criteria for matching table rows. 
- * If unspecified, the values of all columns with matching names - * are compared. - * If array-valued, a two-element array should be provided, containing - * the columns to compare for the left and right tables, respectively. - * If a one-element array or a string value is provided, the same - * column names will be drawn from both tables. - * If function-valued, should be a two-table table expression that - * returns a boolean value. When providing a custom predicate, note that - * join key values can be arrays or objects, and that normal join - * semantics do not consider null or undefined values to be equal (that is, - * null !== null). Use the op.equal function to handle these cases. - * @param {JoinValues} [values] The columns to include in the join output. - * If unspecified, all columns from both tables are included; paired - * join keys sharing the same column name are included only once. - * If array-valued, a two element array should be provided, containing - * the columns to include for the left and right tables, respectively. - * Array input may consist of column name strings, objects with output - * names as keys and single-table table expressions as values, or the - * selection helper functions {@link all}, {@link not}, or {@link range}. - * If object-valued, specifies the key-value pairs for each output, - * defined using two-table table expressions. - * @param {JoinOptions} [options] Options for the join. With this method, - * any options will be overridden with {left: false, right: true}. - * @return {this} A new joined table. - * @example table.join_right(other, ['keyL', 'keyR']) - * @example table.join_right(other, (a, b) => equal(a.keyL, b.keyR)) - */ - join_right(other, on, values, options) { - const opt = { ...options, left: false, right: true }; - return this.__join(this, other, on, values, opt); - } - - /** - * Perform a full outer join on two tables. 
Rows in either the left or - * right table that do not match a row in the other will be preserved. - * This is a convenience method with fixed options for {@link Transformable#join}. - * @param {TableRef} other The other (right) table to join with. - * @param {JoinPredicate} [on] The join criteria for matching table rows. - * If unspecified, the values of all columns with matching names - * are compared. - * If array-valued, a two-element array should be provided, containing - * the columns to compare for the left and right tables, respectively. - * If a one-element array or a string value is provided, the same - * column names will be drawn from both tables. - * If function-valued, should be a two-table table expression that - * returns a boolean value. When providing a custom predicate, note that - * join key values can be arrays or objects, and that normal join - * semantics do not consider null or undefined values to be equal (that is, - * null !== null). Use the op.equal function to handle these cases. - * @param {JoinValues} [values] The columns to include in the join output. - * If unspecified, all columns from both tables are included; paired - * join keys sharing the same column name are included only once. - * If array-valued, a two element array should be provided, containing - * the columns to include for the left and right tables, respectively. - * Array input may consist of column name strings, objects with output - * names as keys and single-table table expressions as values, or the - * selection helper functions {@link all}, {@link not}, or {@link range}. - * If object-valued, specifies the key-value pairs for each output, - * defined using two-table table expressions. - * @param {JoinOptions} [options] Options for the join. With this method, - * any options will be overridden with {left: true, right: true}. - * @return {this} A new joined table. 
- * @example table.join_full(other, ['keyL', 'keyR']) - * @example table.join_full(other, (a, b) => equal(a.keyL, b.keyR)) - */ - join_full(other, on, values, options) { - const opt = { ...options, left: true, right: true }; - return this.__join(this, other, on, values, opt); - } - - /** - * Produce the Cartesian cross product of two tables. The output table - * has one row for every pair of input table rows. Beware that outputs - * may be quite large, as the number of output rows is the product of - * the input row counts. - * This is a convenience method for {@link Transformable#join} in which the - * join criteria is always true. - * @param {TableRef} other The other (right) table to join with. - * @param {JoinValues} [values] The columns to include in the output. - * If unspecified, all columns from both tables are included. - * If array-valued, a two element array should be provided, containing - * the columns to include for the left and right tables, respectively. - * Array input may consist of column name strings, objects with output - * names as keys and single-table table expressions as values, or the - * selection helper functions {@link all}, {@link not}, or {@link range}. - * If object-valued, specifies the key-value pairs for each output, - * defined using two-table table expressions. - * @param {JoinOptions} [options] Options for the join. - * @return {this} A new joined table. - * @example table.cross(other) - * @example table.cross(other, [['leftKey', 'leftVal'], ['rightVal']]) - */ - cross(other, values, options) { - return this.__cross(this, other, values, options); - } - - /** - * Perform a semi-join, filtering the left table to only rows that - * match a row in the right table. - * @param {TableRef} other The other (right) table to join with. - * @param {JoinPredicate} [on] The join criteria for matching table rows. - * If unspecified, the values of all columns with matching names - * are compared. 
- * If array-valued, a two-element array should be provided, containing - * the columns to compare for the left and right tables, respectively. - * If a one-element array or a string value is provided, the same - * column names will be drawn from both tables. - * If function-valued, should be a two-table table expression that - * returns a boolean value. When providing a custom predicate, note that - * join key values can be arrays or objects, and that normal join - * semantics do not consider null or undefined values to be equal (that is, - * null !== null). Use the op.equal function to handle these cases. - * @return {this} A new filtered table. - * @example table.semijoin(other) - * @example table.semijoin(other, ['keyL', 'keyR']) - * @example table.semijoin(other, (a, b) => equal(a.keyL, b.keyR)) - */ - semijoin(other, on) { - return this.__semijoin(this, other, on); - } - - /** - * Perform an anti-join, filtering the left table to only rows that - * do *not* match a row in the right table. - * @param {TableRef} other The other (right) table to join with. - * @param {JoinPredicate} [on] The join criteria for matching table rows. - * If unspecified, the values of all columns with matching names - * are compared. - * If array-valued, a two-element array should be provided, containing - * the columns to compare for the left and right tables, respectively. - * If a one-element array or a string value is provided, the same - * column names will be drawn from both tables. - * If function-valued, should be a two-table table expression that - * returns a boolean value. When providing a custom predicate, note that - * join key values can be arrays or objects, and that normal join - * semantics do not consider null or undefined values to be equal (that is, - * null !== null). Use the op.equal function to handle these cases. - * @return {this} A new filtered table. 
- * @example table.antijoin(other) - * @example table.antijoin(other, ['keyL', 'keyR']) - * @example table.antijoin(other, (a, b) => equal(a.keyL, b.keyR)) - */ - antijoin(other, on) { - return this.__antijoin(this, other, on); - } - - // -- Set Operations ------------------------------------------------------ - - /** - * Concatenate multiple tables into a single table, preserving all rows. - * This transformation mirrors the UNION_ALL operation in SQL. - * Only named columns in this table are included in the output. - * @see Transformable#union - * @param {...TableRef} tables A list of tables to concatenate. - * @return {this} A new concatenated table. - * @example table.concat(other) - * @example table.concat(other1, other2) - * @example table.concat([other1, other2]) - */ - concat(...tables) { - return this.__concat(this, tables.flat()); - } - - /** - * Union multiple tables into a single table, deduplicating all rows. - * This transformation mirrors the UNION operation in SQL. It is - * similar to {@link Transformable#concat} but suppresses duplicate rows with - * values identical to another row. - * Only named columns in this table are included in the output. - * @see Transformable#concat - * @param {...TableRef} tables A list of tables to union. - * @return {this} A new unioned table. - * @example table.union(other) - * @example table.union(other1, other2) - * @example table.union([other1, other2]) - */ - union(...tables) { - return this.__union(this, tables.flat()); - } - - /** - * Intersect multiple tables, keeping only rows whose with identical - * values for all columns in all tables, and deduplicates the rows. - * This transformation is similar to a series of {@link Transformable#semijoin} - * calls, but additionally suppresses duplicate rows. - * @see Transformable#semijoin - * @param {...TableRef} tables A list of tables to intersect. - * @return {this} A new filtered table. 
- * @example table.intersect(other) - * @example table.intersect(other1, other2) - * @example table.intersect([other1, other2]) - */ - intersect(...tables) { - return this.__intersect(this, tables.flat()); - } - - /** - * Compute the set difference with multiple tables, keeping only rows in - * this table that whose values do not occur in the other tables. - * This transformation is similar to a series of {@link Transformable#antijoin} - * calls, but additionally suppresses duplicate rows. - * @see Transformable#antijoin - * @param {...TableRef} tables A list of tables to difference. - * @return {this} A new filtered table. - * @example table.except(other) - * @example table.except(other1, other2) - * @example table.except([other1, other2]) - */ - except(...tables) { - return this.__except(this, tables.flat()); - } -} - -// -- Parameter Types ------------------------------------------------------- - -/** - * Table expression parameters. - * @typedef {Object.} Params - */ - -/** - * A reference to a column by string name or integer index. - * @typedef {string|number} ColumnRef - */ - -/** - * A value that can be coerced to a string. - * @typedef {object} Stringable - * @property {() => string} toString String coercion method. - */ - -/** - * A table expression provided as a string or string-coercible value. - * @typedef {string|Stringable} TableExprString - */ - -/** - * A struct object with arbitraty named properties. - * @typedef {Object.} Struct - */ - -/** - * A function defined over a table row. - * @typedef {(d?: Struct, $?: Params) => any} TableExprFunc - */ - -/** - * A table expression defined over a single table. - * @typedef {TableExprFunc|TableExprString} TableExpr - */ - -/** - * A function defined over rows from two tables. - * @typedef {(a?: Struct, b?: Struct, $?: Params) => any} TableExprFunc2 - */ - -/** - * A table expression defined over two tables. 
- * @typedef {TableExprFunc2|TableExprString} TableExpr2 - */ - -/** - * An object that maps current column names to new column names. - * @typedef {{ [name: string]: string }} RenameMap - */ - -/** - * A selection helper function. - * @typedef {(table: any) => string[]} SelectHelper - */ - -/** - * One or more column selections, potentially with renaming. - * The input may consist of a column name string, column integer index, a - * rename map object with current column names as keys and new column names - * as values, or a select helper function that takes a table as input and - * returns a valid selection parameter. - * @typedef {ColumnRef|RenameMap|SelectHelper} SelectEntry - */ - -/** - * An ordered set of column selections, potentially with renaming. - * @typedef {SelectEntry|SelectEntry[]} Select - */ - -/** - * An object of column name / table expression pairs. - * @typedef {{ [name: string]: TableExpr }} ExprObject - */ - -/** - * An object of column name / two-table expression pairs. - * @typedef {{ [name: string]: TableExpr2 }} Expr2Object - */ - -/** - * An ordered set of one or more column values. - * @typedef {ColumnRef|SelectHelper|ExprObject} ListEntry - */ - -/** - * An ordered set of column values. - * Entries may be column name strings, column index numbers, value objects - * with output column names for keys and table expressions for values, - * or a selection helper function. - * @typedef {ListEntry|ListEntry[]} ExprList - */ - -/** - * A reference to a data table or transformable instance. - * @typedef {Transformable|string} TableRef - */ - -/** - * One or more orderby sort criteria. - * If a string, order by the column with that name. - * If a number, order by the column with that index. - * If a function, must be a valid table expression; aggregate functions - * are permitted, but window functions are not. 
- * If an object, object values must be valid values parameters - * with output column names for keys and table expressions - * for values. The output name keys will subsequently be ignored. - * @typedef {ColumnRef|TableExpr|ExprObject} OrderKey - */ - -/** - * An ordered set of orderby sort criteria, in precedence order. - * @typedef {OrderKey|OrderKey[]} OrderKeys - */ - -/** - * Column values to use as a join key. - * @typedef {ColumnRef|TableExprFunc} JoinKey - */ - -/** - * An ordered set of join keys. - * @typedef {JoinKey|[JoinKey[]]|[JoinKey[], JoinKey[]]} JoinKeys - */ - -/** - * A predicate specification for joining two tables. - * @typedef {JoinKeys|TableExprFunc2|null} JoinPredicate - */ - -/** - * An array of per-table join values to extract. - * @typedef {[ExprList]|[ExprList, ExprList]|[ExprList, ExprList, Expr2Object]} JoinList - */ - -/** - * A specification of join values to extract. - * @typedef {JoinList|Expr2Object} JoinValues - */ - -// -- Transform Options ----------------------------------------------------- - -/** - * Options for count transformations. - * @typedef {object} CountOptions - * @property {string} [as='count'] The name of the output count column. - */ - -/** - * Options for derive transformations. - * @typedef {object} DeriveOptions - * @property {boolean} [drop=false] A flag indicating if the original - * columns should be dropped, leaving only the derived columns. If true, - * the before and after options are ignored. - * @property {Select} [before] - * An anchor column that relocated columns should be placed before. - * The value can be any legal column selection. If multiple columns are - * selected, only the first column will be used as an anchor. - * It is an error to specify both before and after options. - * @property {Select} [after] - * An anchor column that relocated columns should be placed after. - * The value can be any legal column selection. 
If multiple columns are - * selected, only the last column will be used as an anchor. - * It is an error to specify both before and after options. - */ - -/** - * Options for relocate transformations. - * @typedef {object} RelocateOptions - * @property {Selection} [before] - * An anchor column that relocated columns should be placed before. - * The value can be any legal column selection. If multiple columns are - * selected, only the first column will be used as an anchor. - * It is an error to specify both before and after options. - * @property {Selection} [after] - * An anchor column that relocated columns should be placed after. - * The value can be any legal column selection. If multiple columns are - * selected, only the last column will be used as an anchor. - * It is an error to specify both before and after options. - */ - -/** - * Options for sample transformations. - * @typedef {object} SampleOptions - * @property {boolean} [replace=false] Flag for sampling with replacement. - * @property {boolean} [shuffle=true] Flag to ensure randomly ordered rows. - * @property {string|TableExprFunc} [weight] Column values to use as weights - * for sampling. Rows will be sampled with probability proportional to - * their relative weight. The input should be a column name string or - * a table expression compatible with {@link Transformable#derive}. - */ - -/** - * Options for impute transformations. - * @typedef {object} ImputeOptions - * @property {ExprList} [expand] Column values to combine to impute missing - * rows. For column names and indices, all unique column values are - * considered. Otherwise, each entry should be an object of name-expresion - * pairs, with valid table expressions for {@link Transformable#rollup}. - * All combinations of values are checked for each set of unique groupby - * values. - */ - -/** - * Options for fold transformations. 
- * @typedef {object} FoldOptions - * @property {string[]} [as=['key', 'value']] An array indicating the - * output column names to use for the key and value columns, respectively. - */ - -/** - * Options for pivot transformations. - * @typedef {object} PivotOptions - * @property {number} [limit=Infinity] The maximum number of new columns to generate. - * @property {string} [keySeparator='_'] A string to place between multiple key names. - * @property {string} [valueSeparator='_'] A string to place between key and value names. - * @property {boolean} [sort=true] Flag for alphabetical sorting of new column names. - */ - -/** - * Options for spread transformations. - * @typedef {object} SpreadOptions - * @property {boolean} [drop=true] Flag indicating if input columns to the - * spread operation should be dropped in the output table. - * @property {number} [limit=Infinity] The maximum number of new columns to - * generate. - * @property {string[]} [as] Output column names to use. This option only - * applies when a single column is spread. If the given array of names is - * shorter than the number of generated columns and no limit option is - * specified, the additional generated columns will be dropped. - */ - -/** - * Options for unroll transformations. - * @typedef {object} UnrollOptions - * @property {number} [limit=Infinity] The maximum number of new rows - * to generate per array value. - * @property {boolean|string} [index=false] Flag or column name for adding - * zero-based array index values as an output column. If true, a new column - * named "index" will be included. If string-valued, a new column with - * the given name will be added. - * @property {Select} [drop] Columns to drop from the output. 
The input may - * consist of column name strings, column integer indices, objects with - * column names as keys, or functions that take a table as input and - * return a valid selection parameter (typically the output of selection - * helper functions such as {@link all}, {@link not}, or {@link range}). - */ - -/** - * Options for join transformations. - * @typedef {object} JoinOptions - * @property {boolean} [left=false] Flag indicating a left outer join. - * If both the *left* and *right* are true, indicates a full outer join. - * @property {boolean} [right=false] Flag indicating a right outer join. - * If both the *left* and *right* are true, indicates a full outer join. - * @property {string[]} [suffix=['_1', '_2']] Column name suffixes to - * append if two columns with the same name are produced by the join. - */ diff --git a/src/table/types.ts b/src/table/types.ts new file mode 100644 index 00000000..17dfba77 --- /dev/null +++ b/src/table/types.ts @@ -0,0 +1,407 @@ +import { Table } from './Table.js'; +import { BitSet } from './BitSet.js'; + +/** A table column value. */ +export type DataValue = any; + +/** Interface for table columns. */ +export interface ColumnType<T> { + /** The number of rows in the column. */ + length: number; + /** Retrieve the values at the given row index. */ + at(row: number): T; + /** Return a column value iterator. */ + [Symbol.iterator]() : Iterator<T>; +} + +/** A named collection of columns. */ +export type ColumnData = Record<string, ColumnType<DataValue>>; + +/** Table expression parameters. */ +export type Params = Record<string, any>; + +/** A typed array constructor. */ +export type TypedArrayConstructor = + | Uint8ArrayConstructor + | Uint16ArrayConstructor + | Uint32ArrayConstructor + | BigUint64ArrayConstructor + | Int8ArrayConstructor + | Int16ArrayConstructor + | Int32ArrayConstructor + | BigInt64ArrayConstructor + | Float32ArrayConstructor + | Float64ArrayConstructor; + +/** A typed array instance.
*/ +export type TypedArray = + | Uint8Array + | Uint16Array + | Uint32Array + | BigUint64Array + | Int8Array + | Int16Array + | Int32Array + | BigInt64Array + | Float32Array + | Float64Array; + +/** Table row object. */ +export type RowObject = Record<string, DataValue>; + +/** A table groupby specification. */ +export interface GroupBySpec { + /** The number of groups. */ + size: number; + /** Column names for each group. */ + names: string[]; + /** Value accessor functions for each group. */ + get: RowExpression[]; + /** Indices of an example table row for each group. */ + rows: number[] | Uint32Array; + /** Per-row group indices, length is total rows of table. */ + keys: number[] | Uint32Array; +} + +/** An expression evaluated over a table row. */ +export type RowExpression = ( + /** The table row. */ + row: number, + /** The backing table data store. */ + data: ColumnData +) => DataValue; + +/** Column value accessor. */ +export type ColumnGetter = ( + /** The table row. */ + row: number +) => DataValue; + +/** + * Comparator function for sorting table rows. Returns a negative value + * if rowA < rowB, positive if rowA > rowB, otherwise zero. + */ +export type RowComparator = ( + /** The table row index for the first row. */ + rowA: number, + /** The table row index for the second row. */ + rowB: number, + /** The backing table data store. */ + data: ColumnData +) => number; + +/** Options for derived table creation. */ +export interface CreateOptions { + /** The backing column data. */ + data?: ColumnData; + /** An ordered list of column names. */ + names?: readonly string[]; + /** An additional filter BitSet to apply. */ + filter?: BitSet; + /** The groupby specification to use, or null for no groups. */ + groups?: GroupBySpec; + /** The orderby comparator function to use, or null for no order. */ + order?: RowComparator +} + +/** Options for generating row objects. */ +export interface PrintOptions { + /** The maximum number of objects to create, default `Infinity`.
*/ + limit?: number; + /** The row offset indicating how many initial rows to skip, default `0`. */ + offset?: number; + /** + * An ordered set of columns to include. The input may consist of column + * name strings, column integer indices, objects with current column names + * as keys and new column names as values (for renaming), or selection + * helper functions such as *all*, *not*, or *range*. + */ + columns?: Select; +} + +/** Options for generating row objects. */ +export interface ObjectsOptions { + /** The maximum number of objects to create, default `Infinity`. */ + limit?: number; + /** The row offset indicating how many initial rows to skip, default `0`. */ + offset?: number; + /** + * An ordered set of columns to include. The input may consist of column + * name strings, column integer indices, objects with current column names + * as keys and new column names as values (for renaming), or selection + * helper functions such as *all*, *not*, or *range*. + */ + columns?: Select; + /** + * The export format for groups of rows. The default (false) is to ignore + * groups, returning a flat array of objects. The valid values are 'map' or + * true (for Map instances), 'object' (for standard objects), or 'entries' + * (for arrays in the style of Object.entries). For the 'object' format, + * groupby keys are coerced to strings to use as object property names; note + * that this can lead to undesirable behavior if the groupby keys are object + * values. The 'map' and 'entries' options preserve the groupby key values. + */ + grouped?: 'map' | 'entries' | 'object' | boolean; +} + +/** A reference to a column by string name or integer index. */ +export type ColumnRef = string | number; + +/** A value that can be coerced to a string. */ +export interface Stringable { + /** String coercion method. */ + toString(): string; +} + +/** A table expression provided as a string or string-coercible value. 
*/ +export type TableExprString = string | Stringable; + +/** A struct object with arbitrary named properties. */ +export type Struct = Record<string, any>; + +/** A function defined over a table row. */ +export type TableExprFunc = (d?: Struct, $?: Params) => any; + +/** A table expression defined over a single table. */ +export type TableExpr = TableExprFunc | TableExprString; + +/** A function defined over rows from two tables. */ +export type TableExprFunc2 = (a?: Struct, b?: Struct, $?: Params) => any; + +/** A table expression defined over two tables. */ +export type TableExpr2 = TableExprFunc2 | TableExprString; + +/** An object that maps current column names to new column names. */ +export type RenameMap = Record<string, string>; + +/** A selection helper function. */ +export type SelectHelper = (table: Table) => string[]; + +/** + * One or more column selections, potentially with renaming. + * The input may consist of a column name string, column integer index, a + * rename map object with current column names as keys and new column names + * as values, or a select helper function that takes a table as input and + * returns a valid selection parameter. + */ +export type SelectEntry = ColumnRef | RenameMap | SelectHelper; + +/** An ordered set of column selections, potentially with renaming. */ +export type Select = SelectEntry | SelectEntry[]; + +/** An object of column name / table expression pairs. */ +export type ExprObject = Record<string, TableExpr>; + +/** An object of column name / two-table expression pairs. */ +export type Expr2Object = Record<string, TableExpr2>; + +/** An ordered set of one or more column values. */ +export type ListEntry = ColumnRef | SelectHelper | ExprObject; + +/** + * An ordered set of column values. + * Entries may be column name strings, column index numbers, value objects + * with output column names for keys and table expressions for values, + * or a selection helper function. + */ +export type ExprList = ListEntry | ListEntry[]; + +/** A reference to a data table instance.
*/ +export type TableRef = Table | string; + +/** A list of one or more table references. */ +export type TableRefList = TableRef | TableRef[]; + +/** + * One or more orderby sort criteria. + * If a string, order by the column with that name. + * If a number, order by the column with that index. + * If a function, must be a valid table expression; aggregate functions + * are permitted, but window functions are not. + * If an object, object values must be valid values parameters + * with output column names for keys and table expressions + * for values. The output name keys will subsequently be ignored. + */ +export type OrderKey = ColumnRef | TableExpr | ExprObject; + +/** An ordered set of orderby sort criteria, in precedence order. */ +export type OrderKeys = OrderKey | OrderKey[]; + +/** Column values to use as a join key. */ +export type JoinKey = ColumnRef | TableExprFunc; + +/** An ordered set of join keys. */ +export type JoinKeys = + | JoinKey + | [JoinKey[]] + | [JoinKey, JoinKey] + | [JoinKey[], JoinKey[]]; + +/** A predicate specification for joining two tables. */ +export type JoinPredicate = JoinKeys | TableExprFunc2 | null; + +/** An array of per-table join values to extract. */ +export type JoinList = + | [ExprList] + | [ExprList, ExprList] + | [ExprList, ExprList, Expr2Object]; + +/** A specification of join values to extract. */ +export type JoinValues = JoinList | Expr2Object; + +// -- Transform Options ----------------------------------------------------- + +/** Options for count transformations. */ +export interface CountOptions { + /** The name of the output count column, default `count`. */ + as?: string; +} + +/** Options for derive transformations. */ +export interface DeriveOptions { + /** + * A flag (default `false`) indicating if the original columns should be + * dropped, leaving only the derived columns. If true, the before and after + * options are ignored. 
+ */ + drop?: boolean; + /** + * An anchor column that relocated columns should be placed before. + * The value can be any legal column selection. If multiple columns are + * selected, only the first column will be used as an anchor. + * It is an error to specify both before and after options. + */ + before?: Select; + /** + * An anchor column that relocated columns should be placed after. + * The value can be any legal column selection. If multiple columns are + * selected, only the last column will be used as an anchor. + * It is an error to specify both before and after options. + */ + after?: Select; +} + +/** Options for relocate transformations. */ +export interface RelocateOptions { + /** + * An anchor column that relocated columns should be placed before. + * The value can be any legal column selection. If multiple columns are + * selected, only the first column will be used as an anchor. + * It is an error to specify both before and after options. + */ + before?: Select; + /** + * An anchor column that relocated columns should be placed after. + * The value can be any legal column selection. If multiple columns are + * selected, only the last column will be used as an anchor. + * It is an error to specify both before and after options. + */ + after?: Select; +} + +/** Options for sample transformations. */ +export interface SampleOptions { + /** Flag for sampling with replacement (default `false`). */ + replace?: boolean; + /** Flag to ensure randomly ordered rows (default `true`). */ + shuffle?: boolean; + /** + * Column values to use as weights for sampling. Rows will be sampled with + * probability proportional to their relative weight. The input should be a + * column name string or a table expression compatible with *derive*. + */ + weight?: string | TableExprFunc; +} + +/** Options for impute transformations. */ +export interface ImputeOptions { + /** + * Column values to combine to impute missing rows. 
For column names and + * indices, all unique column values are considered. Otherwise, each entry + * should be an object of name-expression pairs, with valid table expressions + * for *rollup*. All combinations of values are checked for each set of + * unique groupby values. + */ + expand?: ExprList; +} + +/** Options for fold transformations. */ +export interface FoldOptions { + /** + * An array indicating the output column names to use for the key and value + * columns, respectively. The default is `['key', 'value']`. + */ + as?: string[]; +} + +/** Options for pivot transformations. */ +export interface PivotOptions { + /** The maximum number of new columns to generate (default `Infinity`). */ + limit?: number; + /** A string to place between multiple key names (default `_`). */ + keySeparator?: string; + /** A string to place between key and value names (default `_`). */ + valueSeparator?: string; + /** Flag for alphabetical sorting of new column names (default `true`). */ + sort?: boolean; +} + +/** Options for spread transformations. */ +export interface SpreadOptions { + /** + * Flag (default `true`) indicating if input columns to the + * spread operation should be dropped in the output table. + */ + drop?: boolean; + /** The maximum number of new columns to generate (default `Infinity`). */ + limit?: number; + /** + * Output column names to use. This option only applies when a single + * column is spread. If the given array of names is shorter than the + * number of generated columns and no limit option is specified, the + * additional generated columns will be dropped. + */ + as?: string[]; +} + +/** Options for unroll transformations. */ +export interface UnrollOptions { + /** + * The maximum number of new rows to generate per array value + * (default `Infinity`). + */ + limit?: number; + /** + * Flag or column name to add zero-based array index values as an output + * column (default `false`). If true, a column named "index" will be + * included.
If string-valued, a column with the given name will be added. + */ + index?: boolean | string; + /** + * Columns to drop from the output. The input may consist of column name + * strings, column integer indices, objects with column names as keys, or + * functions that take a table as input and return a valid selection + * parameter (typically the output of selection helper functions such as + * *all*, *not*, or *range*). + */ + drop?: Select; +} + +/** Options for join transformations. */ +export interface JoinOptions { + /** + * Flag indicating a left outer join (default `false`). If both the + * *left* and *right* flags are true, indicates a full outer join. + */ + left?: boolean; + /** + * Flag indicating a right outer join (default `false`). If both the + * *left* and *right* flags are true, indicates a full outer join. + */ + right?: boolean; + /** + * Column name suffixes to append if two columns with the same name are + * produced by the join. The default is `['_1', '_2']`. + */ + suffix?: string[]; +} diff --git a/src/util/array-type.js b/src/util/array-type.js index e9e0b430..e7dadf13 100644 --- a/src/util/array-type.js +++ b/src/util/array-type.js @@ -1,5 +1,10 @@ import isTypedArray from './is-typed-array.js'; +/** + * @param {*} column + * @returns {ArrayConstructor | import('../table/types.js').TypedArrayConstructor} + */ export default function(column) { - return isTypedArray(column.data) ? column.data.constructor : Array; + // @ts-ignore + return isTypedArray(column) ? column.constructor : Array; } diff --git a/src/util/auto-type.js b/src/util/auto-type.js index ad668fac..d398ddbb 100644 --- a/src/util/auto-type.js +++ b/src/util/auto-type.js @@ -9,6 +9,7 @@ export default function(input) { : value === 'false' ? false : value === 'NaN' ? NaN : !isNaN(parsed = +value) ? parsed + // @ts-ignore : (parsed = parseIsoDate(value, d => new Date(d))) !== value ?
parsed : input; } diff --git a/src/util/concat.js b/src/util/concat.js index 5d411cfb..447b1c21 100644 --- a/src/util/concat.js +++ b/src/util/concat.js @@ -1,4 +1,5 @@ -export default function(list, fn = (x => x), delim = '') { +// eslint-disable-next-line no-unused-vars +export default function(list, fn = ((x, i) => x), delim = '') { const n = list.length; if (!n) return ''; diff --git a/src/util/error.js b/src/util/error.js index 50d8b193..80e4372c 100644 --- a/src/util/error.js +++ b/src/util/error.js @@ -1,3 +1,4 @@ -export default function(message) { - throw Error(message); +export default function(message, cause) { + // @ts-ignore + throw Error(message, { cause }); } diff --git a/src/util/has.js b/src/util/has.js index 21381a4b..12ee46fa 100644 --- a/src/util/has.js +++ b/src/util/has.js @@ -1,5 +1 @@ -const { hasOwnProperty } = Object.prototype; - -export default function(object, property) { - return hasOwnProperty.call(object, property); -} +export default Object.hasOwn; diff --git a/src/util/is-array-type.js b/src/util/is-array-type.js index a8410572..5fa76dda 100644 --- a/src/util/is-array-type.js +++ b/src/util/is-array-type.js @@ -1,6 +1,10 @@ import isArray from './is-array.js'; import isTypedArray from './is-typed-array.js'; +/** + * @param {*} value + * @return {value is (any[] | import('../table/types.js').TypedArray)} + */ export default function isArrayType(value) { return isArray(value) || isTypedArray(value); } diff --git a/src/util/is-map-or-set.js b/src/util/is-map-or-set.js index 49b42e10..b29edb82 100644 --- a/src/util/is-map-or-set.js +++ b/src/util/is-map-or-set.js @@ -1,6 +1,10 @@ import isMap from './is-map.js'; import isSet from './is-set.js'; +/** + * @param {*} value + * @return {value is Map | Set} + */ export default function(value) { return isMap(value) || isSet(value); } diff --git a/src/util/is-map.js b/src/util/is-map.js index 7948024c..c7be6098 100644 --- a/src/util/is-map.js +++ b/src/util/is-map.js @@ -1,3 +1,7 @@ +/** + * 
@param {*} value + * @return {value is Map} + */ export default function(value) { return value instanceof Map; } diff --git a/src/util/is-set.js b/src/util/is-set.js index 8944f024..3b26a434 100644 --- a/src/util/is-set.js +++ b/src/util/is-set.js @@ -1,3 +1,7 @@ +/** + * @param {*} value + * @return {value is Set} + */ export default function(value) { return value instanceof Set; } diff --git a/src/util/is-string.js b/src/util/is-string.js index 653c8a56..7944070e 100644 --- a/src/util/is-string.js +++ b/src/util/is-string.js @@ -1,3 +1,7 @@ +/** + * @param {*} value + * @return {value is String} + */ export default function(value) { return typeof value === 'string'; } diff --git a/src/util/is-typed-array.js b/src/util/is-typed-array.js index 0228a8ef..065ba907 100644 --- a/src/util/is-typed-array.js +++ b/src/util/is-typed-array.js @@ -1,5 +1,9 @@ const TypedArray = Object.getPrototypeOf(Int8Array); +/** + * @param {*} value + * @return {value is import("../table/types.js").TypedArray} + */ export default function(value) { return value instanceof TypedArray; } diff --git a/src/util/parse-dsv.js b/src/util/parse-dsv.js index 13734fc9..9d5d1961 100644 --- a/src/util/parse-dsv.js +++ b/src/util/parse-dsv.js @@ -31,7 +31,7 @@ const RETURN = 13; // SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. export default function( text, - { delimiter = ',', comment, skip } = {} + { delimiter = ',', comment = undefined, skip = 0 } = {} ) { if (delimiter.length !== 1) { error(`Text delimiter should be a single character: "${delimiter}"`); diff --git a/src/util/quantile.js b/src/util/quantile.js index fa475e95..887df136 100644 --- a/src/util/quantile.js +++ b/src/util/quantile.js @@ -14,5 +14,6 @@ export default function quantile(values, p) { const v0 = toNumeric(values[i0]); return isBigInt(v0) ? 
v0 + // @ts-ignore : v0 + (toNumeric(values[i0 + 1]) - v0) * (i - i0); } diff --git a/src/util/value-list.js b/src/util/value-list.js index 4a27153f..f0401bc3 100644 --- a/src/util/value-list.js +++ b/src/util/value-list.js @@ -1,6 +1,6 @@ import ascending from './ascending.js'; -import min from './min.js'; import max from './max.js'; +import min from './min.js'; import quantile from './quantile.js'; export default class ValueList { diff --git a/src/verbs/assign.js b/src/verbs/assign.js new file mode 100644 index 00000000..56d14ab5 --- /dev/null +++ b/src/verbs/assign.js @@ -0,0 +1,17 @@ +import { columnSet } from '../table/ColumnSet.js'; +import { Table } from '../table/Table.js'; +import error from '../util/error.js'; + +export function assign(table, ...others) { + others = others.flat(); + const nrows = table.numRows(); + const base = table.reify(); + const cols = columnSet(base).groupby(base.groups()); + others.forEach(input => { + input = input instanceof Table ? input : new Table(input); + if (input.numRows() !== nrows) error('Assign row counts do not match'); + input = input.reify(); + input.columnNames(name => cols.add(name, input.column(name))); + }); + return cols.new(table); +} diff --git a/src/engine/concat.js b/src/verbs/concat.js similarity index 60% rename from src/engine/concat.js rename to src/verbs/concat.js index 9b79b94d..c5f7942b 100644 --- a/src/engine/concat.js +++ b/src/verbs/concat.js @@ -1,7 +1,8 @@ -import columnSet from '../table/column-set.js'; +import { columnSet } from '../table/ColumnSet.js'; import NULL from '../util/null.js'; -export default function(table, others) { +export function concat(table, ...others) { + others = others.flat(); const trows = table.numRows(); const nrows = trows + others.reduce((n, t) => n + t.numRows(), 0); if (trows === nrows) return table; @@ -13,11 +14,11 @@ export default function(table, others) { const arr = Array(nrows); let row = 0; tables.forEach(table => { - const col = table.column(name) || { get: 
() => NULL }; - table.scan(trow => arr[row++] = col.get(trow)); + const col = table.column(name) || { at: () => NULL }; + table.scan(trow => arr[row++] = col.at(trow)); }); cols.add(name, arr); }); - return table.create(cols.new()); + return cols.new(table); } diff --git a/src/verbs/dedupe.js b/src/verbs/dedupe.js index f2ac0afc..c68c916a 100644 --- a/src/verbs/dedupe.js +++ b/src/verbs/dedupe.js @@ -1,7 +1,8 @@ -export default function(table, keys = []) { - return table - .groupby(keys.length ? keys : table.columnNames()) - .filter('row_number() === 1') - .ungroup() - .reify(); +import { groupby } from './groupby.js'; +import { filter } from './filter.js'; + +export function dedupe(table, ...keys) { + keys = keys.flat(); + const gt = groupby(table, keys.length ? keys : table.columnNames()); + return filter(gt, 'row_number() === 1').ungroup().reify(); } diff --git a/src/verbs/derive.js b/src/verbs/derive.js index 5593e2ef..1fbc71cc 100644 --- a/src/verbs/derive.js +++ b/src/verbs/derive.js @@ -1,14 +1,92 @@ -import relocate from './relocate.js'; -import _derive from '../engine/derive.js'; +import { relocate } from './relocate.js'; +import { aggregate } from './reduce/util.js'; +import { window } from './window/window.js'; import parse from '../expression/parse.js'; +import { hasWindow } from '../op/index.js'; +import { columnSet } from '../table/ColumnSet.js'; +import repeat from '../util/repeat.js'; -export default function(table, values, options = {}) { +function isWindowed(op) { + return hasWindow(op.name) || + op.frame && ( + Number.isFinite(op.frame[0]) || + Number.isFinite(op.frame[1]) + ); +} + +export function derive(table, values, options = {}) { const dt = _derive(table, parse(values, { table }), options); return options.drop || (options.before == null && options.after == null) ? 
dt - : relocate(dt, + : relocate( + dt, Object.keys(values).filter(name => !table.column(name)), options ); } + +export function _derive(table, { names, exprs, ops = [] }, options = {}) { + // instantiate output data + const total = table.totalRows(); + const cols = columnSet(options.drop ? null : table); + const data = names.map(name => cols.add(name, Array(total))); + + // analyze operations, compute non-windowed aggregates + const [ aggOps, winOps ] = segmentOps(ops); + + const size = table.isGrouped() ? table.groups().size : 1; + const result = aggregate( + table, aggOps, + repeat(ops.length, () => Array(size)) + ); + + // perform table scans to generate output values + winOps.length + ? window(table, data, exprs, result, winOps) + : output(table, data, exprs, result); + + return cols.derive(table); +} + +function segmentOps(ops) { + const aggOps = []; + const winOps = []; + const n = ops.length; + + for (let i = 0; i < n; ++i) { + const op = ops[i]; + op.id = i; + (isWindowed(op) ? winOps : aggOps).push(op); + } + + return [aggOps, winOps]; +} + +function output(table, cols, exprs, result) { + const bits = table.mask(); + const data = table.data(); + const { keys } = table.groups() || {}; + const op = keys + ? 
(id, row) => result[id][keys[row]] + : id => result[id][0]; + + const m = cols.length; + for (let j = 0; j < m; ++j) { + const get = exprs[j]; + const col = cols[j]; + + // inline the following for performance: + // table.scan((i, data) => col[i] = get(i, data, op)); + if (bits) { + for (let i = bits.next(0); i >= 0; i = bits.next(i + 1)) { + col[i] = get(i, data, op); + } + } else { + const n = table.totalRows(); + for (let i = 0; i < n; ++i) { + col[i] = get(i, data, op); + } + } + } +} diff --git a/src/verbs/except.js b/src/verbs/except.js index cb4a1ed2..972ea26e 100644 --- a/src/verbs/except.js +++ b/src/verbs/except.js @@ -1,5 +1,9 @@ -export default function(table, others) { +import { dedupe } from './dedupe.js'; +import { antijoin } from './join-filter.js'; + +export function except(table, ...others) { + others = others.flat(); if (others.length === 0) return table; const names = table.columnNames(); - return others.reduce((a, b) => a.antijoin(b.select(names)), table).dedupe(); + return dedupe(others.reduce((a, b) => antijoin(a, b.select(names)), table)); } diff --git a/src/verbs/filter.js b/src/verbs/filter.js index ec07fefe..fe4ca1b8 100644 --- a/src/verbs/filter.js +++ b/src/verbs/filter.js @@ -1,13 +1,34 @@ -import _derive from '../engine/derive.js'; -import _filter from '../engine/filter.js'; +import { _derive } from './derive.js'; import parse from '../expression/parse.js'; +import { BitSet } from '../table/BitSet.js'; -export default function(table, criteria) { +export function filter(table, criteria) { const test = parse({ p: criteria }, { table }); let predicate = test.exprs[0]; if (test.ops.length) { - const { data } = _derive(table, test, { drop: true }).column('p'); - predicate = row => data[row]; + const data = _derive(table, test, { drop: true }).column('p'); + predicate = row => data.at(row); } return _filter(table, predicate); } + +export function _filter(table, predicate) { + const n = table.totalRows(); + const bits = table.mask(); + const 
data = table.data(); + const filter = new BitSet(n); + + // inline the following for performance: + // table.scan((row, data) => { if (predicate(row, data)) filter.set(row); }); + if (bits) { + for (let i = bits.next(0); i >= 0; i = bits.next(i + 1)) { + if (predicate(i, data)) filter.set(i); + } + } else { + for (let i = 0; i < n; ++i) { + if (predicate(i, data)) filter.set(i); + } + } + + return table.create({ filter }); +} diff --git a/src/verbs/fold.js b/src/verbs/fold.js index 5b9c286a..ac127041 100644 --- a/src/verbs/fold.js +++ b/src/verbs/fold.js @@ -1,6 +1,23 @@ -import _fold from '../engine/fold.js'; +import { aggregateGet } from './reduce/util.js'; +import { _unroll } from './unroll.js'; import parse from './util/parse.js'; -export default function(table, values, options) { +export function fold(table, values, options) { return _fold(table, parse('fold', table, values), options); } + +export function _fold(table, { names = [], exprs = [], ops = [] }, options = {}) { + if (names.length === 0) return table; + + const [k = 'key', v = 'value'] = options.as || []; + const vals = aggregateGet(table, ops, exprs); + + return _unroll( + table, + { + names: [k, v], + exprs: [() => names, (row, data) => vals.map(fn => fn(row, data))] + }, + { ...options, drop: names } + ); +} diff --git a/src/verbs/groupby.js b/src/verbs/groupby.js index 954c0041..dbf5c92f 100644 --- a/src/verbs/groupby.js +++ b/src/verbs/groupby.js @@ -1,6 +1,54 @@ -import _groupby from '../engine/groupby.js'; +import { aggregateGet } from './reduce/util.js'; import parse from './util/parse.js'; +import keyFunction from '../util/key-function.js'; -export default function(table, values) { - return _groupby(table, parse('groupby', table, values)); +export function groupby(table, ...values) { + return _groupby(table, parse('groupby', table, values.flat())); +} + +export function _groupby(table, exprs) { + return table.create({ + groups: createGroups(table, exprs) + }); +} + +function 
createGroups(table, { names = [], exprs = [], ops = [] }) { + const n = names.length; + if (n === 0) return null; + + // check for optimized path when grouping by a single field + // use pre-calculated groups if available + if (n === 1 && !table.isFiltered() && exprs[0].field) { + const col = table.column(exprs[0].field); + if (col.groups) return col.groups(names); + } + + let get = aggregateGet(table, ops, exprs); + const getKey = keyFunction(get); + const nrows = table.totalRows(); + const keys = new Uint32Array(nrows); + const index = {}; + const rows = []; + + // inline table scan for performance + const data = table.data(); + const bits = table.mask(); + if (bits) { + for (let i = bits.next(0); i >= 0; i = bits.next(i + 1)) { + const key = getKey(i, data) + ''; + keys[i] = (index[key] ??= rows.push(i) - 1); + } + } else { + for (let i = 0; i < nrows; ++i) { + const key = getKey(i, data) + ''; + keys[i] = (index[key] ??= rows.push(i) - 1); + } + } + + if (!ops.length) { + // capture data in closure, so no interaction with select + get = get.map(f => row => f(row, data)); + } + + return { keys, get, names, rows, size: rows.length }; } diff --git a/src/verbs/helpers/agg.js b/src/verbs/helpers/agg.js index aa20fffa..0cff02f5 100644 --- a/src/verbs/helpers/agg.js +++ b/src/verbs/helpers/agg.js @@ -1,14 +1,17 @@ +import { rollup } from '../rollup.js'; +import { ungroup } from '../ungroup.js'; + /** * Convenience function for computing a single aggregate value for * a table. Equivalent to ungrouping a table, applying a rollup verb * for a single aggregate, and extracting the resulting value. - * @param {import('../../table/table.js').Table} table A table instance. - * @param {import('../../table/transformable.js').TableExpr} expr An + * @param {import('../../table/Table.js').Table} table A table instance. + * @param {import('../../table/types.js').TableExpr} expr An * aggregate table expression to evaluate. 
- * @return {import('../../table/table.js').DataValue} The aggregate value. + * @return {import('../../table/types.js').DataValue} The aggregate value. * @example agg(table, op.max('colA')) * @example agg(table, d => [op.min('colA'), op.max('colA')]) */ export default function agg(table, expr) { - return table.ungroup().rollup({ _: expr }).get('_'); + return rollup(ungroup(table), { _: expr }).get('_'); } diff --git a/src/verbs/impute.js b/src/verbs/impute.js index 1c7254cc..66fb7f3f 100644 --- a/src/verbs/impute.js +++ b/src/verbs/impute.js @@ -1,12 +1,17 @@ -import _impute from '../engine/impute.js'; -import _rollup from '../engine/rollup.js'; -import parse from '../expression/parse.js'; +import { aggregateGet } from './reduce/util.js'; +import { _rollup } from './rollup.js'; +import { ungroup } from './ungroup.js'; import parseValues from './util/parse.js'; +import parse from '../expression/parse.js'; import { array_agg_distinct } from '../op/op-api.js'; +import { columnSet } from '../table/ColumnSet.js'; import error from '../util/error.js'; +import isValid from '../util/is-valid.js'; +import keyFunction from '../util/key-function.js'; import toString from '../util/to-string.js'; +import unroll from '../util/unroll.js'; -export default function(table, values, options = {}) { +export function impute(table, values, options = {}) { values = parse(values, { table }); values.names.forEach(name => @@ -14,9 +19,9 @@ export default function(table, values, options = {}) { ); if (options.expand) { - const opt = { preparse, aggronly: true }; + const opt = { preparse, window: false, aggronly: true }; const params = parseValues('impute', table, options.expand, opt); - const result = _rollup(table.ungroup(), params); + const result = _rollup(ungroup(table), params); return _impute( table, values, params.names, params.names.map(name => result.get(name, 0)) @@ -32,3 +37,126 @@ function preparse(map) { value.field ? 
map.set(key, array_agg_distinct(value + '')) : 0 ); } + +export function _impute(table, values, keys, arrays) { + const write = keys && keys.length; + table = write ? expand(table, keys, arrays) : table; + const { names, exprs, ops } = values; + const gets = aggregateGet(table, ops, exprs); + const cols = write ? null : columnSet(table); + const rows = table.totalRows(); + + names.forEach((name, i) => { + const col = table.column(name); + const out = write ? col : cols.add(name, Array(rows)); + const get = gets[i]; + + table.scan(idx => { + const v = col.at(idx); + out[idx] = !isValid(v) ? get(idx) : v; + }); + }); + + return write ? table : table.create(cols); +} + +function expand(table, keys, values) { + const groups = table.groups(); + const data = table.data(); + + // expansion keys and accessors + const keyNames = (groups ? groups.names : []).concat(keys); + const keyGet = (groups ? groups.get : []) + .concat(keys.map(key => table.getter(key))); + + // build hash of existing rows + const hash = new Set(); + const keyTable = keyFunction(keyGet); + table.scan((idx, data) => hash.add(keyTable(idx, data))); + + // initialize output table data + const names = table.columnNames(); + const cols = columnSet(); + const out = names.map(name => cols.add(name, [])); + names.forEach((name, i) => { + const old = data[name]; + const col = out[i]; + table.scan(row => col.push(old.at(row))); + }); + + // enumerate expanded value sets and augment output table + const keyEnum = keyFunction(keyGet.map((k, i) => a => a[i])); + const set = unroll( + 'v', + '{' + out.map((_, i) => `_${i}.push(v[$${i}]);`).join('') + '}', + out, names.map(name => keyNames.indexOf(name)) + ); + + if (groups) { + let row = groups.keys.length; + const prod = values.reduce((p, a) => p * a.length, groups.size); + const keys = new Uint32Array(prod + (row - hash.size)); + keys.set(groups.keys); + enumerate(groups, values, (vec, idx) => { + if (!hash.has(keyEnum(vec))) { + set(vec); + keys[row++] = idx[0]; 
+ } + }); + cols.groupby({ ...groups, keys }); + } else { + enumerate(groups, values, vec => { + if (!hash.has(keyEnum(vec))) set(vec); + }); + } + + return cols.new(table); +} + +function enumerate(groups, values, callback) { + const offset = groups ? groups.get.length : 0; + const pad = groups ? 1 : 0; + const len = pad + values.length; + const lens = new Int32Array(len); + const idxs = new Int32Array(len); + const set = []; + + if (groups) { + const { get, rows, size } = groups; + lens[0] = size; + set.push((vec, idx) => { + const row = rows[idx]; + for (let i = 0; i < offset; ++i) { + vec[i] = get[i](row); + } + }); + } + + values.forEach((a, i) => { + const j = i + offset; + lens[i + pad] = a.length; + set.push((vec, idx) => vec[j] = a[idx]); + }); + + const vec = Array(offset + values.length); + + // initialize value vector + for (let i = 0; i < len; ++i) { + set[i](vec, 0); + } + callback(vec, idxs); + + // enumerate all combinations of values + for (let i = len - 1; i >= 0;) { + const idx = ++idxs[i]; + if (idx < lens[i]) { + set[i](vec, idx); + callback(vec, idxs); + i = len - 1; + } else { + idxs[i] = 0; + set[i](vec, 0); + --i; + } + } +} diff --git a/src/verbs/index.js b/src/verbs/index.js index 58babacf..5fbfd517 100644 --- a/src/verbs/index.js +++ b/src/verbs/index.js @@ -1,64 +1,27 @@ -import __dedupe from './dedupe.js'; -import __derive from './derive.js'; -import __except from './except.js'; -import __filter from './filter.js'; -import __fold from './fold.js'; -import __impute from './impute.js'; -import __intersect from './intersect.js'; -import __join from './join.js'; -import __semijoin from './join-filter.js'; -import __lookup from './lookup.js'; -import __pivot from './pivot.js'; -import __relocate from './relocate.js'; -import __rename from './rename.js'; -import __rollup from './rollup.js'; -import __sample from './sample.js'; -import __select from './select.js'; -import __spread from './spread.js'; -import __union from './union.js'; -import 
__unroll from './unroll.js'; -import __groupby from './groupby.js'; -import __orderby from './orderby.js'; - -import __concat from '../engine/concat.js'; -import __reduce from '../engine/reduce.js'; -import __ungroup from '../engine/ungroup.js'; -import __unorder from '../engine/unorder.js'; - -import { count } from '../op/op-api.js'; - -export default { - __antijoin: (table, other, on) => - __semijoin(table, other, on, { anti: true }), - __count: (table, options = {}) => - __rollup(table, { [options.as || 'count']: count() }), - __cross: (table, other, values, options) => - __join(table, other, () => true, values, { - ...options, left: true, right: true - }), - __concat, - __dedupe, - __derive, - __except, - __filter, - __fold, - __impute, - __intersect, - __join, - __lookup, - __pivot, - __relocate, - __rename, - __rollup, - __sample, - __select, - __semijoin, - __spread, - __union, - __unroll, - __groupby, - __orderby, - __ungroup, - __unorder, - __reduce -}; +export { assign } from './assign.js'; +export { concat } from './concat.js'; +export { dedupe } from './dedupe.js'; +export { derive } from './derive.js'; +export { except } from './except.js'; +export { filter } from './filter.js'; +export { fold } from './fold.js'; +export { groupby } from './groupby.js'; +export { impute } from './impute.js'; +export { intersect } from './intersect.js'; +export { cross, join } from './join.js'; +export { antijoin, semijoin } from './join-filter.js'; +export { lookup } from './lookup.js'; +export { orderby } from './orderby.js'; +export { pivot } from './pivot.js'; +export { reduce } from './reduce.js'; +export { relocate } from './relocate.js'; +export { rename } from './rename.js'; +export { rollup } from './rollup.js'; +export { sample } from './sample.js'; +export { select } from './select.js'; +export { slice } from './slice.js'; +export { spread } from './spread.js'; +export { ungroup } from './ungroup.js'; +export { union } from './union.js'; +export { unorder } 
from './unorder.js'; +export { unroll } from './unroll.js'; diff --git a/src/verbs/intersect.js b/src/verbs/intersect.js index ef19876d..fd90e184 100644 --- a/src/verbs/intersect.js +++ b/src/verbs/intersect.js @@ -1,6 +1,10 @@ -export default function(table, others) { +import { dedupe } from './dedupe.js'; +import { semijoin } from './join-filter.js'; + +export function intersect(table, ...others) { + others = others.flat(); const names = table.columnNames(); return others.length - ? others.reduce((a, b) => a.semijoin(b.select(names)), table).dedupe() + ? dedupe(others.reduce((a, b) => semijoin(a, b.select(names)), table)) : table.reify([]); } diff --git a/src/verbs/join-filter.js b/src/verbs/join-filter.js index 3c2530ff..148235bc 100644 --- a/src/verbs/join-filter.js +++ b/src/verbs/join-filter.js @@ -1,10 +1,19 @@ -import _join_filter from '../engine/join-filter.js'; +import { rowLookup } from './join/lookup.js'; import { inferKeys, keyPredicate } from './util/join-keys.js'; import parse from '../expression/parse.js'; +import { BitSet } from '../table/BitSet.js'; import isArray from '../util/is-array.js'; import toArray from '../util/to-array.js'; -export default function(tableL, tableR, on, options) { +export function semijoin(tableL, tableR, on) { + return join_filter(tableL, tableR, on, { anti: false }); +} + +export function antijoin(tableL, tableR, on) { + return join_filter(tableL, tableR, on, { anti: true }); +} + +export function join_filter(tableL, tableR, on, options) { on = inferKeys(tableL, tableR, on); const predicate = isArray(on) @@ -13,3 +22,60 @@ export default function(tableL, tableR, on, options) { return _join_filter(tableL, tableR, predicate, options); } + +export function _join_filter(tableL, tableR, predicate, options = {}) { + // calculate semi-join filter mask + const filter = new BitSet(tableL.totalRows()); + const join = isArray(predicate) ? 
hashSemiJoin : loopSemiJoin; + join(filter, tableL, tableR, predicate); + + // if anti-join, negate the filter + if (options.anti) { + filter.not().and(tableL.mask()); + } + + return tableL.create({ filter }); +} + +function hashSemiJoin(filter, tableL, tableR, [keyL, keyR]) { + // build lookup table + const lut = rowLookup(tableR, keyR); + + // scan table, update filter with matches + tableL.scan((rowL, data) => { + const rowR = lut.get(keyL(rowL, data)); + if (rowR >= 0) filter.set(rowL); + }); +} + +function loopSemiJoin(filter, tableL, tableR, predicate) { + const nL = tableL.numRows(); + const nR = tableR.numRows(); + const dataL = tableL.data(); + const dataR = tableR.data(); + + if (tableL.isFiltered() || tableR.isFiltered()) { + // use indices as at least one table is filtered + const idxL = tableL.indices(false); + const idxR = tableR.indices(false); + for (let i = 0; i < nL; ++i) { + const rowL = idxL[i]; + for (let j = 0; j < nR; ++j) { + if (predicate(rowL, dataL, idxR[j], dataR)) { + filter.set(rowL); + break; + } + } + } + } else { + // no filters, enumerate row indices directly + for (let i = 0; i < nL; ++i) { + for (let j = 0; j < nR; ++j) { + if (predicate(i, dataL, j, dataR)) { + filter.set(i); + break; + } + } + } + } +} diff --git a/src/verbs/join.js b/src/verbs/join.js index 86efcbe4..ccfa813c 100644 --- a/src/verbs/join.js +++ b/src/verbs/join.js @@ -1,17 +1,31 @@ -import _join from '../engine/join.js'; +import { indexLookup } from './join/lookup.js'; import { inferKeys, keyPredicate } from './util/join-keys.js'; import parseValue from './util/parse.js'; import parse from '../expression/parse.js'; import { all, not } from '../helpers/selection.js'; +import { columnSet } from '../table/ColumnSet.js'; +import concat from '../util/concat.js'; import isArray from '../util/is-array.js'; import isString from '../util/is-string.js'; import toArray from '../util/to-array.js'; import toString from '../util/to-string.js'; +import unroll from 
'../util/unroll.js'; const OPT_L = { aggregate: false, window: false }; const OPT_R = { ...OPT_L, index: 1 }; +const NONE = -Infinity; -export default function(tableL, tableR, on, values, options = {}) { +export function cross(table, other, values, options) { + return join( + table, + other, + () => true, + values, + { ...options, left: true, right: true } + ); +} + +export function join(tableL, tableR, on, values, options = {}) { on = inferKeys(tableL, tableR, on); const optParse = { join: [tableL, tableR] }; let predicate; @@ -104,3 +118,108 @@ function rekey(names, rename, suffix) { ? (names[i] = name + suffix) : 0); } + +function emitter(columns, getters) { + const args = ['i', 'a', 'j', 'b']; + return unroll( + args, + '{' + concat(columns, (_, i) => `_${i}.push($${i}(${args}));`) + '}', + columns, getters + ); +} + +export function _join(tableL, tableR, predicate, { names, exprs }, options = {}) { + // initialize data for left table + const dataL = tableL.data(); + const idxL = tableL.indices(false); + const nL = idxL.length; + const hitL = new Int32Array(nL); + + // initialize data for right table + const dataR = tableR.data(); + const idxR = tableR.indices(false); + const nR = idxR.length; + const hitR = new Int32Array(nR); + + // initialize output data + const ncols = names.length; + const cols = columnSet(); + const columns = Array(ncols); + const getters = Array(ncols); + for (let i = 0; i < names.length; ++i) { + columns[i] = cols.add(names[i], []); + getters[i] = exprs[i]; + } + const emit = emitter(columns, getters); + + // perform join + const join = isArray(predicate) ? 
hashJoin : loopJoin; + join(emit, predicate, dataL, dataR, idxL, idxR, hitL, hitR, nL, nR); + + if (options.left) { + for (let i = 0; i < nL; ++i) { + if (!hitL[i]) { + emit(idxL[i], dataL, NONE, dataR); + } + } + } + + if (options.right) { + for (let j = 0; j < nR; ++j) { + if (!hitR[j]) { + emit(NONE, dataL, idxR[j], dataR); + } + } + } + + return cols.new(tableL); +} + +function loopJoin(emit, predicate, dataL, dataR, idxL, idxR, hitL, hitR, nL, nR) { + // perform nested-loops join + for (let i = 0; i < nL; ++i) { + const rowL = idxL[i]; + for (let j = 0; j < nR; ++j) { + const rowR = idxR[j]; + if (predicate(rowL, dataL, rowR, dataR)) { + emit(rowL, dataL, rowR, dataR); + hitL[i] = 1; + hitR[j] = 1; + } + } + } +} + +function hashJoin(emit, [keyL, keyR], dataL, dataR, idxL, idxR, hitL, hitR, nL, nR) { + // determine which table to hash + let dataScan, keyScan, hitScan, idxScan; + let dataHash, keyHash, hitHash, idxHash; + let emitScan = emit; + if (nL >= nR) { + dataScan = dataL; keyScan = keyL; hitScan = hitL; idxScan = idxL; + dataHash = dataR; keyHash = keyR; hitHash = hitR; idxHash = idxR; + } else { + dataScan = dataR; keyScan = keyR; hitScan = hitR; idxScan = idxR; + dataHash = dataL; keyHash = keyL; hitHash = hitL; idxHash = idxL; + emitScan = (i, a, j, b) => emit(j, b, i, a); + } + + // build lookup table + const lut = indexLookup(idxHash, dataHash, keyHash); + + // scan other table + const m = idxScan.length; + for (let j = 0; j < m; ++j) { + const rowScan = idxScan[j]; + const list = lut.get(keyScan(rowScan, dataScan)); + if (list) { + const n = list.length; + for (let k = 0; k < n; ++k) { + const i = list[k]; + emitScan(rowScan, dataScan, idxHash[i], dataHash); + hitHash[i] = 1; + } + hitScan[j] = 1; + } + } +} diff --git a/src/engine/join/lookup.js b/src/verbs/join/lookup.js similarity index 100% rename from src/engine/join/lookup.js rename to src/verbs/join/lookup.js diff --git a/src/verbs/lookup.js b/src/verbs/lookup.js index c71d687e..3ddf656a 
100644 --- a/src/verbs/lookup.js +++ b/src/verbs/lookup.js @@ -1,14 +1,46 @@ -import _lookup from '../engine/lookup.js'; +import { rowLookup } from './join/lookup.js'; +import { aggregateGet } from './reduce/util.js'; import { inferKeys } from './util/join-keys.js'; import parseKey from './util/parse-key.js'; import parseValues from './util/parse.js'; +import { columnSet } from '../table/ColumnSet.js'; +import NULL from '../util/null.js'; +import concat from '../util/concat.js'; +import unroll from '../util/unroll.js'; -export default function(tableL, tableR, on, values) { +export function lookup(tableL, tableR, on, ...values) { on = inferKeys(tableL, tableR, on); return _lookup( tableL, tableR, [ parseKey('lookup', tableL, on[0]), parseKey('lookup', tableR, on[1]) ], - parseValues('lookup', tableR, values) + parseValues('lookup', tableR, values.flat()) ); } + +export function _lookup(tableL, tableR, [keyL, keyR], { names, exprs, ops = [] }) { + // instantiate output data + const cols = columnSet(tableL); + const total = tableL.totalRows(); + names.forEach(name => cols.add(name, Array(total).fill(NULL))); + + // build lookup table + const lut = rowLookup(tableR, keyR); + + // generate setter function for lookup match + const set = unroll( + ['lr', 'rr', 'data'], + '{' + concat(names, (_, i) => `_[${i}][lr] = $[${i}](rr, data);`) + '}', + names.map(name => cols.data[name]), + aggregateGet(tableR, ops, exprs) + ); + + // find matching rows, set values on match + const dataR = tableR.data(); + tableL.scan((lrow, data) => { + const rrow = lut.get(keyL(lrow, data)); + if (rrow >= 0) set(lrow, rrow, dataR); + }); + + return cols.derive(tableL); +} diff --git a/src/verbs/orderby.js b/src/verbs/orderby.js index b9f06f29..f119a58d 100644 --- a/src/verbs/orderby.js +++ b/src/verbs/orderby.js @@ -1,4 +1,3 @@ -import _orderby from '../engine/orderby.js'; import parse from '../expression/compare.js'; import field from '../helpers/field.js'; import error from '../util/error.js'; 
@@ -7,8 +6,8 @@ import isObject from '../util/is-object.js'; import isNumber from '../util/is-number.js'; import isString from '../util/is-string.js'; -export default function(table, values) { - return _orderby(table, parseValues(table, values)); +export function orderby(table, ...values) { + return _orderby(table, parseValues(table, values.flat())); } function parseValues(table, params) { @@ -33,3 +32,7 @@ function parseValues(table, params) { return parse(table, exprs); } + +export function _orderby(table, comparator) { + return table.create({ order: comparator }); +} diff --git a/src/verbs/pivot.js b/src/verbs/pivot.js index b5564b40..182ab6bd 100644 --- a/src/verbs/pivot.js +++ b/src/verbs/pivot.js @@ -1,13 +1,15 @@ -import _pivot from '../engine/pivot.js'; -import { any } from '../op/op-api.js'; +import { aggregate, aggregateGet, groupOutput } from './reduce/util.js'; import parse from './util/parse.js'; +import { ungroup } from './ungroup.js'; +import { any } from '../op/op-api.js'; +import { columnSet } from '../table/ColumnSet.js'; // TODO: enforce aggregates only (no output changes) for values -export default function(table, on, values, options) { +export function pivot(table, on, values, options) { return _pivot( table, parse('fold', table, on), - parse('fold', table, values, { preparse, aggronly: true }), + parse('fold', table, values, { preparse, window: false, aggronly: true }), options ); } @@ -18,3 +20,110 @@ function preparse(map) { value.field ? map.set(key, any(value + '')) : 0 ); } + +const opt = (value, defaultValue) => value != null ? value : defaultValue; + +export function _pivot(table, on, values, options = {}) { + const { keys, keyColumn } = pivotKeys(table, on, options); + const vsep = opt(options.valueSeparator, '_'); + const namefn = values.names.length > 1 + ? 
(i, name) => name + vsep + keys[i] + : i => keys[i]; + + // perform separate aggregate operations for each key + // if keys do not match, emit NaN so aggregate skips it + // use custom toString method for proper field resolution + const results = keys.map( + k => aggregate(table, values.ops.map(op => { + if (op.name === 'count') { // fix #273 + const fn = r => k === keyColumn[r] ? 1 : NaN; + fn.toString = () => k + ':1'; + return { ...op, name: 'sum', fields: [fn] }; + } + const fields = op.fields.map(f => { + const fn = (r, d) => k === keyColumn[r] ? f(r, d) : NaN; + fn.toString = () => k + ':' + f; + return fn; + }); + return { ...op, fields }; + })) + ); + + return output(values, namefn, table.groups(), results).new(table); +} + +function pivotKeys(table, on, options) { + const limit = options.limit > 0 ? +options.limit : Infinity; + const sort = opt(options.sort, true); + const ksep = opt(options.keySeparator, '_'); + + // construct key accessor function + const get = aggregateGet(table, on.ops, on.exprs); + const key = get.length === 1 + ? get[0] + : (row, data) => get.map(fn => fn(row, data)).join(ksep); + + // generate vector of per-row key values + const kcol = Array(table.totalRows()); + table.scan((row, data) => kcol[row] = key(row, data)); + + // collect unique key values + const uniq = aggregate( + ungroup(table), + [ { + id: 0, + name: 'array_agg_distinct', + fields: [(row => kcol[row])], params: [] + } ] + )[0][0]; + + // get ordered set of unique key values + const keys = sort ? uniq.sort() : uniq; + + // return key values + return { + keys: Number.isFinite(limit) ? keys.slice(0, limit) : keys, + keyColumn: kcol + }; +} + +function output({ names, exprs }, namefn, groups, results) { + const size = groups ? 
groups.size : 1; + const cols = columnSet(); + const m = results.length; + const n = names.length; + + let result; + const op = (id, row) => result[id][row]; + + // write groupby fields to output + if (groups) groupOutput(cols, groups); + + // write pivot values to output + for (let i = 0; i < n; ++i) { + const get = exprs[i]; + if (get.field != null) { + // if expression is op only, use aggregates directly + for (let j = 0; j < m; ++j) { + cols.add(namefn(j, names[i]), results[j][get.field]); + } + } else if (size > 1) { + // if multiple groups, evaluate expression for each + for (let j = 0; j < m; ++j) { + result = results[j]; + const col = cols.add(namefn(j, names[i]), Array(size)); + for (let k = 0; k < size; ++k) { + col[k] = get(k, null, op); + } + } + } else { + // if only one group, no need to loop + for (let j = 0; j < m; ++j) { + result = results[j]; + cols.add(namefn(j, names[i]), [ get(0, null, op) ]); + } + } + } + + return cols; +} diff --git a/src/engine/reduce.js b/src/verbs/reduce.js similarity index 89% rename from src/engine/reduce.js rename to src/verbs/reduce.js index cd506cbf..72135b90 100644 --- a/src/engine/reduce.js +++ b/src/verbs/reduce.js @@ -1,7 +1,7 @@ import { reduceFlat, reduceGroups } from './reduce/util.js'; -import columnSet from '../table/column-set.js'; +import { columnSet } from '../table/ColumnSet.js'; -export default function(table, reducer) { +export function reduce(table, reducer) { const cols = columnSet(); const groups = table.groups(); @@ -37,5 +37,5 @@ export default function(table, reducer) { }); } - return table.create(cols.new()); + return cols.new(table); } diff --git a/src/engine/reduce/count-pattern.js b/src/verbs/reduce/count-pattern.js similarity index 97% rename from src/engine/reduce/count-pattern.js rename to src/verbs/reduce/count-pattern.js index d7d7b5fd..1ddc359e 100644 --- a/src/engine/reduce/count-pattern.js +++ b/src/verbs/reduce/count-pattern.js @@ -6,7 +6,7 @@ export default function(fields, as, 
pattern) { } function columnGetter(column) { - return (row, data) => data[column].get(row); + return (row, data) => data[column].at(row); } export class CountPattern extends Reducer { diff --git a/src/engine/reduce/field-reducer.js b/src/verbs/reduce/field-reducer.js similarity index 99% rename from src/engine/reduce/field-reducer.js rename to src/verbs/reduce/field-reducer.js index 61c0783a..a1af271c 100644 --- a/src/engine/reduce/field-reducer.js +++ b/src/verbs/reduce/field-reducer.js @@ -20,6 +20,7 @@ export default function(oplist, stream) { : n === 1 ? Field1Reducer : n === 2 ? Field2Reducer : error('Unsupported field count: ' + n); + // @ts-ignore return new cls(fields, ops, output, stream); } diff --git a/src/engine/reduce/reducer.js b/src/verbs/reduce/reducer.js similarity index 56% rename from src/engine/reduce/reducer.js rename to src/verbs/reduce/reducer.js index 7e363764..fdb359ed 100644 --- a/src/engine/reduce/reducer.js +++ b/src/verbs/reduce/reducer.js @@ -14,18 +14,22 @@ export default class Reducer { return this._outputs; } - init(/* columns */) { + // eslint-disable-next-line no-unused-vars + init(columns) { return {}; } - add(/* state, row, data */) { + // eslint-disable-next-line no-unused-vars + add(state, row, data) { // no-op, subclasses should override } - rem(/* state, row, data */) { + // eslint-disable-next-line no-unused-vars + rem(state, row, data) { // no-op, subclasses should override } - write(/* state, values, index */) { + // eslint-disable-next-line no-unused-vars + write(state, values, index) { } } diff --git a/src/engine/reduce/util.js b/src/verbs/reduce/util.js similarity index 100% rename from src/engine/reduce/util.js rename to src/verbs/reduce/util.js diff --git a/src/verbs/relocate.js b/src/verbs/relocate.js index de85eb8e..73b0ea80 100644 --- a/src/verbs/relocate.js +++ b/src/verbs/relocate.js @@ -1,8 +1,11 @@ -import _select from '../engine/select.js'; +import { _select } from './select.js'; import resolve from 
'../helpers/selection.js'; import error from '../util/error.js'; -export default function(table, columns, { before, after } = {}) { +export function relocate(table, columns, { + before = undefined, + after = undefined +} = {}) { const bef = before != null; const aft = after != null; diff --git a/src/verbs/rename.js b/src/verbs/rename.js index 1e3eadc6..c22f1864 100644 --- a/src/verbs/rename.js +++ b/src/verbs/rename.js @@ -1,8 +1,8 @@ -import _select from '../engine/select.js'; +import { _select } from './select.js'; import resolve from '../helpers/selection.js'; -export default function(table, columns) { +export function rename(table, ...columns) { const map = new Map(); table.columnNames(x => (map.set(x, x), 0)); - return _select(table, resolve(table, columns, map)); + return _select(table, resolve(table, columns.flat(), map)); } diff --git a/src/verbs/rollup.js b/src/verbs/rollup.js index 3cdea091..997e617c 100644 --- a/src/verbs/rollup.js +++ b/src/verbs/rollup.js @@ -1,6 +1,46 @@ -import _rollup from '../engine/rollup.js'; +import { aggregate, groupOutput } from './reduce/util.js'; import parse from '../expression/parse.js'; +import { columnSet } from '../table/ColumnSet.js'; -export default function(table, values) { +export function rollup(table, values) { return _rollup(table, parse(values, { table, aggronly: true, window: false })); } + +export function _rollup(table, { names, exprs, ops = [] }) { + // output data + const cols = columnSet(); + const groups = table.groups(); + + // write groupby fields to output + if (groups) groupOutput(cols, groups); + + // compute and write aggregate output + output(names, exprs, groups, aggregate(table, ops), cols); + + // return output table + return cols.new(table); +} + +function output(names, exprs, groups, result = [], cols) { + if (!exprs.length) return; + const size = groups ? 
groups.size : 1; + const op = (id, row) => result[id][row]; + const n = names.length; + + for (let i = 0; i < n; ++i) { + const get = exprs[i]; + if (get.field != null) { + // if expression is op only, use aggregates directly + cols.add(names[i], result[get.field]); + } else if (size > 1) { + // if multiple groups, evaluate expression for each + const col = cols.add(names[i], Array(size)); + for (let j = 0; j < size; ++j) { + col[j] = get(j, null, op); + } + } else { + // if only one group, no need to loop + cols.add(names[i], [ get(0, null, op) ]); + } + } +} diff --git a/src/verbs/sample.js b/src/verbs/sample.js index 3a1c68bf..f2cc8f77 100644 --- a/src/verbs/sample.js +++ b/src/verbs/sample.js @@ -1,11 +1,12 @@ -import _derive from '../engine/derive.js'; -import _rollup from '../engine/rollup.js'; -import _sample from '../engine/sample.js'; +import { _derive } from './derive.js'; +import { _rollup } from './rollup.js'; import parse from '../expression/parse.js'; import isNumber from '../util/is-number.js'; import isString from '../util/is-string.js'; +import sampleIndices from '../util/sample.js'; +import shuffleIndices from '../util/shuffle.js'; -export default function(table, size, options = {}) { +export function sample(table, size, options = {}) { return _sample( table, parseSize(table, size), @@ -14,7 +15,7 @@ export default function(table, size, options = {}) { ); } -const get = col => row => col.get(row) || 0; +const get = col => row => col.at(row) || 0; function parseSize(table, size) { return isNumber(size) @@ -31,3 +32,39 @@ function parseWeight(table, w) { : _derive(table, parse({ w }, { table }), { drop: true }).column('w') ); } + +export function _sample(table, size, weight, options = {}) { + const { replace, shuffle } = options; + const parts = table.partitions(false); + + let total = 0; + size = parts.map((idx, group) => { + let s = size(group); + total += (s = (replace ? 
s : Math.min(idx.length, s))); + return s; + }); + + const samples = new Uint32Array(total); + let curr = 0; + + parts.forEach((idx, group) => { + const sz = size[group]; + const buf = samples.subarray(curr, curr += sz); + + if (!replace && sz === idx.length) { + // sample size === data size, no replacement + // no need to sample, just copy indices + buf.set(idx); + } else { + sampleIndices(buf, replace, idx, weight); + } + }); + + if (shuffle !== false && (parts.length > 1 || !replace)) { + // sampling with replacement methods shuffle, so in + // that case a single partition is already good to go + shuffleIndices(samples); + } + + return table.reify(samples); +} diff --git a/src/verbs/select.js b/src/verbs/select.js index dc6045e2..241aaa2a 100644 --- a/src/verbs/select.js +++ b/src/verbs/select.js @@ -1,6 +1,22 @@ -import _select from '../engine/select.js'; import resolve from '../helpers/selection.js'; +import { columnSet } from '../table/ColumnSet.js'; +import error from '../util/error.js'; +import isString from '../util/is-string.js'; -export default function(table, columns) { - return _select(table, resolve(table, columns)); +export function select(table, ...columns) { + return _select(table, resolve(table, columns.flat())); +} + +export function _select(table, columns) { + const cols = columnSet(); + + columns.forEach((value, curr) => { + const next = isString(value) ? 
value : curr; + if (next) { + const col = table.column(curr) || error(`Unrecognized column: ${curr}`); + cols.add(next, col); + } + }); + + return cols.derive(table); } diff --git a/src/verbs/slice.js b/src/verbs/slice.js new file mode 100644 index 00000000..4cd1a5eb --- /dev/null +++ b/src/verbs/slice.js @@ -0,0 +1,16 @@ +import { filter } from './filter.js'; +import _slice from '../helpers/slice.js'; + +export function slice(table, start = 0, end = Infinity) { + if (table.isGrouped()) { + return filter(table, _slice(start, end)).reify(); + } + + // if not grouped, scan table directly + const indices = []; + const nrows = table.numRows(); + start = Math.max(0, start + (start < 0 ? nrows : 0)); + end = Math.min(nrows, Math.max(0, end + (end < 0 ? nrows : 0))); + table.scan(row => indices.push(row), true, end - start, start); + return table.reify(indices); +} diff --git a/src/verbs/spread.js b/src/verbs/spread.js index 180ad188..b7773d1f 100644 --- a/src/verbs/spread.js +++ b/src/verbs/spread.js @@ -1,6 +1,64 @@ -import _spread from '../engine/spread.js'; +import { aggregateGet } from './reduce/util.js'; import parse from './util/parse.js'; +import { columnSet } from '../table/ColumnSet.js'; +import NULL from '../util/null.js'; +import toArray from '../util/to-array.js'; -export default function(table, values, options) { +export function spread(table, values, options) { return _spread(table, parse('spread', table, values), options); } + +export function _spread(table, { names, exprs, ops = [] }, options = {}) { + if (names.length === 0) return table; + + // ignore 'as' if there are multiple field names + const as = (names.length === 1 && options.as) || []; + const drop = options.drop == null ? true : !!options.drop; + const limit = options.limit == null + ? 
as.length || Infinity + : Math.max(1, +options.limit || 1); + + const get = aggregateGet(table, ops, exprs); + const cols = columnSet(); + const map = names.reduce((map, name, i) => map.set(name, i), new Map()); + + const add = (index, name) => { + const columns = spreadCols(table, get[index], limit); + const n = columns.length; + for (let i = 0; i < n; ++i) { + cols.add(as[i] || `${name}_${i + 1}`, columns[i]); + } + }; + + table.columnNames().forEach(name => { + if (map.has(name)) { + if (!drop) cols.add(name, table.column(name)); + add(map.get(name), name); + map.delete(name); + } else { + cols.add(name, table.column(name)); + } + }); + + map.forEach(add); + + return cols.derive(table); +} + +function spreadCols(table, get, limit) { + const nrows = table.totalRows(); + const columns = []; + + table.scan((row, data) => { + const values = toArray(get(row, data)); + const n = Math.min(values.length, limit); + while (columns.length < n) { + columns.push(Array(nrows).fill(NULL)); + } + for (let i = 0; i < n; ++i) { + columns[i][row] = values[i]; + } + }); + + return columns; +} diff --git a/src/engine/ungroup.js b/src/verbs/ungroup.js similarity index 70% rename from src/engine/ungroup.js rename to src/verbs/ungroup.js index 33f76688..6ef68d48 100644 --- a/src/engine/ungroup.js +++ b/src/verbs/ungroup.js @@ -1,4 +1,4 @@ -export default function(table) { +export function ungroup(table) { return table.isGrouped() ? 
table.create({ groups: null }) : table; diff --git a/src/verbs/union.js b/src/verbs/union.js index 625a97ad..2e9afdef 100644 --- a/src/verbs/union.js +++ b/src/verbs/union.js @@ -1,3 +1,6 @@ -export default function(table, others) { - return table.concat(others).dedupe(); +import { concat } from './concat.js'; +import { dedupe } from './dedupe.js'; + +export function union(table, ...others) { + return dedupe(concat(table, others.flat())); } diff --git a/src/engine/unorder.js b/src/verbs/unorder.js similarity index 70% rename from src/engine/unorder.js rename to src/verbs/unorder.js index dc9bde22..b96e4b4c 100644 --- a/src/engine/unorder.js +++ b/src/verbs/unorder.js @@ -1,4 +1,4 @@ -export default function(table) { +export function unorder(table) { return table.isOrdered() ? table.create({ order: null }) : table; diff --git a/src/verbs/unroll.js b/src/verbs/unroll.js index 7b5d0faf..cc980560 100644 --- a/src/verbs/unroll.js +++ b/src/verbs/unroll.js @@ -1,7 +1,9 @@ -import _unroll from '../engine/unroll.js'; +import { aggregateGet } from './reduce/util.js'; import parse from './util/parse.js'; +import { columnSet } from '../table/ColumnSet.js'; +import toArray from '../util/to-array.js'; -export default function(table, values, options) { +export function unroll(table, values, options) { return _unroll( table, parse('unroll', table, values), @@ -10,3 +12,117 @@ export default function(table, values, options) { : options ); } + +export function _unroll(table, { names = [], exprs = [], ops = [] }, options = {}) { + if (!names.length) return table; + + const limit = options.limit > 0 ? +options.limit : Infinity; + const index = options.index + ? options.index === true ? 
'index' : options.index + '' + : null; + const drop = new Set(options.drop); + const get = aggregateGet(table, ops, exprs); + + // initialize output columns + const cols = columnSet(); + const nset = new Set(names); + const priors = []; + const copies = []; + const unroll = []; + + // original and copied columns + table.columnNames().forEach(name => { + if (!drop.has(name)) { + const col = cols.add(name, []); + if (!nset.has(name)) { + priors.push(table.column(name)); + copies.push(col); + } + } + }); + + // unrolled output columns + names.forEach(name => { + if (!drop.has(name)) { + if (!cols.has(name)) cols.add(name, []); + unroll.push(cols.data[name]); + } + }); + + // index column, if requested + const icol = index ? cols.add(index, []) : null; + + let start = 0; + const m = priors.length; + const n = unroll.length; + + const copy = (row, maxlen) => { + for (let i = 0; i < m; ++i) { + copies[i].length = start + maxlen; + copies[i].fill(priors[i].at(row), start, start + maxlen); + } + }; + + const indices = icol + ? 
(row, maxlen) => { + for (let i = 0; i < maxlen; ++i) { + icol[row + i] = i; + } + } + : () => {}; + + if (n === 1) { + // optimize common case of one array-valued column + const fn = get[0]; + const col = unroll[0]; + + table.scan((row, data) => { + // extract array data + const array = toArray(fn(row, data)); + const maxlen = Math.min(array.length, limit); + + // copy original table data + copy(row, maxlen); + + // copy unrolled array data + for (let j = 0; j < maxlen; ++j) { + col[start + j] = array[j]; + } + + // fill in array indices + indices(start, maxlen); + + start += maxlen; + }); + } else { + table.scan((row, data) => { + let maxlen = 0; + + // extract parallel array data + const arrays = get.map(fn => { + const value = toArray(fn(row, data)); + maxlen = Math.min(Math.max(maxlen, value.length), limit); + return value; + }); + + // copy original table data + copy(row, maxlen); + + // copy unrolled array data + for (let i = 0; i < n; ++i) { + const col = unroll[i]; + const arr = arrays[i]; + for (let j = 0; j < maxlen; ++j) { + col[start + j] = arr[j]; + } + } + + // fill in array indices + indices(start, maxlen); + + start += maxlen; + }); + } + + return cols.new(table); +} diff --git a/src/engine/window/window-state.js b/src/verbs/window/window-state.js similarity index 100% rename from src/engine/window/window-state.js rename to src/verbs/window/window-state.js diff --git a/src/engine/window/window.js b/src/verbs/window/window.js similarity index 95% rename from src/engine/window/window.js rename to src/verbs/window/window.js index 9b55b29d..979770c1 100644 --- a/src/engine/window/window.js +++ b/src/verbs/window/window.js @@ -11,10 +11,11 @@ const peersValue = op => !!op.peers; function windowOp(spec) { const { id, name, fields = [], params = [] } = spec; - const op = getWindow(name).create(...params); - if (fields.length) op.get = fields[0]; - op.id = id; - return op; + return { + ...getWindow(name).create(...params), + get: fields.length ? 
fields[0] : null, + id + }; } export function window(table, cols, exprs, result = {}, ops) { diff --git a/test/arrow/arrow-column-test.js b/test/arrow/arrow-column-test.js new file mode 100644 index 00000000..d9824377 --- /dev/null +++ b/test/arrow/arrow-column-test.js @@ -0,0 +1,112 @@ +import assert from 'node:assert'; +import arrowColumn from '../../src/arrow/arrow-column.js'; +import { + DateDay, DateMillisecond, Int64, tableFromIPC, vectorFromArray +} from 'apache-arrow'; + +describe('arrowColumn', () => { + it('converts date day data', () => { + const date = (y, m = 0, d = 1) => new Date(Date.UTC(y, m, d)); + const values = [ + date(2000, 0, 1), + date(2004, 10, 12), + date(2007, 3, 14), + date(2009, 6, 26), + date(2000, 0, 1), + date(2004, 10, 12), + date(2007, 3, 14), + date(2009, 6, 26), + date(2000, 0, 1), + date(2004, 10, 12) + ]; + const vec = vectorFromArray(values, new DateDay()); + const proxy = arrowColumn(vec); + + assert.deepStrictEqual( + Array.from(proxy), + values, + 'date day converted' + ); + assert.deepStrictEqual( + Array.from(arrowColumn(vec, { convertDate: false })), + values.map(v => +v), + 'date day unconverted' + ); + assert.ok(proxy.at(0) === proxy.at(0), 'data day object equality'); + }); + + it('converts date millisecond data', () => { + const date = (y, m = 0, d = 1) => new Date(Date.UTC(y, m, d)); + const values = [ + date(2000, 0, 1), + date(2004, 10, 12), + date(2007, 3, 14), + date(2009, 6, 26), + date(2000, 0, 1), + date(2004, 10, 12), + date(2007, 3, 14), + date(2009, 6, 26), + date(2000, 0, 1), + date(2004, 10, 12) + ]; + const vec = vectorFromArray(values, new DateMillisecond()); + const proxy = arrowColumn(vec); + + assert.deepStrictEqual( + Array.from(proxy), + values, + 'date millisecond converted' + ); + assert.deepStrictEqual( + Array.from(arrowColumn(vec, { convertDate: false })), + values.map(v => +v), + 'date millisecond unconverted' + ); + assert.ok(proxy.at(0) === proxy.at(0), 'data millisecond object equality'); 
+ }); + + it('converts bigint data', () => { + const values = [0n, 1n, 2n, 3n, 10n, 1000n]; + const vec = vectorFromArray(values, new Int64()); + + assert.deepStrictEqual( + Array.from(arrowColumn(vec, { convertBigInt: true })), + values.map(v => Number(v)), + 'bigint converted' + ); + assert.deepStrictEqual( + Array.from(arrowColumn(vec)), + values, + 'bigint unconverted' + ); + }); + + it('converts decimal data', () => { + // encoded externally to sidestep arrow JS lib bugs: + // import pyarrow as pa + // v = pa.array([1, 12, 34], type=pa.decimal128(18, 3)) + // batch = pa.record_batch([v], names=['d']) + // sink = pa.BufferOutputStream() + // with pa.ipc.new_stream(sink, batch.schema) as writer: + // writer.write_batch(batch) + // sink.getvalue().hex() + const hex = 'FFFFFFFF780000001000000000000A000C000600050008000A000000000104000C000000080008000000040008000000040000000100000014000000100014000800060007000C00000010001000000000000107100000001C0000000400000000000000010000006400000008000C0004000800080000001200000003000000FFFFFFFF8800000014000000000000000C0016000600050008000C000C0000000003040018000000300000000000000000000A0018000C00040008000A0000003C00000010000000030000000000000000000000020000000000000000000000000000000000000000000000000000003000000000000000000000000100000003000000000000000000000000000000E8030000000000000000000000000000E02E0000000000000000000000000000D0840000000000000000000000000000FFFFFFFF00000000'; + const bytes = Uint8Array.from(hex.match(/.{1,2}/g).map(s => parseInt(s, 16))); + const vec = tableFromIPC(bytes).getChild('d'); + + assert.deepStrictEqual( + Array.from(arrowColumn(vec, { convertDecimal: true })), + [1, 12, 34], + 'decimal converted' + ); + assert.deepEqual( + Array.from(arrowColumn(vec, { convertDecimal: false })), + [ + Uint32Array.from([1000, 0, 0, 0]), + Uint32Array.from([12000, 0, 0, 0]), + Uint32Array.from([34000, 0, 0, 0 ]) + ], + 'decimal unconverted' + ); + }); +}); diff --git a/test/arrow/data-from-test.js 
b/test/arrow/data-from-test.js index 5eaf9eee..a320cd90 100644 --- a/test/arrow/data-from-test.js +++ b/test/arrow/data-from-test.js @@ -6,7 +6,7 @@ import { } from 'apache-arrow'; import { dataFromScan } from '../../src/arrow/encode/data-from.js'; import { scanTable } from '../../src/arrow/encode/scan.js'; -import { table } from '../../src/table/index.js'; +import { table } from '../../src/index.js'; function dataFromTable(table, column, type, nullable) { const nrows = table.numRows(); diff --git a/test/format/from-arrow-test.js b/test/arrow/from-arrow-test.js similarity index 88% rename from test/format/from-arrow-test.js rename to test/arrow/from-arrow-test.js index d2e38ca1..63fc054f 100644 --- a/test/format/from-arrow-test.js +++ b/test/arrow/from-arrow-test.js @@ -1,13 +1,13 @@ import assert from 'node:assert'; import { Utf8 } from 'apache-arrow'; import tableEqual from '../table-equal.js'; -import fromArrow from '../../src/format/from-arrow.js'; +import fromArrow from '../../src/arrow/from-arrow.js'; +import toArrow from '../../src/arrow/to-arrow.js'; import { not } from '../../src/helpers/selection.js'; -import { table } from '../../src/index.js'; -import { isFixedSizeList, isList, isStruct } from '../../src/arrow/arrow-types.js'; +import { table } from '../../src/index-browser.js'; function arrowTable(data, types) { - return table(data).toArrow({ types }); + return toArrow(table(data), { types }); } describe('fromArrow', () => { @@ -66,7 +66,7 @@ describe('fromArrow', () => { const l = [[1, 2, 3], null, [4, 5]]; const at = arrowTable({ l }); - if (!isList(at.getChild('l').type)) { + if (at.getChild('l').type.typeId !== 12) { assert.fail('Arrow column should have List type'); } tableEqual(fromArrow(at), { l }, 'extract Arrow list'); @@ -76,7 +76,7 @@ describe('fromArrow', () => { const l = [[1, 2], null, [4, 5]]; const at = arrowTable({ l }); - if (!isFixedSizeList(at.getChild('l').type)) { + if (at.getChild('l').type.typeId !== 16) { assert.fail('Arrow 
column should have FixedSizeList type'); } tableEqual(fromArrow(at), { l }, 'extract Arrow list'); @@ -86,7 +86,7 @@ describe('fromArrow', () => { const s = [{ foo: 1, bar: [2, 3] }, null, { foo: 2, bar: [4] }]; const at = arrowTable({ s }); - if (!isStruct(at.getChild('s').type)) { + if (at.getChild('s').type.typeId !== 13) { assert.fail('Arrow column should have Struct type'); } tableEqual(fromArrow(at), { s }, 'extract Arrow struct'); @@ -96,7 +96,7 @@ describe('fromArrow', () => { const s = [{ foo: 1, bar: { bop: 2 } }, { foo: 2, bar: { bop: 3 } }]; const at = arrowTable({ s }); - if (!isStruct(at.getChild('s').type)) { + if (at.getChild('s').type.typeId !== 13) { assert.fail('Arrow column should have Struct type'); } tableEqual(fromArrow(at), { s }, 'extract nested Arrow struct'); diff --git a/test/format/to-arrow-test.js b/test/arrow/to-arrow-test.js similarity index 84% rename from test/format/to-arrow-test.js rename to test/arrow/to-arrow-test.js index 94989f38..1858f4d0 100644 --- a/test/format/to-arrow-test.js +++ b/test/arrow/to-arrow-test.js @@ -1,11 +1,11 @@ import assert from 'node:assert'; import { readFileSync } from 'node:fs'; -import { Int8, Type, tableFromIPC, tableToIPC, vectorFromArray } from 'apache-arrow'; -import fromArrow from '../../src/format/from-arrow.js'; -import fromCSV from '../../src/format/from-csv.js'; -import fromJSON from '../../src/format/from-json.js'; -import toArrow from '../../src/format/to-arrow.js'; -import { table } from '../../src/table/index.js'; +import { + Int8, Type, tableFromIPC, tableToIPC, vectorFromArray +} from 'apache-arrow'; +import { + fromArrow, fromCSV, fromJSON, table, toArrow, toArrowIPC, toJSON +} from '../../src/index.js'; function date(year, month=0, date=1, hours=0, minutes=0, seconds=0, ms=0) { return new Date(year, month, date, hours, minutes, seconds, ms); @@ -38,8 +38,8 @@ function compareColumns(name, aqt, art) { const arc = art.getChild(name); const err = []; for (let i = 0; i < idx.length; 
++i) { - let v1 = normalize(aqc.get(idx[i])); - let v2 = normalize(arc.get(i)); + let v1 = normalize(aqc.at(idx[i])); + let v2 = normalize(arc.at(i)); if (isArrayType(v1)) { v1 = v1.join(); v2 = [...v2].join(); @@ -71,7 +71,7 @@ describe('toArrow', () => { o: [1, 2, 3, null, 5, 6].map(v => v ? { key: v } : null) }); - const at = dt.toArrow(); + const at = toArrow(dt); assert.equal( compareTables(dt, at), 0, @@ -100,7 +100,7 @@ describe('toArrow', () => { it('produces Arrow data for an input CSV', async () => { const dt = fromCSV(readFileSync('test/format/data/beers.csv', 'utf8')); const st = dt.derive({ name: d => d.name + '' }); - const at = dt.toArrow(); + const at = toArrow(dt); assert.equal( compareTables(st, at), 0, @@ -126,7 +126,7 @@ describe('toArrow', () => { }); it('handles ambiguously typed data', async () => { - const at = table({ x: [1, 2, 3, 'foo'] }).toArrow(); + const at = toArrow(table({ x: [1, 2, 3, 'foo'] })); assert.deepEqual( [...at.getChild('x')], ['1', '2', '3', 'foo'], @@ -134,7 +134,7 @@ describe('toArrow', () => { ); assert.throws( - () => table({ x: [1, 2, 3, true] }).toArrow(), + () => toArrow(table({ x: [1, 2, 3, true] })), 'fail on mixed types' ); }); @@ -143,14 +143,14 @@ describe('toArrow', () => { const dt = fromCSV(readFileSync('test/format/data/beers.csv', 'utf8')) .derive({ name: d => d.name + '' }); - const json = dt.toJSON(); + const json = toJSON(dt); const jt = fromJSON(json); - const bytes = tableToIPC(dt.toArrow()); + const bytes = tableToIPC(toArrow(dt)); const bt = fromArrow(tableFromIPC(bytes)); assert.deepEqual( - [bt.toJSON(), jt.toJSON()], + [toJSON(bt), toJSON(jt)], [json, json], 'arrow and json round trips match' ); @@ -164,7 +164,7 @@ describe('toArrow', () => { z: [true, true, false] }); - const at = dt.toArrow({ columns: ['w', 'y'] }); + const at = toArrow(dt, { columns: ['w', 'y'] }); assert.deepEqual( at.schema.fields.map(f => f.name), @@ -182,17 +182,17 @@ describe('toArrow', () => { }); assert.equal( - 
JSON.stringify([...dt.toArrow({ limit: 2 })]), + JSON.stringify([...toArrow(dt, { limit: 2 })]), '[{"w":"a","x":1,"y":1.6181,"z":true},{"w":"b","x":2,"y":2.7182,"z":true}]', 'limit' ); assert.equal( - JSON.stringify([...dt.toArrow({ offset: 1 })]), + JSON.stringify([...toArrow(dt, { offset: 1 })]), '[{"w":"b","x":2,"y":2.7182,"z":true},{"w":"a","x":3,"y":3.1415,"z":false}]', 'offset' ); assert.equal( - JSON.stringify([...dt.toArrow({ offset: 1, limit: 1 })]), + JSON.stringify([...toArrow(dt, { offset: 1, limit: 1 })]), '[{"w":"b","x":2,"y":2.7182,"z":true}]', 'limit and offset' ); @@ -206,7 +206,7 @@ describe('toArrow', () => { z: [true, true, false] }); - const at = dt.toArrow({ + const at = toArrow(dt, { types: { w: Type.Utf8, x: Type.Int32, y: Type.Float32 } }); @@ -222,7 +222,7 @@ describe('toArrow', () => { }); }); -describe('toArrowBuffer', () => { +describe('toArrowIPC', () => { it('generates the correct output for file option', () => { const dt = table({ w: ['a', 'b', 'a'], @@ -231,7 +231,7 @@ describe('toArrowBuffer', () => { z: [true, true, false] }); - const buffer = dt.toArrowBuffer({ format: 'file' }); + const buffer = toArrowIPC(dt, { format: 'file' }); assert.deepEqual( buffer.slice(0, 8), @@ -247,7 +247,7 @@ describe('toArrowBuffer', () => { z: [true, true, false] }); - const buffer = dt.toArrowBuffer({ format: 'stream' }); + const buffer = toArrowIPC(dt, { format: 'stream' }); assert.deepEqual( buffer.slice(0, 8), @@ -263,7 +263,7 @@ describe('toArrowBuffer', () => { z: [true, true, false] }); - const buffer = dt.toArrowBuffer(); + const buffer = toArrowIPC(dt); assert.deepEqual( buffer.slice(0, 8), @@ -279,7 +279,7 @@ describe('toArrowBuffer', () => { y: [1.6181, 2.7182, 3.1415], z: [true, true, false] }); - dt.toArrowBuffer({ format: 'nonsense' }); + toArrowIPC(dt, { format: 'nonsense' }); }, 'Unrecognized output format'); }); }); diff --git a/test/expression/params-test.js b/test/expression/params-test.js index 35497b7b..acb7b578 100644 --- 
a/test/expression/params-test.js +++ b/test/expression/params-test.js @@ -1,7 +1,6 @@ import assert from 'node:assert'; import tableEqual from '../table-equal.js'; -import op from '../../src/op/op-api.js'; -import { table } from '../../src/table/index.js'; +import { op, table} from '../../src/index.js'; describe('parse with params', () => { it('supports table expression with parameter arg', () => { diff --git a/test/expression/parse-test.js b/test/expression/parse-test.js index 9b586f09..76686eb2 100644 --- a/test/expression/parse-test.js +++ b/test/expression/parse-test.js @@ -1,7 +1,5 @@ import assert from 'node:assert'; -import parse from '../../src/expression/parse.js'; -import op from '../../src/op/op-api.js'; -import rolling from '../../src/helpers/rolling.js'; +import { op, parse, rolling } from '../../src/index.js'; // pass code through for testing const compiler = { param: x => x, expr: x => x }; @@ -10,11 +8,11 @@ function test(input) { const { ops, names, exprs } = parse(input, { compiler }); assert.deepEqual(ops, [ - { name: 'mean', fields: ['data.a.get(row)'], params: [], id: 0 }, - { name: 'corr', fields: ['data.a.get(row)', 'data.b.get(row)'], params: [], id: 1}, - { name: 'quantile', fields: ['(-data.bar.get(row))'], params: ['(0.5 / 2)'], id: 2}, - { name: 'lag', fields: ['data.value.get(row)'], params: [2], id: 3 }, - { name: 'mean', fields: ['data.value.get(row)'], params: [], frame: [-3, 3], peers: false, id: 4 }, + { name: 'mean', fields: ['data.a.at(row)'], params: [], id: 0 }, + { name: 'corr', fields: ['data.a.at(row)', 'data.b.at(row)'], params: [], id: 1}, + { name: 'quantile', fields: ['(-data.bar.at(row))'], params: ['(0.5 / 2)'], id: 2}, + { name: 'lag', fields: ['data.value.at(row)'], params: [2], id: 3 }, + { name: 'mean', fields: ['data.value.at(row)'], params: [], frame: [-3, 3], peers: false, id: 4 }, { name: 'count', fields: [], params: [], frame: [-3, 3], peers: true, id: 5 } ], 'parsed operators'); @@ -31,11 +29,11 @@ function 
test(input) { assert.deepEqual(exprs, [ '(1 + 1)', - '(data.a.get(row) * data.b.get(row))', + '(data.a.at(row) * data.b.at(row))', 'op(0,row)', 'op(1,row)', '(1 + op(2,row))', - '(data.value.get(row) - op(3,row))', + '(data.value.at(row) - op(3,row))', 'op(4,row)', 'op(5,row)' ], 'parsed output expressions'); @@ -87,19 +85,19 @@ describe('parse', () => { it('parses expressions with Math object', () => { assert.equal( parse({ f: d => Math.sqrt(d.x) }).exprs[0] + '', - '(row,data,op)=>fn.sqrt(data.x.get(row))', + '(row,data,op)=>fn.sqrt(data.x.at(row))', 'parse Math.sqrt' ); assert.equal( parse({ f: d => Math.max(d.x) }).exprs[0] + '', - '(row,data,op)=>fn.greatest(data.x.get(row))', + '(row,data,op)=>fn.greatest(data.x.at(row))', 'parse Math.max, rewrite as greatest' ); assert.equal( parse({ f: d => Math.min(d.x) }).exprs[0] + '', - '(row,data,op)=>fn.least(data.x.get(row))', + '(row,data,op)=>fn.least(data.x.at(row))', 'parse Math.min, rewrite as least' ); }); @@ -163,19 +161,19 @@ describe('parse', () => { it('parses column references with nested properties', () => { assert.equal( parse({ f: d => d.x.y }).exprs[0] + '', - '(row,data,op)=>data.x.get(row).y', + '(row,data,op)=>data.x.at(row).y', 'parsed nested members' ); assert.equal( parse({ f: d => d['x'].y }).exprs[0] + '', - '(row,data,op)=>data["x"].get(row).y', + '(row,data,op)=>data["x"].at(row).y', 'parsed nested members' ); assert.equal( parse({ f: d => d['x']['y'] }).exprs[0] + '', - '(row,data,op)=>data["x"].get(row)[\'y\']', + '(row,data,op)=>data["x"].at(row)[\'y\']', 'parsed nested members' ); }); @@ -184,7 +182,7 @@ describe('parse', () => { // direct expression assert.equal( parse({ f: d => d['x' + 'y'] }).exprs[0] + '', - '(row,data,op)=>data["xy"].get(row)', + '(row,data,op)=>data["xy"].at(row)', 'parsed indirect member as expression' ); @@ -197,7 +195,7 @@ describe('parse', () => { }; assert.equal( parse({ f: (d, $) => d[$.col] }, opt).exprs[0] + '', - '(row,data,op)=>data["a"].get(row)', + 
'(row,data,op)=>data["a"].at(row)', 'parsed indirect member as param' ); @@ -257,7 +255,7 @@ describe('parse', () => { const { exprs } = parse({ f: d => ({ [d.x]: d.y }) }); assert.equal( exprs[0] + '', - '(row,data,op)=>({[data.x.get(row)]:data.y.get(row)})', + '(row,data,op)=>({[data.x.at(row)]:data.y.at(row)})', 'parsed computed object property' ); }); @@ -266,7 +264,7 @@ describe('parse', () => { const { exprs } = parse({ f: d => `${d.x} + ${d.y}` }); assert.equal( exprs[0] + '', - '(row,data,op)=>`${data.x.get(row)} + ${data.y.get(row)}`', + '(row,data,op)=>`${data.x.at(row)} + ${data.y.at(row)}`', 'parsed template literal' ); }); @@ -282,7 +280,7 @@ describe('parse', () => { names: [ 'val' ], exprs: [ '{const s=op(0,row);return (s * s);}' ], ops: [ - { name: 'sum', fields: [ 'data.a.get(row)' ], params: [], id: 0 } + { name: 'sum', fields: [ 'data.a.at(row)' ], params: [], id: 0 } ] }, 'parsed block' @@ -686,11 +684,11 @@ describe('parse', () => { { names: [ 'ref', 'nest', 'destr', 'l_lte', 'r_lte' ], exprs: [ - 'data.v.get(row)', - "(data.v.get(row).x === 'a')", - "(data.v.get(row).x === 'a')", - "(data.v.get(row) <= 'a')", - "('a' <= data.v.get(row))" + 'data.v.at(row)', + "(data.v.at(row).x === 'a')", + "(data.v.at(row).x === 'a')", + "(data.v.at(row) <= 'a')", + "('a' <= data.v.at(row))" ], ops: [] }, diff --git a/test/format/from-csv-test.js b/test/format/from-csv-test.js index 931b8b86..ae48ea29 100644 --- a/test/format/from-csv-test.js +++ b/test/format/from-csv-test.js @@ -1,6 +1,6 @@ import assert from 'node:assert'; import tableEqual from '../table-equal.js'; -import fromCSV from '../../src/format/from-csv.js'; +import { fromCSV } from '../../src/index.js'; function data() { return { diff --git a/test/format/from-fixed-test.js b/test/format/from-fixed-test.js index 7e878eb2..31d846a6 100644 --- a/test/format/from-fixed-test.js +++ b/test/format/from-fixed-test.js @@ -1,6 +1,6 @@ import assert from 'node:assert'; import tableEqual from 
'../table-equal.js'; -import fromFixed from '../../src/format/from-fixed.js'; +import { fromFixed } from '../../src/index.js'; function data() { return { diff --git a/test/format/from-json-test.js b/test/format/from-json-test.js index 512404c1..c01f7d8b 100644 --- a/test/format/from-json-test.js +++ b/test/format/from-json-test.js @@ -1,6 +1,6 @@ import assert from 'node:assert'; import tableEqual from '../table-equal.js'; -import fromJSON from '../../src/format/from-json.js'; +import { fromJSON } from '../../src/index.js'; function data() { return { @@ -91,6 +91,6 @@ describe('fromJSON', () => { ]; const json = '{"v":' + JSON.stringify(str) + '}'; const table = fromJSON(json); - assert.deepEqual(table.column('v').data, values, 'column values'); + assert.deepEqual(table.column('v'), values, 'column values'); }); }); diff --git a/test/format/load-file-test.js b/test/format/load-file-test.js index 789289da..c5d28386 100644 --- a/test/format/load-file-test.js +++ b/test/format/load-file-test.js @@ -1,5 +1,5 @@ import assert from 'node:assert'; -import { load, loadArrow, loadCSV, loadJSON } from '../../src/format/load-file.js'; +import { load, loadArrow, loadCSV, loadJSON } from '../../src/index.js'; const PATH = 'test/format/data'; diff --git a/test/format/load-file-url-test.js b/test/format/load-file-url-test.js index 32e700f9..5a8bb427 100644 --- a/test/format/load-file-url-test.js +++ b/test/format/load-file-url-test.js @@ -1,5 +1,5 @@ import assert from 'node:assert'; -import { load, loadArrow, loadCSV, loadJSON } from '../../src/format/load-file.js'; +import { load, loadArrow, loadCSV, loadJSON } from '../../src/index.js'; describe('load file url', () => { it('loads from a URL', async () => { diff --git a/test/format/load-url-test.js b/test/format/load-url-test.js index 1d4cd9ab..21edc25b 100644 --- a/test/format/load-url-test.js +++ b/test/format/load-url-test.js @@ -1,6 +1,6 @@ import assert from 'node:assert'; import fetch from 'node-fetch'; -import { load, 
loadArrow, loadCSV, loadJSON } from '../../src/format/load-url.js'; +import { load, loadArrow, loadCSV, loadJSON } from '../../src/index-browser.js'; // add global fetch to emulate DOM environment global.fetch = fetch; diff --git a/test/format/to-csv-test.js b/test/format/to-csv-test.js index 1ee6f9ec..661ca0a7 100644 --- a/test/format/to-csv-test.js +++ b/test/format/to-csv-test.js @@ -1,7 +1,5 @@ import assert from 'node:assert'; -import BitSet from '../../src/table/bit-set.js'; -import ColumnTable from '../../src/table/column-table.js'; -import toCSV from '../../src/format/to-csv.js'; +import { BitSet, ColumnTable, toCSV } from '../../src/index.js'; function data() { return { diff --git a/test/format/to-html-test.js b/test/format/to-html-test.js index 1a983dd0..29ad77bb 100644 --- a/test/format/to-html-test.js +++ b/test/format/to-html-test.js @@ -1,6 +1,5 @@ import assert from 'node:assert'; -import ColumnTable from '../../src/table/column-table.js'; -import toHTML from '../../src/format/to-html.js'; +import { ColumnTable, toHTML } from '../../src/index.js'; describe('toHTML', () => { it('formats html table text', () => { diff --git a/test/format/to-json-test.js b/test/format/to-json-test.js index 079ff0a1..4c1ba2ce 100644 --- a/test/format/to-json-test.js +++ b/test/format/to-json-test.js @@ -1,6 +1,5 @@ import assert from 'node:assert'; -import ColumnTable from '../../src/table/column-table.js'; -import toJSON from '../../src/format/to-json.js'; +import { ColumnTable, toJSON } from '../../src/index.js'; function data() { return { diff --git a/test/format/to-markdown-test.js b/test/format/to-markdown-test.js index dd1ece3c..52368b6f 100644 --- a/test/format/to-markdown-test.js +++ b/test/format/to-markdown-test.js @@ -1,6 +1,5 @@ import assert from 'node:assert'; -import ColumnTable from '../../src/table/column-table.js'; -import toMarkdown from '../../src/format/to-markdown.js'; +import { ColumnTable, toMarkdown } from '../../src/index.js'; 
describe('toMarkdown', () => { it('formats markdown table text', () => { diff --git a/test/helpers/escape-test.js b/test/helpers/escape-test.js index bf5d4cb0..93c32663 100644 --- a/test/helpers/escape-test.js +++ b/test/helpers/escape-test.js @@ -1,6 +1,6 @@ import assert from 'node:assert'; import tableEqual from '../table-equal.js'; -import { escape, op, query, table } from '../../src/index.js'; +import { escape, op, table } from '../../src/index.js'; describe('escape', () => { it('derive supports escaped functions', () => { @@ -80,18 +80,4 @@ describe('escape', () => { 'pivot throws on escaped function' ); }); - - it('query serialization throws for escaped functions', () => { - const sq = d => d.a * d.a; - - assert.throws( - () => query().derive({ z: escape(sq) }).toObject(), - 'query toObject throws on escaped function' - ); - - assert.throws( - () => query().derive({ z: escape(sq) }).toAST(), - 'query toAST throws on escape function' - ); - }); }); diff --git a/test/op/register-test.js b/test/op/register-test.js new file mode 100644 index 00000000..522c8f01 --- /dev/null +++ b/test/op/register-test.js @@ -0,0 +1,93 @@ +import assert from 'node:assert'; +import tableEqual from '../table-equal.js'; +import { + aggregateFunctions, + functions, + windowFunctions +} from '../../src/op/index.js'; +import { + addAggregateFunction, + addFunction, + addWindowFunction +} from '../../src/op/register.js'; +import { op, table } from '../../src/index-browser.js'; + +describe('register', () => { + it('addFunction registers new function', () => { + const SECRET = 0xDEADBEEF; + function secret() { return 0xDEADBEEF; } + + addFunction(secret); + addFunction('sssh', secret); + assert.equal(functions.secret(), SECRET, 'add implicitly named function'); + assert.equal(functions.sssh(), SECRET, 'add explicitly named function'); + + assert.throws( + () => addFunction(() => 'foo'), + 'do not accept anonymous functions' + ); + + assert.throws( + () => addFunction('abs', val => val < 0 
? -val : val), + 'do not overwrite existing functions' + ); + + const abs = op.abs; + assert.doesNotThrow( + () => { + addFunction('abs', val => val < 0 ? -val : val, { override: true }); + addFunction('abs', abs, { override: true }); + }, + 'support override option' + ); + }); + + it('addAggregateFunction registers new aggregate function', () => { + const create = () => ({ + init: s => (s.altsign = -1, s.altsum = 0), + add: (s, v) => s.altsum += (s.altsign *= -1) * v, + rem: () => {}, + value: s => s.altsum + }); + + addAggregateFunction('altsum', { create, param: [1, 0] }); + assert.deepEqual( + aggregateFunctions.altsum, + { create, param: [1, 0] }, + 'register aggregate function' + ); + assert.equal( + table({ x: [1, 2, 3, 4, 5]}).rollup({ a: d => op.altsum(d.x) }).get('a', 0), + 3, 'evaluate aggregate function' + ); + + assert.throws( + () => addAggregateFunction('mean', { create }), + 'do not overwrite existing function' + ); + }); + + it('addWindowFunction registers new window function', () => { + const create = (offset) => ({ + init: () => {}, + value: (w, f) => w.value(w.index, f) - w.index + (offset || 0) + }); + + addWindowFunction('vmi', { create, param: [1, 1] }); + assert.deepEqual( + windowFunctions.vmi, + { create, param: [1, 1] }, + 'register window function' + ); + tableEqual( + table({ x: [1, 2, 3, 4, 5] }).derive({ a: d => op.vmi(d.x, 1) }).select('a'), + { a: [2, 2, 2, 2, 2] }, + 'evaluate window function' + ); + + assert.throws( + () => addWindowFunction('rank', { create }), + 'do not overwrite existing function' + ); + }); +}); diff --git a/test/query/query-test.js b/test/query/query-test.js deleted file mode 100644 index 896a6fb9..00000000 --- a/test/query/query-test.js +++ /dev/null @@ -1,1094 +0,0 @@ -import assert from 'node:assert'; -import groupbyEqual from '../groupby-equal.js'; -import tableEqual from '../table-equal.js'; -import Query, { query } from '../../src/query/query.js'; -import { Verbs } from '../../src/query/verb.js'; 
-import isFunction from '../../src/util/is-function.js'; -import { all, desc, not, op, range, rolling, seed, table } from '../../src/index.js'; -import { field, func } from './util.js'; - -const { - count, dedupe, derive, filter, groupby, orderby, - reify, rollup, select, sample, ungroup, unorder, - relocate, rename, impute, fold, pivot, spread, unroll, - cross, join, semijoin, antijoin, - concat, union, except, intersect -} = Verbs; - -describe('Query', () => { - it('builds single-table queries', () => { - const q = query() - .derive({ bar: d => d.foo + 1 }) - .rollup({ count: op.count(), sum: op.sum('bar') }) - .orderby('foo', desc('bar'), d => d.baz, desc(d => d.bop)) - .groupby('foo', { baz: d => d.baz, bop: d => d.bop }); - - assert.deepEqual(q.toObject(), { - verbs: [ - { - verb: 'derive', - values: { bar: func('d => d.foo + 1') }, - options: undefined - }, - { - verb: 'rollup', - values: { - count: func('d => op.count()'), - sum: func('d => op.sum(d["bar"])') - } - }, - { - verb: 'orderby', - keys: [ - field('foo'), - field('bar', { desc: true }), - func('d => d.baz'), - func('d => d.bop', { desc: true }) - ] - }, - { - verb: 'groupby', - keys: [ - 'foo', - { - baz: func('d => d.baz'), - bop: func('d => d.bop') - } - ] - } - ] - }, 'serialized query from builder'); - }); - - it('supports multi-table verbs', () => { - const q = query() - .concat('concat_table') - .join('join_table'); - - assert.deepEqual(q.toObject(), { - verbs: [ - { - verb: 'concat', - tables: ['concat_table'] - }, - { - verb: 'join', - table: 'join_table', - on: undefined, - values: undefined, - options: undefined - } - ] - }, 'serialized query from builder'); - }); - - it('supports multi-table queries', () => { - const qc = query('concat_table') - .select(not('foo')); - - const qj = query('join_table') - .select(not('bar')); - - const q = query() - .concat(qc) - .join(qj); - - assert.deepEqual(q.toObject(), { - verbs: [ - { - verb: 'concat', - tables: [ qc.toObject() ] - }, - { - verb: 
'join', - table: qj.toObject(), - on: undefined, - values: undefined, - options: undefined - } - ] - }, 'serialized query from builder'); - }); - - it('supports all defined verbs', () => { - const verbs = Object.keys(Verbs); - const q = query(); - assert.equal( - verbs.filter(v => isFunction(q[v])).length, - verbs.length, - 'query builder supports all verbs' - ); - }); - - it('serializes to objects', () => { - const q = new Query([ - derive({ bar: d => d.foo + 1 }), - rollup({ - count: op.count(), - sum: op.sum('bar') - }), - orderby(['foo', desc('bar'), d => d.baz, desc(d => d.bop)]), - groupby(['foo', { baz: d => d.baz, bop: d => d.bop }]) - ]); - - assert.deepEqual(q.toObject(), { - verbs: [ - { - verb: 'derive', - values: { bar: func('d => d.foo + 1') }, - options: undefined - }, - { - verb: 'rollup', - values: { - count: func('d => op.count()'), - sum: func('d => op.sum(d["bar"])') - } - }, - { - verb: 'orderby', - keys: [ - field('foo'), - field('bar', { desc: true }), - func('d => d.baz'), - func('d => d.bop', { desc: true }) - ] - }, - { - verb: 'groupby', - keys: [ - 'foo', - { - baz: func('d => d.baz'), - bop: func('d => d.bop') - } - ] - } - ] - }, 'serialized query'); - }); - - it('evaluates unmodified inputs', () => { - const q = new Query([ - derive({ bar: (d, $) => d.foo + $.offset }), - rollup({ count: op.count(), sum: op.sum('bar') }) - ], { offset: 1}); - - const dt = table({ foo: [0, 1, 2, 3] }); - const dr = q.evaluate(dt); - - tableEqual(dr, { count: [4], sum: [10] }, 'query data'); - }); - - it('evaluates serialized inputs', () => { - const dt = table({ - foo: [0, 1, 2, 3], - bar: [1, 1, 0, 0] - }); - - tableEqual( - Query.from( - new Query([ - derive({ baz: (d, $) => d.foo + $.offset }), - orderby(['bar', 0]), - select([not('bar')]) - ], { offset: 1 }).toObject() - ).evaluate(dt), - { foo: [ 2, 3, 0, 1 ], baz: [ 3, 4, 1, 2 ] }, - 'serialized query data' - ); - - tableEqual( - Query.from( - new Query([ - derive({ bar: (d, $) => d.foo + 
$.offset }), - rollup({ count: op.count(), sum: op.sum('bar') }) - ], { offset: 1 }).toObject() - ).evaluate(dt), - { count: [4], sum: [10] }, - 'serialized query data' - ); - }); - - it('evaluates count verbs', () => { - const dt = table({ - foo: [0, 1, 2, 3], - bar: [1, 1, 0, 0] - }); - - tableEqual( - Query.from( - new Query([count()]).toObject() - ).evaluate(dt), - { count: [4] }, - 'count query result' - ); - - tableEqual( - Query.from( - new Query([count({ as: 'cnt' })]).toObject() - ).evaluate(dt), - { cnt: [4] }, - 'count query result, with options' - ); - }); - - it('evaluates dedupe verbs', () => { - const dt = table({ - foo: [0, 1, 2, 3], - bar: [1, 1, 0, 0] - }); - - tableEqual( - Query.from( - new Query([dedupe([])]).toObject() - ).evaluate(dt), - { foo: [0, 1, 2, 3], bar: [1, 1, 0, 0] }, - 'dedupe query result' - ); - - tableEqual( - Query.from( - new Query([dedupe(['bar'])]).toObject() - ).evaluate(dt), - { foo: [0, 2], bar: [1, 0] }, - 'dedupe query result, key' - ); - - tableEqual( - Query.from( - new Query([dedupe([not('foo')])]).toObject() - ).evaluate(dt), - { foo: [0, 2], bar: [1, 0] }, - 'dedupe query result, key selection' - ); - }); - - it('evaluates derive verbs', () => { - const dt = table({ - foo: [0, 1, 2, 3], - bar: [1, 1, 0, 0] - }); - - const verb = derive( - { - baz: d => d.foo + 1 - op.mean(d.foo), - bop: 'd => 2 * (d.foo - op.mean(d.foo))', - sum: rolling(d => op.sum(d.foo)), - win: rolling(d => op.product(d.foo), [0, 1]) - }, - { - before: 'bar' - } - ); - - tableEqual( - Query.from( - new Query([verb]).toObject() - ).evaluate(dt), - { - foo: [0, 1, 2, 3], - baz: [-0.5, 0.5, 1.5, 2.5], - bop: [-3, -1, 1, 3], - sum: [0, 1, 3, 6], - win: [0, 2, 6, 3], - bar: [1, 1, 0, 0] - }, - 'derive query result' - ); - }); - - it('evaluates filter verbs', () => { - const dt = table({ - foo: [0, 1, 2, 3], - bar: [1, 1, 0, 0] - }); - - const verb = filter(d => d.bar > 0); - - tableEqual( - Query.from( - new Query([verb]).toObject() - 
).evaluate(dt), - { - foo: [0, 1], - bar: [1, 1] - }, - 'filter query result' - ); - }); - - it('evaluates groupby verbs', () => { - const dt = table({ - foo: [0, 1, 2, 3], - bar: [1, 1, 0, 0] - }); - - groupbyEqual( - Query.from( - new Query([groupby(['bar'])]).toObject() - ).evaluate(dt), - dt.groupby('bar'), - 'groupby query result' - ); - - groupbyEqual( - Query.from( - new Query([groupby([{bar: d => d.bar}])]).toObject() - ).evaluate(dt), - dt.groupby('bar'), - 'groupby query result, table expression' - ); - - groupbyEqual( - Query.from( - new Query([groupby([not('foo')])]).toObject() - ).evaluate(dt), - dt.groupby('bar'), - 'groupby query result, selection' - ); - }); - - it('evaluates orderby verbs', () => { - const dt = table({ - foo: [0, 1, 2, 3], - bar: [1, 1, 0, 0] - }); - - tableEqual( - Query.from( - new Query([orderby(['bar', 'foo'])]).toObject() - ).evaluate(dt), - { - foo: [2, 3, 0, 1], - bar: [0, 0, 1, 1] - }, - 'orderby query result' - ); - - tableEqual( - Query.from( - new Query([orderby([ - d => d.bar, - d => d.foo - ])]).toObject() - ).evaluate(dt), - { - foo: [2, 3, 0, 1], - bar: [0, 0, 1, 1] - }, - 'orderby query result' - ); - - tableEqual( - Query.from( - new Query([orderby([desc('bar'), desc('foo')])]).toObject() - ).evaluate(dt), - { - foo: [1, 0, 3, 2], - bar: [1, 1, 0, 0] - }, - 'orderby query result, desc' - ); - }); - - it('evaluates reify verbs', () => { - const dt = table({ - foo: [0, 1, 2, 3], - bar: [1, 1, 0, 0] - }).filter(d => d.foo < 1); - - tableEqual( - Query.from( - new Query([ reify() ]).toObject() - ).evaluate(dt), - { foo: [0], bar: [1] }, - 'reify query result' - ); - }); - - it('evaluates relocate verbs', () => { - const a = [1], b = [2], c = [3], d = [4]; - const dt = table({ a, b, c, d }); - - tableEqual( - Query.from( - new Query([ - relocate('b', { after: 'b' }) - ]).toObject() - ).evaluate(dt), - { a, c, d, b }, - 'relocate query result' - ); - - tableEqual( - Query.from( - new Query([ - relocate(not('b', 'd'), { 
before: range(0, 1) }) - ]).toObject() - ).evaluate(dt), - { a, c, b, d }, - 'relocate query result' - ); - }); - - it('evaluates rename verbs', () => { - const a = [1], b = [2], c = [3], d = [4]; - const dt = table({ a, b, c, d }); - - tableEqual( - Query.from( - new Query([ - rename({ d: 'w', a: 'z' }) - ]).toObject() - ).evaluate(dt), - { z: a, b, c, w: d }, - 'rename query result' - ); - }); - - it('evaluates rollup verbs', () => { - const dt = table({ - foo: [0, 1, 2, 3], - bar: [1, 1, 0, 0] - }); - - tableEqual( - Query.from( - new Query([rollup({ - count: op.count(), - sum: op.sum('foo'), - sump1: d => 1 + op.sum(d.foo + d.bar), - avgt2: 'd => 2 * op.mean(op.abs(d.foo))' - })]).toObject() - ).evaluate(dt), - { count: [4], sum: [6], sump1: [9], avgt2: [3] }, - 'rollup query result' - ); - }); - - it('evaluates sample verbs', () => { - seed(12345); - - const dt = table({ - foo: [0, 1, 2, 3], - bar: [1, 1, 0, 0] - }); - - tableEqual( - Query.from( - new Query([sample(2)]).toObject() - ).evaluate(dt), - { foo: [ 3, 1 ], bar: [ 0, 1 ] }, - 'sample query result' - ); - - tableEqual( - Query.from( - new Query([sample(2, { replace: true })]).toObject() - ).evaluate(dt), - { foo: [ 3, 0 ], bar: [ 0, 1 ] }, - 'sample query result, replace' - ); - - tableEqual( - Query.from( - new Query([sample(2, { weight: 'foo' })]).toObject() - ).evaluate(dt), - { foo: [ 2, 3 ], bar: [ 0, 0 ] }, - 'sample query result, weight column name' - ); - - tableEqual( - Query.from( - new Query([sample(2, { weight: d => d.foo })]).toObject() - ).evaluate(dt), - { foo: [ 3, 2 ], bar: [ 0, 0 ] }, - 'sample query result, weight table expression' - ); - - seed(null); - }); - - it('evaluates select verbs', () => { - const dt = table({ - foo: [0, 1, 2, 3], - bar: [1, 1, 0, 0] - }); - - tableEqual( - Query.from( - new Query([select(['bar'])]).toObject() - ).evaluate(dt), - { bar: [1, 1, 0, 0] }, - 'select query result, column name' - ); - - tableEqual( - Query.from( - new 
Query([select([all()])]).toObject() - ).evaluate(dt), - { foo: [0, 1, 2, 3], bar: [1, 1, 0, 0] }, - 'select query result, all' - ); - - tableEqual( - Query.from( - new Query([select([not('foo')])]).toObject() - ).evaluate(dt), - { bar: [1, 1, 0, 0] }, - 'select query result, not' - ); - - tableEqual( - Query.from( - new Query([select([range(1, 1)])]).toObject() - ).evaluate(dt), - { bar: [1, 1, 0, 0] }, - 'select query result, range' - ); - }); - - it('evaluates ungroup verbs', () => { - const dt = table({ - foo: [0, 1, 2, 3], - bar: [1, 1, 0, 0] - }).groupby('bar'); - - const qt = Query - .from( new Query([ ungroup() ]).toObject() ) - .evaluate(dt); - - assert.equal(qt.isGrouped(), false, 'table is not grouped'); - }); - - it('evaluates unorder verbs', () => { - const dt = table({ - foo: [0, 1, 2, 3], - bar: [1, 1, 0, 0] - }).orderby('foo'); - - const qt = Query - .from( new Query([ unorder() ]).toObject() ) - .evaluate(dt); - - assert.equal(qt.isOrdered(), false, 'table is not ordered'); - }); - - it('evaluates impute verbs', () => { - const dt = table({ - x: [1, 2], - y: [3, 4], - z: [1, 1] - }); - - const imputed = { - x: [1, 2, 1, 2], - y: [3, 4, 4, 3], - z: [1, 1, 0, 0] - }; - - const verb = impute( - { z: () => 0 }, - { expand: ['x', 'y'] } - ); - - tableEqual( - Query.from( - new Query([verb]).toObject() - ).evaluate(dt), - imputed, - 'impute query result' - ); - }); - - it('evaluates fold verbs', () => { - const dt = table({ - foo: [0, 1, 2, 3], - bar: [1, 1, 0, 0] - }); - - const folded = { - key: [ 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar' ], - value: [ 0, 1, 1, 1, 2, 0, 3, 0 ] - }; - - tableEqual( - Query.from( - new Query([fold(['foo', 'bar'])]).toObject() - ).evaluate(dt), - folded, - 'fold query result, column names' - ); - - tableEqual( - Query.from( - new Query([fold([all()])]).toObject() - ).evaluate(dt), - folded, - 'fold query result, all' - ); - - tableEqual( - Query.from( - new Query([fold([{ foo: d => d.foo }])]).toObject() - 
).evaluate(dt), - { - bar: [ 1, 1, 0, 0 ], - key: [ 'foo', 'foo', 'foo', 'foo' ], - value: [ 0, 1, 2, 3 ] - }, - 'fold query result, table expression' - ); - }); - - it('evaluates pivot verbs', () => { - const dt = table({ - foo: [0, 1, 2, 3], - bar: [1, 1, 0, 0] - }); - - tableEqual( - Query.from( - new Query([pivot(['bar'], ['foo'])]).toObject() - ).evaluate(dt), - { '0': [2], '1': [0] }, - 'pivot query result, column names' - ); - - tableEqual( - Query.from( - new Query([pivot( - [{ bar: d => d.bar }], - [{ foo: op.sum('foo') }] - )]).toObject() - ).evaluate(dt), - { '0': [5], '1': [1] }, - 'pivot query result, table expressions' - ); - }); - - it('evaluates spread verbs', () => { - const dt = table({ - list: [[1, 2, 3]] - }); - - tableEqual( - Query.from( - new Query([spread(['list'])]).toObject() - ).evaluate(dt), - { - 'list_1': [1], - 'list_2': [2], - 'list_3': [3] - }, - 'spread query result, column names' - ); - - tableEqual( - Query.from( - new Query([spread(['list'], { drop: false })]).toObject() - ).evaluate(dt), - { - 'list': [[1, 2, 3]], - 'list_1': [1], - 'list_2': [2], - 'list_3': [3] - }, - 'spread query result, column names' - ); - - tableEqual( - Query.from( - new Query([spread([{ list: d => d.list }])]).toObject() - ).evaluate(dt), - { - // 'list': [[1, 2, 3]], - 'list_1': [1], - 'list_2': [2], - 'list_3': [3] - }, - 'spread query result, table expression' - ); - - tableEqual( - Query.from( - new Query([spread(['list'], { limit: 2 })]).toObject() - ).evaluate(dt), - { - // 'list': [[1, 2, 3]], - 'list_1': [1], - 'list_2': [2] - }, - 'spread query result, limit' - ); - }); - - it('evaluates unroll verbs', () => { - const dt = table({ - list: [[1, 2, 3]] - }); - - tableEqual( - Query.from( - new Query([unroll(['list'])]).toObject() - ).evaluate(dt), - { 'list': [1, 2, 3] }, - 'unroll query result, column names' - ); - - tableEqual( - Query.from( - new Query([unroll([{ list: d => d.list }])]).toObject() - ).evaluate(dt), - { 'list': [1, 2, 3] }, - 
'unroll query result, table expression' - ); - - tableEqual( - Query.from( - new Query([unroll(['list'], { limit: 2 })]).toObject() - ).evaluate(dt), - { 'list': [1, 2] }, - 'unroll query result, limit' - ); - }); - - it('evaluates cross verbs', () => { - const lt = table({ - x: ['A', 'B'], - y: [1, 2] - }); - - const rt = table({ - u: ['C'], - v: [3] - }); - - const catalog = name => name === 'other' ? rt : null; - - tableEqual( - Query.from( - new Query([ - cross('other') - ]).toObject() - ).evaluate(lt, catalog), - { x: ['A', 'B'], y: [1, 2], u: ['C', 'C'], v: [3, 3] }, - 'cross query result' - ); - - tableEqual( - Query.from( - new Query([ - cross('other', ['y', 'v']) - ]).toObject() - ).evaluate(lt, catalog), - { y: [1, 2], v: [3, 3] }, - 'cross query result, column name values' - ); - - tableEqual( - Query.from( - new Query([ - cross('other', [ - { y: d => d.y }, - { v: d => d.v } - ]) - ]).toObject() - ).evaluate(lt, catalog), - { y: [1, 2], v: [3, 3] }, - 'cross query result, table expression values' - ); - - tableEqual( - Query.from( - new Query([ - cross('other', { - y: a => a.y, - v: (a, b) => b.v - }) - ]).toObject() - ).evaluate(lt, catalog), - { y: [1, 2], v: [3, 3] }, - 'cross query result, two-table expression values' - ); - }); - - it('evaluates join verbs', () => { - const lt = table({ - x: ['A', 'B', 'C'], - y: [1, 2, 3] - }); - - const rt = table({ - u: ['A', 'B', 'D'], - v: [4, 5, 6] - }); - - const catalog = name => name === 'other' ? 
rt : null; - - tableEqual( - Query.from( - new Query([ - join('other', ['x', 'u']) - ]).toObject() - ).evaluate(lt, catalog), - { x: ['A', 'B'], y: [1, 2], u: ['A', 'B'], v: [4, 5] }, - 'join query result, column name keys' - ); - - tableEqual( - Query.from( - new Query([ - join('other', (a, b) => op.equal(a.x, b.u)) - ]).toObject() - ).evaluate(lt, catalog), - { x: ['A', 'B'], y: [1, 2], u: ['A', 'B'], v: [4, 5] }, - 'join query result, predicate expression' - ); - - tableEqual( - Query.from( - new Query([ - join('other', ['x', 'u'], [['x', 'y'], 'v']) - ]).toObject() - ).evaluate(lt, catalog), - { x: ['A', 'B'], y: [1, 2], v: [4, 5] }, - 'join query result, column name values' - ); - - tableEqual( - Query.from( - new Query([ - join('other', ['x', 'u'], [all(), not('u')]) - ]).toObject() - ).evaluate(lt, catalog), - { x: ['A', 'B'], y: [1, 2], v: [4, 5] }, - 'join query result, selection values' - ); - - tableEqual( - Query.from( - new Query([ - join('other', ['x', 'u'], [ - { x: d => d.x, y: d => d.y }, - { v: d => d.v } - ]) - ]).toObject() - ).evaluate(lt, catalog), - { x: ['A', 'B'], y: [1, 2], v: [4, 5] }, - 'join query result, table expression values' - ); - - tableEqual( - Query.from( - new Query([ - join('other', ['x', 'u'], { - x: a => a.x, - y: a => a.y, - v: (a, b) => b.v - }) - ]).toObject() - ).evaluate(lt, catalog), - { x: ['A', 'B'], y: [1, 2], v: [4, 5] }, - 'join query result, two-table expression values' - ); - - tableEqual( - Query.from( - new Query([ - join('other', ['x', 'u'], [['x', 'y'], ['u', 'v']], - { left: true, right: true}) - ]).toObject() - ).evaluate(lt, catalog), - { - x: [ 'A', 'B', 'C', undefined ], - y: [ 1, 2, 3, undefined ], - u: [ 'A', 'B', undefined, 'D' ], - v: [ 4, 5, undefined, 6 ] - }, - 'join query result, full join' - ); - }); - - it('evaluates semijoin verbs', () => { - const lt = table({ - x: ['A', 'B', 'C'], - y: [1, 2, 3] - }); - - const rt = table({ - u: ['A', 'B', 'D'], - v: [4, 5, 6] - }); - - const catalog = 
name => name === 'other' ? rt : null; - - tableEqual( - Query.from( - new Query([ - semijoin('other', ['x', 'u']) - ]).toObject() - ).evaluate(lt, catalog), - { x: ['A', 'B'], y: [1, 2] }, - 'semijoin query result, column name keys' - ); - - tableEqual( - Query.from( - new Query([ - semijoin('other', (a, b) => op.equal(a.x, b.u)) - ]).toObject() - ).evaluate(lt, catalog), - { x: ['A', 'B'], y: [1, 2] }, - 'semijoin query result, predicate expression' - ); - }); - - it('evaluates antijoin verbs', () => { - const lt = table({ - x: ['A', 'B', 'C'], - y: [1, 2, 3] - }); - - const rt = table({ - u: ['A', 'B', 'D'], - v: [4, 5, 6] - }); - - const catalog = name => name === 'other' ? rt : null; - - tableEqual( - Query.from( - new Query([ - antijoin('other', ['x', 'u']) - ]).toObject() - ).evaluate(lt, catalog), - { x: ['C'], y: [3] }, - 'antijoin query result, column name keys' - ); - - tableEqual( - Query.from( - new Query([ - antijoin('other', (a, b) => op.equal(a.x, b.u)) - ]).toObject() - ).evaluate(lt, catalog), - { x: ['C'], y: [3] }, - 'antijoin query result, predicate expression' - ); - }); - - it('evaluates concat verbs', () => { - const lt = table({ - x: ['A', 'B'], - y: [1, 2] - }); - - const rt = table({ - x: ['B', 'C'], - y: [2, 3] - }); - - const catalog = name => name === 'other' ? rt : null; - - tableEqual( - Query.from( - new Query([ concat(['other']) ]).toObject() - ).evaluate(lt, catalog), - { x: ['A', 'B', 'B', 'C'], y: [1, 2, 2, 3] }, - 'concat query result' - ); - }); - - it('evaluates concat verbs with subqueries', () => { - const lt = table({ - x: ['A', 'B'], - y: [1, 2] - }); - - const rt = table({ - a: ['B', 'C'], - b: [2, 3] - }); - - const catalog = name => name === 'other' ? 
rt : null; - - const sub = query('other') - .select({ a: 'x', b: 'y' }); - - tableEqual( - Query.from( - new Query([ concat([sub]) ]).toObject() - ).evaluate(lt, catalog), - { x: ['A', 'B', 'B', 'C'], y: [1, 2, 2, 3] }, - 'concat query result' - ); - }); - - it('evaluates union verbs', () => { - const lt = table({ - x: ['A', 'B'], - y: [1, 2] - }); - - const rt = table({ - x: ['B', 'C'], - y: [2, 3] - }); - - const catalog = name => name === 'other' ? rt : null; - - tableEqual( - Query.from( - new Query([ union(['other']) ]).toObject() - ).evaluate(lt, catalog), - { x: ['A', 'B', 'C'], y: [1, 2, 3] }, - 'union query result' - ); - }); - - it('evaluates except verbs', () => { - const lt = table({ - x: ['A', 'B'], - y: [1, 2] - }); - - const rt = table({ - x: ['B', 'C'], - y: [2, 3] - }); - - const catalog = name => name === 'other' ? rt : null; - - tableEqual( - Query.from( - new Query([ except(['other']) ]).toObject() - ).evaluate(lt, catalog), - { x: ['A'], y: [1] }, - 'except query result' - ); - }); - - it('evaluates intersect verbs', () => { - const lt = table({ - x: ['A', 'B'], - y: [1, 2] - }); - - const rt = table({ - x: ['B', 'C'], - y: [2, 3] - }); - - const catalog = name => name === 'other' ? 
rt : null; - - tableEqual( - Query.from( - new Query([ intersect(['other']) ]).toObject() - ).evaluate(lt, catalog), - { x: ['B'], y: [2] }, - 'intersect query result' - ); - }); -}); diff --git a/test/query/util.js b/test/query/util.js deleted file mode 100644 index 2afeb5e9..00000000 --- a/test/query/util.js +++ /dev/null @@ -1,7 +0,0 @@ -export const field = (expr, props) => ({ - expr, field: true, ...props -}); - -export const func = (expr, props) => ({ - expr, func: true, ...props -}); diff --git a/test/query/verb-test.js b/test/query/verb-test.js deleted file mode 100644 index 662b4968..00000000 --- a/test/query/verb-test.js +++ /dev/null @@ -1,576 +0,0 @@ -import assert from 'node:assert'; -import { query } from '../../src/query/query.js'; -import { Verb, Verbs } from '../../src/query/verb.js'; -import { - all, bin, desc, endswith, matches, not, op, range, rolling, startswith -} from '../../src/index.js'; -import { field, func } from './util.js'; - -const { - count, dedupe, derive, filter, groupby, orderby, - reify, rollup, sample, select, ungroup, unorder, - relocate, rename, impute, pivot, unroll, join, concat -} = Verbs; - -function test(verb, expect, msg) { - const object = verb.toObject(); - assert.deepEqual(object, expect, msg); - const rt = Verb.from(object).toObject(); - assert.deepEqual(rt, expect, msg + ' round-trip'); -} - -describe('serialize', () => { - it('count verb serializes to object', () => { - test( - count(), - { - verb: 'count', - options: undefined - }, - 'serialized count, no options' - ); - - test( - count({ as: 'cnt' }), - { - verb: 'count', - options: { as: 'cnt' } - }, - 'serialized count, with options' - ); - }); - - it('dedupe verb serializes to object', () => { - test( - dedupe(), - { - verb: 'dedupe', - keys: [] - }, - 'serialized dedupe, no keys' - ); - - test( - dedupe(['id', d => d.foo]), - { - verb: 'dedupe', - keys: [ - 'id', - func('d => d.foo') - ] - }, - 'serialized dedupe, with keys' - ); - }); - - it('derive verb 
serializes to object', () => { - const verb = derive( - { - foo: 'd.bar * 5', - bar: d => d.foo + 1, - baz: rolling(d => op.mean(d.foo), [-3, 3]) - }, - { - before: 'bop' - } - ); - - test( - verb, - { - verb: 'derive', - values: { - foo: 'd.bar * 5', - bar: func('d => d.foo + 1'), - baz: func( - 'd => op.mean(d.foo)', - { window: { frame: [ -3, 3 ], peers: false } } - ) - }, - options: { - before: 'bop' - } - }, - 'serialized derive verb' - ); - }); - - it('filter verb serializes to object', () => { - test( - filter(d => d.foo > 2), - { - verb: 'filter', - criteria: func('d => d.foo > 2') - }, - 'serialized filter verb' - ); - }); - - it('groupby verb serializes to object', () => { - const verb = groupby([ - 'foo', - { baz: d => d.baz, bop: d => d.bop } - ]); - - test( - verb, - { - verb: 'groupby', - keys: [ - 'foo', - { - baz: func('d => d.baz'), - bop: func('d => d.bop') - } - ] - }, - 'serialized groupby verb' - ); - - const binVerb = groupby([{ bin0: bin('foo') }]); - - test( - binVerb, - { - verb: 'groupby', - keys: [ - { - bin0: 'd => op.bin(d["foo"], ...op.bins(d["foo"]), 0)' - } - ] - }, - 'serialized groupby verb, with binning' - ); - }); - - it('orderby verb serializes to object', () => { - const verb = orderby([ - 1, - 'foo', - desc('bar'), - d => d.baz, - desc(d => d.bop) - ]); - - test( - verb, - { - verb: 'orderby', - keys: [ - 1, - field('foo'), - field('bar', { desc: true }), - func('d => d.baz'), - func('d => d.bop', { desc: true }) - ] - }, - 'serialized orderby verb' - ); - }); - - it('reify verb serializes to AST', () => { - const verb = reify(); - - test( - verb, - { verb: 'reify' }, - 'serialized reify verb' - ); - }); - - it('relocate verb serializes to object', () => { - test( - relocate(['foo', 'bar'], { before: 'baz' }), - { - verb: 'relocate', - columns: ['foo', 'bar'], - options: { before: 'baz' } - }, - 'serialized relocate verb' - ); - - test( - relocate(not('foo'), { after: range('a', 'b') }), - { - verb: 'relocate', - columns: { 
not: ['foo'] }, - options: { after: { range: ['a', 'b'] } } - }, - 'serialized relocate verb' - ); - }); - - it('rename verb serializes to object', () => { - test( - rename([{ foo: 'bar' }]), - { - verb: 'rename', - columns: [{ foo: 'bar' }] - }, - 'serialized rename verb' - ); - - test( - rename([{ foo: 'bar' }, { baz: 'bop' }]), - { - verb: 'rename', - columns: [{ foo: 'bar' }, { baz: 'bop' }] - }, - 'serialized rename verb' - ); - }); - - it('rollup verb serializes to object', () => { - const verb = rollup({ - count: op.count(), - sum: op.sum('bar'), - mean: d => op.mean(d.foo) - }); - - test( - verb, - { - verb: 'rollup', - values: { - count: func('d => op.count()'), - sum: func('d => op.sum(d["bar"])'), - mean: func('d => op.mean(d.foo)') - } - }, - 'serialized rollup verb' - ); - }); - - it('sample verb serializes to object', () => { - test( - sample(2, { replace: true }), - { - verb: 'sample', - size: 2, - options: { replace: true } - }, - 'serialized sample verb' - ); - - test( - sample(() => op.count()), - { - verb: 'sample', - size: { expr: '() => op.count()', func: true }, - options: undefined - }, - 'serialized sample verb, size function' - ); - - test( - sample('() => op.count()'), - { - verb: 'sample', - size: '() => op.count()', - options: undefined - }, - 'serialized sample verb, size function as string' - ); - - test( - sample(2, { weight: 'foo' }), - { - verb: 'sample', - size: 2, - options: { weight: 'foo' } - }, - 'serialized sample verb, weight column name' - ); - - test( - sample(2, { weight: d => 2 * d.foo }), - { - verb: 'sample', - size: 2, - options: { weight: { expr: 'd => 2 * d.foo', func: true } } - }, - 'serialized sample verb, weight table expression' - ); - }); - - it('select verb serializes to object', () => { - const verb = select([ - 'foo', - 'bar', - { bop: 'boo', baz: 'bao' }, - all(), - range(0, 1), - range('a', 'b'), - not('foo', 'bar', range(0, 1), range('a', 'b')), - matches('foo.bar'), - matches(/a|b/i), - 
startswith('foo.'), - endswith('.baz') - ]); - - test( - verb, - { - verb: 'select', - columns: [ - 'foo', - 'bar', - { bop: 'boo', baz: 'bao' }, - { all: [] }, - { range: [0, 1] }, - { range: ['a', 'b'] }, - { - not: [ - 'foo', - 'bar', - { range: [0, 1] }, - { range: ['a', 'b'] } - ] - }, - { matches: ['foo\\.bar', ''] }, - { matches: ['a|b', 'i'] }, - { matches: ['^foo\\.', ''] }, - { matches: ['\\.baz$', ''] } - ] - }, - 'serialized select verb' - ); - }); - - it('ungroup verb serializes to object', () => { - test( - ungroup(), - { verb: 'ungroup' }, - 'serialized ungroup verb' - ); - }); - - it('unorder verb serializes to object', () => { - test( - unorder(), - { verb: 'unorder' }, - 'serialized unorder verb' - ); - }); - - it('impute verb serializes to object', () => { - const verb = impute( - { v: () => 0 }, - { expand: 'x' } - ); - - test( - verb, - { - verb: 'impute', - values: { - v: func('() => 0') - }, - options: { - expand: 'x' - } - }, - 'serialized impute verb' - ); - }); - - it('pivot verb serializes to object', () => { - const verb = pivot( - ['key'], - ['value', { sum: d => op.sum(d.foo), prod: op.product('bar') }], - { sort: false } - ); - - test( - verb, - { - verb: 'pivot', - keys: ['key'], - values: [ - 'value', - { - sum: func('d => op.sum(d.foo)'), - prod: func('d => op.product(d["bar"])') - } - ], - options: { sort: false } - }, - 'serialized pivot verb' - ); - }); - - it('unroll verb serializes to object', () => { - test( - unroll(['foo', 1]), - { - verb: 'unroll', - values: [ 'foo', 1 ], - options: undefined - }, - 'serialized unroll verb' - ); - - test( - unroll({ - foo: d => d.foo, - bar: d => op.split(d.bar, ' ') - }), - { - verb: 'unroll', - values: { - foo: { expr: 'd => d.foo', func: true }, - bar: { expr: 'd => op.split(d.bar, \' \')', func: true } - }, - options: undefined - }, - 'serialized unroll verb, values object' - ); - - test( - unroll(['foo'], { index: true }), - { - verb: 'unroll', - values: [ 'foo' ], - options: { index: 
true } - }, - 'serialized unroll verb, index boolean' - ); - - test( - unroll(['foo'], { index: 'idxnum' }), - { - verb: 'unroll', - values: [ 'foo' ], - options: { index: 'idxnum' } - }, - 'serialized unroll verb, index string' - ); - - test( - unroll(['foo'], { drop: [ 'bar' ] }), - { - verb: 'unroll', - values: [ 'foo' ], - options: { drop: [ 'bar' ] } - }, - 'serialized unroll verb, drop column name' - ); - - test( - unroll(['foo'], { drop: d => d.bar }), - { - verb: 'unroll', - values: [ 'foo' ], - options: { drop: { expr: 'd => d.bar', func: true } } - }, - 'serialized unroll verb, drop table expression' - ); - }); - - it('join verb serializes to object', () => { - const verbSel = join( - 'tableRef', - ['keyL', 'keyR'], - [all(), not('keyR')], - { suffix: ['_L', '_R'] } - ); - - test( - verbSel, - { - verb: 'join', - table: 'tableRef', - on: [ - [field('keyL')], - [field('keyR')] - ], - values: [ - [ { all: [] } ], - [ { not: ['keyR'] } ] - ], - options: { suffix: ['_L', '_R'] } - }, - 'serialized join verb, column selections' - ); - - const verbCols = join( - 'tableRef', - [ - [d => d.keyL], - [d => d.keyR] - ], - [ - ['keyL', 'valL', { foo: d => 1 + d.valL }], - ['valR', { bar: d => 2 * d.valR }] - ], - { suffix: ['_L', '_R'] } - ); - - test( - verbCols, - { - verb: 'join', - table: 'tableRef', - on: [ - [ func('d => d.keyL') ], - [ func('d => d.keyR') ] - ], - values: [ - ['keyL', 'valL', { foo: func('d => 1 + d.valL') } ], - ['valR', { bar: func('d => 2 * d.valR') }] - ], - options: { suffix: ['_L', '_R'] } - }, - 'serialized join verb, column lists' - ); - - const verbExpr = join( - 'tableRef', - (a, b) => op.equal(a.keyL, b.keyR), - { - key: a => a.keyL, - foo: a => a.foo, - bar: (a, b) => b.bar - } - ); - - test( - verbExpr, - { - verb: 'join', - table: 'tableRef', - on: func('(a, b) => op.equal(a.keyL, b.keyR)'), - values: { - key: func('a => a.keyL'), - foo: func('a => a.foo'), - bar: func('(a, b) => b.bar') - }, - options: undefined - }, - 
'serialized join verb, table expressions' - ); - }); - - it('concat verb serializes to object', () => { - test( - concat(['foo', 'bar']), - { - verb: 'concat', - tables: ['foo', 'bar'] - }, - 'serialized concat verb' - ); - - const ct1 = query('foo').select(not('bar')); - const ct2 = query('bar').select(not('foo')); - - test( - concat([ct1, ct2]), - { - verb: 'concat', - tables: [ ct1.toObject(), ct2.toObject() ] - }, - 'serialized concat verb, with subqueries' - ); - }); -}); diff --git a/test/query/verb-to-ast-test.js b/test/query/verb-to-ast-test.js deleted file mode 100644 index 66ee3731..00000000 --- a/test/query/verb-to-ast-test.js +++ /dev/null @@ -1,845 +0,0 @@ -import assert from 'node:assert'; -import { query } from '../../src/query/query.js'; -import { Verbs } from '../../src/query/verb.js'; -import { - all, bin, desc, endswith, matches, not, op, range, rolling, startswith -} from '../../src/index.js'; - -const { - count, dedupe, derive, filter, groupby, orderby, - reify, rollup, sample, select, ungroup, unorder, - relocate, rename, impute, pivot, unroll, join, concat -} = Verbs; - -function toAST(verb) { - return JSON.parse(JSON.stringify(verb.toAST())); -} - -describe('serialize to AST', () => { - it('count verb serializes to AST', () => { - assert.deepEqual( - toAST(count()), - { type: 'Verb', verb: 'count' }, - 'ast count, no options' - ); - - assert.deepEqual( - toAST(count({ as: 'cnt' })), - { - type: 'Verb', - verb: 'count', - options: { as: 'cnt' } - }, - 'ast count, with options' - ); - }); - - it('dedupe verb serializes to AST', () => { - assert.deepEqual( - toAST(dedupe()), - { - type: 'Verb', - verb: 'dedupe', - keys: [] - }, - 'ast dedupe, no keys' - ); - - assert.deepEqual( - toAST(dedupe(['id', d => d.foo, d => Math.abs(d.bar)])), - { - type: 'Verb', - verb: 'dedupe', - keys: [ - { type: 'Column', name: 'id' }, - { type: 'Column', name: 'foo' }, - { - type: 'CallExpression', - callee: { type: 'Function', name: 'abs' }, - arguments: [ { 
type: 'Column', name: 'bar' } ] - } - ] - }, - 'ast dedupe, with keys' - ); - }); - - it('derive verb serializes to AST', () => { - const verb = derive( - { - col: d => d.foo, - foo: 'd.bar * 5', - bar: d => d.foo + 1, - baz: rolling(d => op.mean(d.foo), [-3, 3]) - }, - { - before: 'bop' - } - ); - - assert.deepEqual( - toAST(verb), - { - type: 'Verb', - verb: 'derive', - values: [ - { type: 'Column', name: 'foo', as: 'col' }, - { - type: 'BinaryExpression', - left: { type: 'Column', name: 'bar' }, - operator: '*', - right: { type: 'Literal', value: 5, raw: '5' }, - as: 'foo' - }, - { - type: 'BinaryExpression', - left: { type: 'Column', name: 'foo' }, - operator: '+', - right: { type: 'Literal', value: 1, raw: '1' }, - as: 'bar' - }, - { - type: 'Window', - frame: [ -3, 3 ], - peers: false, - expr: { - type: 'CallExpression', - callee: { type: 'Function', name: 'mean' }, - arguments: [ { type: 'Column', name: 'foo' } ] - }, - as: 'baz' - } - ], - options: { - before: [ - { type: 'Column', name: 'bop' } - ] - } - }, - 'ast derive verb' - ); - }); - - it('filter verb serializes to AST', () => { - const ast = { - type: 'Verb', - verb: 'filter', - criteria: { - type: 'BinaryExpression', - left: { type: 'Column', name: 'foo' }, - operator: '>', - right: { type: 'Literal', value: 2, raw: '2' } - } - }; - - assert.deepEqual( - toAST(filter(d => d.foo > 2)), - ast, - 'ast filter verb' - ); - - assert.deepEqual( - toAST(filter('d.foo > 2')), - ast, - 'ast filter verb, expr string' - ); - }); - - it('groupby verb serializes to AST', () => { - assert.deepEqual( - toAST(groupby([ - 'foo', - 1, - { baz: d => d.baz, bop: d => d.bop, bar: d => Math.abs(d.bar) } - ])), - { - type: 'Verb', - verb: 'groupby', - keys: [ - { type: 'Column', name: 'foo' }, - { type: 'Column', index: 1 }, - { type: 'Column', name: 'baz', as: 'baz' }, - { type: 'Column', name: 'bop', as: 'bop' }, - { - type: 'CallExpression', - callee: { type: 'Function', name: 'abs' }, - arguments: [ { type: 'Column', 
name: 'bar' } ], - as: 'bar' - } - ] - }, - 'ast groupby verb' - ); - - assert.deepEqual( - toAST(groupby([{ bin0: bin('foo') }])), - { - type: 'Verb', - verb: 'groupby', - keys: [ - { - as: 'bin0', - type: 'CallExpression', - callee: { type: 'Function', name: 'bin' }, - arguments: [ - { type: 'Column', name: 'foo' }, - { - type: 'SpreadElement', - argument: { - type: 'CallExpression', - callee: { type: 'Function', name: 'bins' }, - arguments: [{ type: 'Column', name: 'foo' }] - } - }, - { type: 'Literal', value: 0, raw: '0' } - ] - } - ] - }, - 'ast groupby verb, with binning' - ); - }); - - it('orderby verb serializes to AST', () => { - const verb = orderby([ - 1, - 'foo', - desc('bar'), - d => d.baz, - desc(d => d.bop) - ]); - - assert.deepEqual( - toAST(verb), - { - type: 'Verb', - verb: 'orderby', - keys: [ - { type: 'Column', index: 1 }, - { type: 'Column', name: 'foo' }, - { type: 'Descending', expr: { type: 'Column', name: 'bar' } }, - { type: 'Column', name: 'baz' }, - { type: 'Descending', expr: { type: 'Column', name: 'bop' } } - ] - }, - 'ast orderby verb' - ); - }); - - it('reify verb serializes to AST', () => { - const verb = reify(); - - assert.deepEqual( - toAST(verb), - { type: 'Verb', verb: 'reify' }, - 'ast reify verb' - ); - }); - - it('relocate verb serializes to AST', () => { - assert.deepEqual( - toAST(relocate(['foo', 'bar'], { before: 'baz' })), - { - type: 'Verb', - verb: 'relocate', - columns: [ - { type: 'Column', name: 'foo' }, - { type: 'Column', name: 'bar' } - ], - options: { - before: [ { type: 'Column', name: 'baz' } ] - } - }, - 'ast relocate verb' - ); - - assert.deepEqual( - toAST(relocate(not('foo'), { after: range('a', 'b') })), - { - type: 'Verb', - verb: 'relocate', - columns: [ - { - type: 'Selection', - operator: 'not', - arguments: [ { type: 'Column', name: 'foo' } ] - } - ], - options: { - after: [ - { - type: 'Selection', - operator: 'range', - arguments: [ - { type: 'Column', name: 'a' }, - { type: 'Column', name: 'b' 
} - ] - } - ] - } - }, - 'ast relocate verb' - ); - }); - - it('rename verb serializes to AST', () => { - assert.deepEqual( - toAST(rename([{ foo: 'bar' }])), - { - type: 'Verb', - verb: 'rename', - columns: [ - { type: 'Column', name: 'foo', as: 'bar' } - ] - }, - 'ast rename verb' - ); - - assert.deepEqual( - toAST(rename([{ foo: 'bar' }, { baz: 'bop' }])), - { - type: 'Verb', - verb: 'rename', - columns: [ - { type: 'Column', name: 'foo', as: 'bar' }, - { type: 'Column', name: 'baz', as: 'bop' } - ] - }, - 'ast rename verb' - ); - }); - - it('rollup verb serializes to AST', () => { - const verb = rollup({ - count: op.count(), - sum: op.sum('bar'), - mean: d => op.mean(d.foo) - }); - - assert.deepEqual( - toAST(verb), - { - type: 'Verb', - verb: 'rollup', - values: [ - { - as: 'count', - type: 'CallExpression', - callee: { type: 'Function', name: 'count' }, - arguments: [] - }, - { - as: 'sum', - type: 'CallExpression', - callee: { type: 'Function', name: 'sum' }, - arguments: [{ type: 'Column', name: 'bar' } ] - }, - { - as: 'mean', - type: 'CallExpression', - callee: { type: 'Function', name: 'mean' }, - arguments: [ { type: 'Column', name: 'foo' } ] - } - ] - }, - 'ast rollup verb' - ); - }); - - it('sample verb serializes to AST', () => { - assert.deepEqual( - toAST(sample(2, { replace: true })), - { - type: 'Verb', - verb: 'sample', - size: 2, - options: { replace: true } - }, - 'ast sample verb' - ); - - assert.deepEqual( - toAST(sample(() => op.count())), - { - type: 'Verb', - verb: 'sample', - size: { - type: 'CallExpression', - callee: { type: 'Function', name: 'count' }, - arguments: [] - } - }, - 'ast sample verb, size function' - ); - - assert.deepEqual( - toAST(sample('() => op.count()')), - { - type: 'Verb', - verb: 'sample', - size: { - type: 'CallExpression', - callee: { type: 'Function', name: 'count' }, - arguments: [] - } - }, - 'ast sample verb, size function as string' - ); - - assert.deepEqual( - toAST(sample(2, { weight: 'foo' })), - { - 
type: 'Verb', - verb: 'sample', - size: 2, - options: { weight: { type: 'Column', name: 'foo' } } - }, - 'ast sample verb, weight column name' - ); - - assert.deepEqual( - toAST(sample(2, { weight: d => 2 * d.foo })), - { - type: 'Verb', - verb: 'sample', - size: 2, - options: { - weight: { - type: 'BinaryExpression', - left: { type: 'Literal', value: 2, raw: '2' }, - operator: '*', - right: { type: 'Column', name: 'foo' } - } - } - }, - 'ast sample verb, weight table expression' - ); - }); - - it('select verb serializes to AST', () => { - const verb = select([ - 'foo', - 'bar', - { bop: 'boo', baz: 'bao' }, - all(), - range(0, 1), - range('a', 'b'), - not('foo', 'bar', range(0, 1), range('a', 'b')), - matches('foo.bar'), - matches(/a|b/i), - startswith('foo.'), - endswith('.baz') - ]); - - assert.deepEqual( - toAST(verb), - { - type: 'Verb', - verb: 'select', - columns: [ - { type: 'Column', name: 'foo' }, - { type: 'Column', name: 'bar' }, - { type: 'Column', name: 'bop', as: 'boo' }, - { type: 'Column', name: 'baz', as: 'bao' }, - { type: 'Selection', operator: 'all' }, - { - type: 'Selection', - operator: 'range', - arguments: [ - { type: 'Column', index: 0 }, - { type: 'Column', index: 1 } - ] - }, - { - type: 'Selection', - operator: 'range', - arguments: [ - { type: 'Column', name: 'a' }, - { type: 'Column', name: 'b' } - ] - }, - { - type: 'Selection', - operator: 'not', - arguments: [ - { type: 'Column', name: 'foo' }, - { type: 'Column', name: 'bar' }, - { - type: 'Selection', - operator: 'range', - arguments: [ - { type: 'Column', index: 0 }, - { type: 'Column', index: 1 } - ] - }, - { - type: 'Selection', - operator: 'range', - arguments: [ - { type: 'Column', name: 'a' }, - { type: 'Column', name: 'b' } - ] - } - ] - }, - { - type: 'Selection', - operator: 'matches', - arguments: [ 'foo\\.bar', '' ] - }, - { - type: 'Selection', - operator: 'matches', - arguments: [ 'a|b', 'i' ] - }, - { - type: 'Selection', - operator: 'matches', - arguments: [ 
'^foo\\.', '' ] - }, - { - type: 'Selection', - operator: 'matches', - arguments: [ '\\.baz$', '' ] - } - ] - }, - 'ast select verb' - ); - }); - - it('ungroup verb serializes to AST', () => { - const verb = ungroup(); - - assert.deepEqual( - toAST(verb), - { type: 'Verb', verb: 'ungroup' }, - 'ast ungroup verb' - ); - }); - - it('unorder verb serializes to AST', () => { - const verb = unorder(); - - assert.deepEqual( - toAST(verb), - { type: 'Verb', verb: 'unorder' }, - 'ast unorder verb' - ); - }); - - it('pivot verb serializes to AST', () => { - const verb = pivot( - ['key'], - ['value', { sum: d => op.sum(d.foo), prod: op.product('bar') }], - { sort: false } - ); - - assert.deepEqual( - toAST(verb), - { - type: 'Verb', - verb: 'pivot', - keys: [ { type: 'Column', name: 'key' } ], - values: [ - { type: 'Column', name: 'value' }, - { - as: 'sum', - type: 'CallExpression', - callee: { type: 'Function', name: 'sum' }, - arguments: [ { type: 'Column', name: 'foo' } ] - }, - { - as: 'prod', - type: 'CallExpression', - callee: { type: 'Function', name: 'product' }, - arguments: [ { type: 'Column', name: 'bar' } ] - } - ], - options: { sort: false } - }, - 'ast pivot verb' - ); - }); - - it('impute verb serializes to AST', () => { - const verb = impute( - { v: () => 0 }, - { expand: 'x' } - ); - - assert.deepEqual( - toAST(verb), - { - type: 'Verb', - verb: 'impute', - values: [ { as: 'v', type: 'Literal', raw: '0', value: 0 } ], - options: { expand: [ { type: 'Column', name: 'x' } ] } - }, - 'ast impute verb' - ); - }); - - it('unroll verb serializes to AST', () => { - assert.deepEqual( - toAST(unroll(['foo', 1])), - { - type: 'Verb', - verb: 'unroll', - values: [ - { type: 'Column', name: 'foo' }, - { type: 'Column', index: 1 } - ] - }, - 'ast unroll verb' - ); - - assert.deepEqual( - toAST(unroll({ - foo: d => d.foo, - bar: d => op.split(d.bar, ' ') - })), - { - type: 'Verb', - verb: 'unroll', - values: [ - { type: 'Column', name: 'foo', as: 'foo' }, - { - as: 
'bar', - type: 'CallExpression', - callee: { type: 'Function', name: 'split' }, - arguments: [ - { type: 'Column', name: 'bar' }, - { type: 'Literal', value: ' ', raw: '\' \'' } - ] - } - ] - }, - 'ast unroll verb, values object' - ); - - assert.deepEqual( - toAST(unroll(['foo'], { index: true })), - { - type: 'Verb', - verb: 'unroll', - values: [ { type: 'Column', name: 'foo' } ], - options: { index: true } - }, - 'ast unroll verb, index boolean' - ); - - assert.deepEqual( - toAST(unroll(['foo'], { index: 'idxnum' })), - { - type: 'Verb', - verb: 'unroll', - values: [ { type: 'Column', name: 'foo' } ], - options: { index: 'idxnum' } - }, - 'ast unroll verb, index string' - ); - - assert.deepEqual( - toAST(unroll(['foo'], { drop: [ 'bar' ] })), - { - type: 'Verb', - verb: 'unroll', - values: [ { type: 'Column', name: 'foo' } ], - options: { - drop: [ { type: 'Column', name: 'bar' } ] - } - }, - 'ast unroll verb, drop column name' - ); - - assert.deepEqual( - toAST(unroll(['foo'], { drop: d => d.bar })), - { - type: 'Verb', - verb: 'unroll', - values: [ { type: 'Column', name: 'foo' } ], - options: { - drop: [ { type: 'Column', name: 'bar' } ] - } - }, - 'ast unroll verb, drop table expression' - ); - }); - - it('join verb serializes to AST', () => { - const verbSel = join( - 'tableRef', - ['keyL', 'keyR'], - [all(), not('keyR')], - { suffix: ['_L', '_R'] } - ); - - assert.deepEqual( - toAST(verbSel), - { - type: 'Verb', - verb: 'join', - table: 'tableRef', - on: [ - [ { type: 'Column', name: 'keyL' } ], - [ { type: 'Column', name: 'keyR' } ] - ], - values: [ - [ { type: 'Selection', operator: 'all' } ], - [ { - type: 'Selection', - operator: 'not', - arguments: [ { type: 'Column', name: 'keyR' } ] - } ] - ], - options: { suffix: ['_L', '_R'] } - }, - 'ast join verb, column selections' - ); - - const verbCols = join( - 'tableRef', - [ - [d => d.keyL], - [d => d.keyR] - ], - [ - ['keyL', 'valL', { foo: d => 1 + d.valL }], - ['valR', { bar: d => 2 * d.valR }] - ], - { 
suffix: ['_L', '_R'] } - ); - - assert.deepEqual( - toAST(verbCols), - { - type: 'Verb', - verb: 'join', - table: 'tableRef', - on: [ - [ { type: 'Column', name: 'keyL' } ], - [ { type: 'Column', name: 'keyR' } ] - ], - values: [ - [ - { type: 'Column', name: 'keyL' }, - { type: 'Column', name: 'valL' }, - { - as: 'foo', - type: 'BinaryExpression', - left: { type: 'Literal', 'value': 1, 'raw': '1' }, - operator: '+', - right: { type: 'Column', name: 'valL' } - } - ], - [ - { type: 'Column', name: 'valR' }, - { - as: 'bar', - type: 'BinaryExpression', - left: { type: 'Literal', 'value': 2, 'raw': '2' }, - operator: '*', - right: { type: 'Column', name: 'valR' } - } - ] - ], - options: { suffix: ['_L', '_R'] } - }, - 'ast join verb, column lists' - ); - - const verbExpr = join( - 'tableRef', - (a, b) => op.equal(a.keyL, b.keyR), - { - key: a => a.keyL, - foo: a => a.foo, - bar: (a, b) => b.bar - } - ); - - assert.deepEqual( - toAST(verbExpr), - { - type: 'Verb', - verb: 'join', - table: 'tableRef', - on: { - type: 'CallExpression', - callee: { type: 'Function', name: 'equal' }, - arguments: [ - { type: 'Column', table: 1, name: 'keyL' }, - { type: 'Column', table: 2, name: 'keyR' } - ] - }, - values: [ - { type: 'Column', table: 1, name: 'keyL', as: 'key' }, - { type: 'Column', table: 1, name: 'foo', as: 'foo' }, - { type: 'Column', table: 2, name: 'bar', as: 'bar' } - ] - }, - 'ast join verb, table expressions' - ); - }); - - it('concat verb serializes to AST', () => { - assert.deepEqual( - toAST(concat(['foo', 'bar'])), - { - type: 'Verb', - verb: 'concat', - tables: ['foo', 'bar'] - }, - 'ast concat verb' - ); - - const ct1 = query('foo').select(not('bar')); - const ct2 = query('bar').select(not('foo')); - - assert.deepEqual( - toAST(concat([ct1, ct2])), - { - type: 'Verb', - verb: 'concat', - tables: [ - { - type: 'Query', - verbs: [ - { - type: 'Verb', - verb: 'select', - columns: [ - { - type: 'Selection', - operator: 'not', - arguments: [ { type: 'Column', 
name: 'bar' } ] - } - ] - } - ], - table: 'foo' - }, - { - type: 'Query', - verbs: [ - { - type: 'Verb', - verb: 'select', - columns: [ - { - type: 'Selection', - operator: 'not', - arguments: [ { type: 'Column', name: 'foo' } ] - } - ] - } - ], - table: 'bar' - } - ] - }, - 'ast concat verb, with subqueries' - ); - }); -}); diff --git a/test/register-test.js b/test/register-test.js deleted file mode 100644 index 2486a84f..00000000 --- a/test/register-test.js +++ /dev/null @@ -1,226 +0,0 @@ -import assert from 'node:assert'; -import tableEqual from './table-equal.js'; -import { aggregateFunctions, functions, windowFunctions } from '../src/op/index.js'; -import { ExprObject } from '../src/query/constants.js'; -import { - addAggregateFunction, - addFunction, - addPackage, - addTableMethod, - addVerb, - addWindowFunction -} from '../src/register.js'; -import { op, query, table } from '../src/index.js'; - -describe('register', () => { - it('addFunction registers new function', () => { - const SECRET = 0xDEADBEEF; - function secret() { return 0xDEADBEEF; } - - addFunction(secret); - addFunction('sssh', secret); - assert.equal(functions.secret(), SECRET, 'add implicitly named function'); - assert.equal(functions.sssh(), SECRET, 'add explicitly named function'); - - assert.throws( - () => addFunction(() => 'foo'), - 'do not accept anonymous functions' - ); - - assert.throws( - () => addFunction('abs', val => val < 0 ? -val : val), - 'do not overwrite existing functions' - ); - - const abs = op.abs; - assert.doesNotThrow( - () => { - addFunction('abs', val => val < 0 ? 
-val : val, { override: true }); - addFunction('abs', abs, { override: true }); - }, - 'support override option' - ); - }); - - it('addAggregateFunction registers new aggregate function', () => { - const create = () => ({ - init: s => (s.altsign = -1, s.altsum = 0), - add: (s, v) => s.altsum += (s.altsign *= -1) * v, - rem: () => {}, - value: s => s.altsum - }); - - addAggregateFunction('altsum', { create, param: [1, 0] }); - assert.deepEqual( - aggregateFunctions.altsum, - { create, param: [1, 0] }, - 'register aggregate function' - ); - assert.equal( - table({ x: [1, 2, 3, 4, 5]}).rollup({ a: d => op.altsum(d.x) }).get('a', 0), - 3, 'evaluate aggregate function' - ); - - assert.throws( - () => addAggregateFunction('mean', { create }), - 'do not overwrite existing function' - ); - }); - - it('addWindowFunction registers new window function', () => { - const create = (offset) => ({ - init: () => {}, - value: (w, f) => w.value(w.index, f) - w.index + (offset || 0) - }); - - addWindowFunction('vmi', { create, param: [1, 1] }); - assert.deepEqual( - windowFunctions.vmi, - { create, param: [1, 1] }, - 'register window function' - ); - tableEqual( - table({ x: [1, 2, 3, 4, 5] }).derive({ a: d => op.vmi(d.x, 1) }).select('a'), - { a: [2, 2, 2, 2, 2] }, - 'evaluate window function' - ); - - assert.throws( - () => addWindowFunction('rank', { create }), - 'do not overwrite existing function' - ); - }); - - it('addTableMethod registers new table method', () => { - const dim1 = (t, ...args) => [t.numRows(), t.numCols(), ...args]; - const dim2 = (t) => [t.numRows(), t.numCols()]; - - addTableMethod('dims', dim1); - - assert.deepEqual( - table({ a: [1, 2, 3], b: [4, 5, 6] }).dims('a', 'b'), - [3, 2, 'a', 'b'], - 'register table method' - ); - - assert.throws( - () => addTableMethod('_foo', dim1), - 'do not allow names that start with underscore' - ); - - assert.throws( - () => addTableMethod('toCSV', dim1, { override: true }), - 'do not override reserved names' - ); - - 
assert.doesNotThrow( - () => addTableMethod('dims', dim1), - 'allow reassignment of existing value' - ); - - assert.throws( - () => addTableMethod('dims', dim2), - 'do not override without option' - ); - - assert.doesNotThrow( - () => addTableMethod('dims', dim2, { override: true }), - 'allow override with option' - ); - - assert.deepEqual( - table({ a: [1, 2, 3], b: [4, 5, 6] }).dims('a', 'b'), - [3, 2], - 'register overridden table method' - ); - }); - - it('addVerb registers a new verb', () => { - const rup = (t, exprs) => t.rollup(exprs); - - addVerb('rup', rup, [ - { name: 'exprs', type: ExprObject } - ]); - - tableEqual( - table({ a: [1, 2, 3], b: [4, 5, 6] }).rup({ sum: op.sum('a') }), - { sum: [ 6 ] }, - 'register verb with table' - ); - - assert.deepEqual( - query().rup({ sum: op.sum('a') }).toObject(), - { - verbs: [ - { - verb: 'rup', - exprs: { sum: { expr: 'd => op.sum(d["a"])', func: true } } - } - ] - }, - 'register verb with query' - ); - }); - - it('addPackage registers an extension package', () => { - const pkg = { - functions: { - secret_p: () => 0xDEADBEEF - }, - aggregateFunctions: { - altsum_p: { - create: () => ({ - init: s => (s.altsign = -1, s.altsum = 0), - add: (s, v) => s.altsum += (s.altsign *= -1) * v, - rem: () => {}, - value: s => s.altsum - }), - param: [1, 0] - } - }, - windowFunctions: { - vmi_p: { - create: (offset) => ({ - init: () => {}, - value: (w, f) => w.value(w.index, f) - w.index + (offset || 0) - }), - param: [1, 1] - } - }, - tableMethods: { - dims_p: t => [t.numRows(), t.numCols()] - }, - verbs: { - rup_p: { - method: (t, exprs) => t.rollup(exprs), - params: [ { name: 'exprs', type: ExprObject } ] - } - } - }; - - addPackage(pkg); - - assert.equal(functions.secret_p, pkg.functions.secret_p, 'functions'); - assert.equal(aggregateFunctions.altsum_p, pkg.aggregateFunctions.altsum_p, 'aggregate functions'); - assert.equal(windowFunctions.vmi_p, pkg.windowFunctions.vmi_p, 'window functions'); - 
assert.equal(table().dims_p.fn, pkg.tableMethods.dims_p, 'table methods'); - assert.equal(table().rup_p.fn, pkg.verbs.rup_p.method, 'verbs'); - - assert.doesNotThrow( - () => addPackage(pkg), - 'allow reassignment of existing value' - ); - - assert.throws( - () => addPackage({ functions: { secret_p: () => 1 } }), - 'do not override without option' - ); - - const secret_p = () => 42; - addPackage({ functions: { secret_p } }, { override: true }); - assert.equal( - functions.secret_p, secret_p, - 'allow override with option' - ); - }); -}); diff --git a/test/table/bitset-test.js b/test/table/bitset-test.js index 04e4a189..00ed0132 100644 --- a/test/table/bitset-test.js +++ b/test/table/bitset-test.js @@ -1,5 +1,5 @@ import assert from 'node:assert'; -import BitSet from '../../src/table/bit-set.js'; +import { BitSet } from '../../src/index.js'; describe('BitSet', () => { it('manages a set of bits', () => { diff --git a/test/table/column-table-test.js b/test/table/column-table-test.js index 1ac0688c..a7b98b0a 100644 --- a/test/table/column-table-test.js +++ b/test/table/column-table-test.js @@ -1,522 +1,52 @@ import assert from 'node:assert'; -import tableEqual from '../table-equal.js'; -import { not } from '../../src/helpers/selection.js'; -import BitSet from '../../src/table/bit-set.js'; -import ColumnTable from '../../src/table/column-table.js'; +import { ColumnTable, Table } from '../../src/index.js'; +import * as verbs from '../../src/verbs/index.js'; describe('ColumnTable', () => { - it('supports varied column types', () => { - const data = { - int: Uint32Array.of(1, 2, 3, 4, 5), - num: Float64Array.of(1.2, 2.3, 3.4, 4.5, 5.6), - str: ['a1', 'b2', 'c3', 'd4', 'e5'], - chr: 'abcde', - obj: [{a:1}, {b:2}, {c:3}, {d:4}, {e:5}] - }; - - const ref = { - int: [1, 2, 3, 4, 5], - num: [1.2, 2.3, 3.4, 4.5, 5.6], - str: ['a1', 'b2', 'c3', 'd4', 'e5'], - chr: ['a', 'b', 'c', 'd', 'e'], - obj: [{a:1}, {b:2}, {c:3}, {d:4}, {e:5}] - }; - - const ct = new ColumnTable(data); - - 
assert.equal(ct.numRows(), 5, 'num rows'); - assert.equal(ct.numCols(), 5, 'num cols'); - - const rows = [0, 1, 2, 3, 4]; - const get = { - int: rows.map(row => ct.get('int', row)), - num: rows.map(row => ct.get('num', row)), - str: rows.map(row => ct.get('str', row)), - chr: rows.map(row => ct.get('chr', row)), - obj: rows.map(row => ct.get('obj', row)) - }; - assert.deepEqual(get, ref, 'extracted get values match'); - - const getters = ['int', 'num', 'str', 'chr', 'obj'].map(name => ct.getter(name)); - const getter = { - int: rows.map(row => getters[0](row)), - num: rows.map(row => getters[1](row)), - str: rows.map(row => getters[2](row)), - chr: rows.map(row => getters[3](row)), - obj: rows.map(row => getters[4](row)) - }; - assert.deepEqual(getter, ref, 'extracted getter values match'); - - const arrays = ['int', 'num', 'str', 'chr', 'obj'].map(name => ct.columnArray(name)); - const array = { - int: rows.map(row => arrays[0][row]), - num: rows.map(row => arrays[1][row]), - str: rows.map(row => arrays[2][row]), - chr: rows.map(row => arrays[3][row]), - obj: rows.map(row => arrays[4][row]) - }; - assert.deepEqual(array, ref, 'extracted columnArray values match'); - - const scanned = { - int: [], - num: [], - str: [], - chr: [], - obj: [] - }; - ct.scan((row, data) => { - for (const col in data) { - scanned[col].push(data[col].get(row)); - } - }); - assert.deepEqual(scanned, ref, 'scanned values match'); - }); - - it('scan supports filtering and ordering', () => { - const table = new ColumnTable({ - a: ['a', 'a', 'a', 'b', 'b'], - b: [2, 1, 4, 5, 3] - }); - - let idx = []; - table.scan(row => idx.push(row), true); - assert.deepEqual(idx, [0, 1, 2, 3, 4], 'standard scan'); - - const filter = new BitSet(5); - [1, 2, 4].forEach(i => filter.set(i)); - const ft = table.create({ filter }); - idx = []; - ft.scan(row => idx.push(row), true); - assert.deepEqual(idx, [1, 2, 4], 'filtered scan'); - - const order = (u, v, { b }) => b.get(u) - b.get(v); - const ot = 
table.create({ order }); - assert.ok(ot.isOrdered(), 'is ordered'); - idx = []; - ot.scan(row => idx.push(row), true); - assert.deepEqual(idx, [1, 0, 4, 2, 3], 'ordered scan'); - - idx = []; - ot.scan(row => idx.push(row)); - assert.deepEqual(idx, [0, 1, 2, 3, 4], 'no-order scan'); - }); - - it('scan supports early termination', () => { - const table = new ColumnTable({ - a: ['a', 'a', 'a', 'b', 'b'], - b: [2, 1, 4, 5, 3] - }); - - let count; - const visitor = (row, data, stop) => { if (++count > 1) stop(); }; - - count = 0; - table.scan(visitor, true); - assert.equal(count, 2, 'standard scan'); - - count = 0; - const filter = new BitSet(5); - [1, 2, 4].forEach(i => filter.set(i)); - table.create({ filter }).scan(visitor, true); - assert.equal(count, 2, 'filtered scan'); - - count = 0; - const order = (u, v, { b }) => b.get(u) - b.get(v); - table.create({ order }).scan(visitor, true); - assert.equal(count, 2, 'ordered scan'); - }); - - it('memoizes indices', () => { - const ut = new ColumnTable({ v: [1, 3, 2] }); - const ui = ut.indices(false); - assert.equal(ut.indices(), ui, 'memoize unordered'); - - const ot = ut.orderby('v'); - const of = ot.indices(false); - const oi = ot.indices(); - assert.notEqual(of, oi, 'respect order flag'); - assert.equal(ot.indices(), oi, 'memoize ordered'); - assert.deepEqual(Array.from(oi), [0, 2, 1], 'indices ordered'); - }); - - it('supports column values output', () => { - const dt = new ColumnTable({ - u: ['a', 'a', 'a', 'b', 'b'], - v: [2, 1, 4, 5, 3] - }) - .filter(d => d.v > 1) - .orderby('v'); - - assert.deepEqual( - Array.from(dt.values('u')), - ['a', 'b', 'a', 'b'], - 'column values, strings' - ); - - assert.deepEqual( - Array.from(dt.values('v')), - [2, 3, 4, 5], - 'column values, numbers' - ); - - assert.deepEqual( - Int32Array.from(dt.values('v')), - Int32Array.of(2, 3, 4, 5), - 'column values, typed array' - ); - }); - - it('supports column array output', () => { - const dt = new ColumnTable({ - u: ['a', 'a', 'a', 'b', 
'b'], - v: [2, 1, 4, 5, 3] - }) - .filter(d => d.v > 1) - .orderby('v'); - - assert.deepEqual( - dt.array('u'), - ['a', 'b', 'a', 'b'], - 'column array, strings' - ); - - assert.deepEqual( - dt.array('v'), - [2, 3, 4, 5], - 'column array, numbers' - ); - - assert.deepEqual( - dt.array('v', Int32Array), - Int32Array.of(2, 3, 4, 5), - 'column array, typed array' - ); - }); - - it('supports object output', () => { - const output = [ - { u: 'a', v: 1 }, - { u: 'a', v: 2 }, - { u: 'b', v: 3 }, - { u: 'a', v: 4 }, - { u: 'b', v: 5 } - ]; - - const dt = new ColumnTable({ - u: ['a', 'a', 'a', 'b', 'b'], - v: [2, 1, 4, 5, 3] - }) - .orderby('v'); - - assert.deepEqual(dt.objects(), output, 'object data'); - - assert.deepEqual( - dt.objects({ limit: 3 }), - output.slice(0, 3), - 'object data with limit' - ); - - assert.deepEqual( - dt.objects({ columns: not('v') }), - output.map(d => ({ u: d.u })), - 'object data with column selection' - ); - - assert.deepEqual( - dt.objects({ columns: { u: 'a', v: 'b'} }), - output.map(d => ({ a: d.u, b: d.v })), - 'object data with renaming column selection' - ); - - assert.deepEqual( - dt.object(), - output[0], - 'single object, implicit row' - ); - - assert.deepEqual( - dt.object(0), - output[0], - 'single object, explicit row' - ); - - assert.deepEqual( - dt.object(1), - output[1], - 'single object, explicit row' - ); - }); - - it('supports grouped object output', () => { - const dt = new ColumnTable({ - u: ['a', 'a', 'a', 'b', 'b'], - v: [2, 1, 4, 5, 3] - }) - .orderby('v'); - - assert.deepEqual( - dt.groupby('u').objects({ grouped: 'object' }), - { - a: [ - { u: 'a', v: 1 }, - { u: 'a', v: 2 }, - { u: 'a', v: 4 } - ], - b: [ - { u: 'b', v: 3 }, - { u: 'b', v: 5 } - ] - }, - 'grouped object output' - ); - - assert.deepEqual( - dt.groupby('u').objects({ grouped: 'entries' }), - [ - ['a',[ - { u: 'a', v: 1 }, - { u: 'a', v: 2 }, - { u: 'a', v: 4 } - ]], - ['b',[ - { u: 'b', v: 3 }, - { u: 'b', v: 5 } - ]] - ], - 'grouped entries output' - 
); - - assert.deepEqual( - dt.groupby('u').objects({ grouped: 'map' }), - new Map([ - ['a',[ - { u: 'a', v: 1 }, - { u: 'a', v: 2 }, - { u: 'a', v: 4 } - ]], - ['b',[ - { u: 'b', v: 3 }, - { u: 'b', v: 5 } - ]] - ]), - 'grouped map output' - ); - - assert.deepEqual( - dt.groupby('u').objects({ grouped: true }), - new Map([ - ['a',[ - { u: 'a', v: 1 }, - { u: 'a', v: 2 }, - { u: 'a', v: 4 } - ]], - ['b',[ - { u: 'b', v: 3 }, - { u: 'b', v: 5 } - ]] - ]), - 'grouped map output, using true' - ); - - assert.deepEqual( - dt.filter(d => d.v < 4).groupby('u').objects({ grouped: 'object' }), - { - a: [ - { u: 'a', v: 1 }, - { u: 'a', v: 2 } - ], - b: [ - { u: 'b', v: 3 } - ] - }, - 'grouped object output, with filter' - ); - - assert.deepEqual( - dt.groupby('u').objects({ limit: 3, grouped: 'object' }), - { - a: [ - { u: 'a', v: 1 }, - { u: 'a', v: 2 } - ], - b: [ - { u: 'b', v: 3 } - ] - }, - 'grouped object output, with limit' - ); - - assert.deepEqual( - dt.groupby('u').objects({ offset: 2, grouped: 'object' }), - { - a: [ - { u: 'a', v: 4 } - ], - b: [ - { u: 'b', v: 3 }, - { u: 'b', v: 5 } - ] - }, - 'grouped object output, with offset' - ); - - const dt2 = new ColumnTable({ - u: ['a', 'a', 'a', 'b', 'b'], - w: ['y', 'x', 'y', 'z', 'x'], - v: [2, 1, 4, 5, 3] - }) - .orderby('v'); - - assert.deepEqual( - dt2.groupby(['u', 'w']).objects({ grouped: 'object' }), - { - a: { - x: [{ u: 'a', w: 'x', v: 1 }], - y: [{ u: 'a', w: 'y', v: 2 },{ u: 'a', w: 'y', v: 4 }] - }, - b: { - x: [{ u: 'b', w: 'x', v: 3 }], - z: [{ u: 'b', w: 'z', v: 5 }] - } - }, - 'grouped nested object output' - ); - - assert.deepEqual( - dt2.groupby(['u', 'w']).objects({ grouped: 'entries' }), - [ - ['a', [ - ['y', [{ u: 'a', w: 'y', v: 2 }, { u: 'a', w: 'y', v: 4 }]], - ['x', [{ u: 'a', w: 'x', v: 1 }]] - ]], - ['b', [ - ['z', [{ u: 'b', w: 'z', v: 5 }]], - ['x', [{ u: 'b', w: 'x', v: 3 }]] - ]] - ], - 'grouped nested entries output' - ); - - assert.deepEqual( - dt2.groupby(['u', 'w']).objects({ 
grouped: 'map' }), - new Map([ - ['a', new Map([ - ['x', [{ u: 'a', w: 'x', v: 1 }]], - ['y', [{ u: 'a', w: 'y', v: 2 },{ u: 'a', w: 'y', v: 4 }]] - ])], - ['b', new Map([ - ['x', [{ u: 'b', w: 'x', v: 3 }]], - ['z', [{ u: 'b', w: 'z', v: 5 }]] - ])] - ]), - 'grouped nested map output' - ); - }); - - it('supports iterator output', () => { - const output = [ - { u: 'a', v: 2 }, - { u: 'a', v: 1 }, - { u: 'a', v: 4 }, - { u: 'b', v: 5 }, - { u: 'b', v: 3 } - ]; - - const dt = new ColumnTable({ - u: ['a', 'a', 'a', 'b', 'b'], - v: [2, 1, 4, 5, 3] - }); - - assert.deepEqual([...dt], output, 'iterator data'); - assert.deepEqual( - [...dt.orderby('v')], - output.slice().sort((a, b) => a.v - b.v), - 'iterator data orderby' - ); - }); - - it('toString shows table state', () => { - const dt = new ColumnTable({ - a: ['a', 'a', 'a', 'b', 'b'], - b: [2, 1, 4, 5, 3] - }); - assert.equal( - dt.toString(), - '[object Table: 2 cols x 5 rows]', - 'table toString' - ); - - const filter = new BitSet(5); - [1, 2, 4].forEach(i => filter.set(i)); - assert.equal( - dt.create({ filter }).toString(), - '[object Table: 2 cols x 3 rows (5 backing)]', - 'filtered table toString' - ); - - const groups = { names: ['a'], get: [row => dt.get('a', row)], size: 2 }; - assert.equal( - dt.create({ groups }).toString(), - '[object Table: 2 cols x 5 rows, 2 groups]', - 'grouped table toString' - ); - - const order = (u, v, { b }) => b[u] - b[v]; - assert.equal( - dt.create({ order }).toString(), - '[object Table: 2 cols x 5 rows, ordered]', - 'ordered table toString' - ); - - assert.equal( - dt.create({ filter, order, groups }).toString(), - '[object Table: 2 cols x 3 rows (5 backing), 2 groups, ordered]', - 'filtered, grouped, ordered table toString' - ); - }); - - it('assign merges tables', () => { - const t1 = new ColumnTable({ a: [1], b: [2], c: [3] }); - const t2 = new ColumnTable({ b: [-2], d: [4] }); - const t3 = new ColumnTable({ a: [-1], e: [5] }); - const dt = t1.assign(t2, t3); - - 
tableEqual(dt, {
-      a: [-1], b: [-2], c: [3], d: [4], e: [5]
-    }, 'assigned data');
-
-    assert.deepEqual(
-      dt.columnNames(),
-      ['a', 'b', 'c', 'd', 'e'],
-      'assigned names'
-    );
-
-    assert.throws(
-      () => t1.assign(new ColumnTable({ c: [1, 2, 3] })),
-      'throws on mismatched row counts'
-    );
-
-    tableEqual(t1.assign({ b: [-2], d: [4] }), {
-      a: [1], b: [-2], c: [3], d: [4]
-    }, 'assigned data from object');
-
-    assert.throws(
-      () => t1.assign({ c: [1, 2, 3] }),
-      'throws on mismatched row counts from object'
-    );
-  });
-
-  it('transform applies transformations', () => {
-    const dt = new ColumnTable({ a: [1, 2], b: [2, 3], c: [3, 4] });
-
-    tableEqual(
-      dt.transform(
-        t => t.filter(d => d.c > 3),
-        t => t.select('a', 'b'),
-        t => t.reify()
-      ),
-      { a: [2], b: [3] },
-      'transform pipeline'
+  it('extends Table', () => {
+    const dt = new ColumnTable({ x: [1, 2, 3] });
+    assert.ok(dt instanceof Table, 'ColumnTable extends Table');
+  });
+
+  it('includes transformation verbs', () => {
+    const proto = ColumnTable.prototype;
+    assert.ok(
+      typeof proto.count === 'function',
+      'ColumnTable includes count verb'
+    );
+    for (const verbName of Object.keys(verbs)) {
+      assert.ok(
+        typeof proto[verbName] === 'function',
+        `ColumnTable includes ${verbName} verb`
+      );
+    }
+  });
+
+  it('includes output format methods', () => {
+    const proto = ColumnTable.prototype;
+    assert.ok(
+      typeof proto.toArrow === 'function',
+      'ColumnTable includes toArrow'
+    );
+    assert.ok(
+      typeof proto.toArrowIPC === 'function',
+      'ColumnTable includes toArrowIPC'
+    );
+    assert.ok(
+      typeof proto.toCSV === 'function',
+      'ColumnTable includes toCSV'
+    );
+    assert.ok(
+      typeof proto.toHTML === 'function',
+      'ColumnTable includes toHTML'
+    );
+    assert.ok(
+      typeof proto.toJSON === 'function',
+      'ColumnTable includes toJSON'
+    );
+    assert.ok(
+      typeof proto.toMarkdown === 'function',
+      'ColumnTable includes toMarkdown'
     );
   });
 });
diff --git a/test/table/columns-from-test.js b/test/table/columns-from-test.js
index 2a92f57e..a0f9105c 100644
--- a/test/table/columns-from-test.js
+++ b/test/table/columns-from-test.js
@@ -1,5 +1,5 @@
 import assert from 'node:assert';
-import columnsFrom from '../../src/table/columns-from.js';
+import { columnsFrom } from '../../src/table/columns-from.js';
 
 describe('columnsFrom', () => {
   it('supports array input', () => {
diff --git a/test/table/table-test.js b/test/table/table-test.js
new file mode 100644
index 00000000..e2146d5d
--- /dev/null
+++ b/test/table/table-test.js
@@ -0,0 +1,491 @@
+import assert from 'node:assert';
+import { BitSet, Table, not } from '../../src/index.js';
+import { filter, groupby, orderby } from '../../src/verbs/index.js';
+
+describe('Table', () => {
+  it('supports varied column types', () => {
+    const data = {
+      int: Uint32Array.of(1, 2, 3, 4, 5),
+      num: Float64Array.of(1.2, 2.3, 3.4, 4.5, 5.6),
+      str: ['a1', 'b2', 'c3', 'd4', 'e5'],
+      chr: 'abcde',
+      obj: [{a:1}, {b:2}, {c:3}, {d:4}, {e:5}]
+    };
+
+    const ref = {
+      int: [1, 2, 3, 4, 5],
+      num: [1.2, 2.3, 3.4, 4.5, 5.6],
+      str: ['a1', 'b2', 'c3', 'd4', 'e5'],
+      chr: ['a', 'b', 'c', 'd', 'e'],
+      obj: [{a:1}, {b:2}, {c:3}, {d:4}, {e:5}]
+    };
+
+    const ct = new Table(data);
+
+    assert.equal(ct.numRows(), 5, 'num rows');
+    assert.equal(ct.numCols(), 5, 'num cols');
+
+    const rows = [0, 1, 2, 3, 4];
+    const get = {
+      int: rows.map(row => ct.get('int', row)),
+      num: rows.map(row => ct.get('num', row)),
+      str: rows.map(row => ct.get('str', row)),
+      chr: rows.map(row => ct.get('chr', row)),
+      obj: rows.map(row => ct.get('obj', row))
+    };
+    assert.deepEqual(get, ref, 'extracted get values match');
+
+    const getters = ['int', 'num', 'str', 'chr', 'obj'].map(name => ct.getter(name));
+    const getter = {
+      int: rows.map(row => getters[0](row)),
+      num: rows.map(row => getters[1](row)),
+      str: rows.map(row => getters[2](row)),
+      chr: rows.map(row => getters[3](row)),
+      obj: rows.map(row => getters[4](row))
+    };
+    assert.deepEqual(getter, ref, 'extracted getter values match');
+
+    const arrays = ['int', 'num', 'str', 'chr', 'obj'].map(name => ct.array(name));
+    const array = {
+      int: rows.map(row => arrays[0][row]),
+      num: rows.map(row => arrays[1][row]),
+      str: rows.map(row => arrays[2][row]),
+      chr: rows.map(row => arrays[3][row]),
+      obj: rows.map(row => arrays[4][row])
+    };
+    assert.deepEqual(array, ref, 'extracted array values match');
+
+    const scanned = {
+      int: [],
+      num: [],
+      str: [],
+      chr: [],
+      obj: []
+    };
+    ct.scan((row, data) => {
+      for (const col in data) {
+        scanned[col].push(data[col].at(row));
+      }
+    });
+    assert.deepEqual(scanned, ref, 'scanned values match');
+  });
+
+  it('copies and freezes column object', () => {
+    const cols = { x: [1, 2, 3 ]};
+    const table = new Table(cols);
+    const data = table.data();
+    assert.notStrictEqual(data, cols, 'is copied');
+    assert.strictEqual(data, table._data, 'is direct');
+    assert.ok(Object.isFrozen(data), 'is frozen');
+    assert.throws(() => data.y = [4, 5, 6], 'throws on edit');
+  });
+
+  it('copies and freezes column name list', () => {
+    const names = ['y', 'x'];
+    const cols = { x: [1, 2, 3 ], y: [4, 5, 6]};
+    const table = new Table(cols, names);
+    assert.notStrictEqual(table._names, names, 'is copied');
+    assert.ok(Object.isFrozen(table._names), 'is frozen');
+    assert.throws(() => table._names.push('z'), 'throws on edit');
+  });
+
+  it('scan supports filtering and ordering', () => {
+    const table = new Table({
+      a: ['a', 'a', 'a', 'b', 'b'],
+      b: [2, 1, 4, 5, 3]
+    });
+
+    let idx = [];
+    table.scan(row => idx.push(row), true);
+    assert.deepEqual(idx, [0, 1, 2, 3, 4], 'standard scan');
+
+    const filter = new BitSet(5);
+    [1, 2, 4].forEach(i => filter.set(i));
+    const ft = table.create({ filter });
+    idx = [];
+    ft.scan(row => idx.push(row), true);
+    assert.deepEqual(idx, [1, 2, 4], 'filtered scan');
+
+    const order = (u, v, { b }) => b.at(u) - b.at(v);
+    const ot = table.create({ order });
+    assert.ok(ot.isOrdered(), 'is ordered');
+    idx = [];
+    ot.scan(row => idx.push(row), true);
+    assert.deepEqual(idx, [1, 0, 4, 2, 3], 'ordered scan');
+
+    idx = [];
+    ot.scan(row => idx.push(row));
+    assert.deepEqual(idx, [0, 1, 2, 3, 4], 'no-order scan');
+  });
+
+  it('scan supports early termination', () => {
+    const table = new Table({
+      a: ['a', 'a', 'a', 'b', 'b'],
+      b: [2, 1, 4, 5, 3]
+    });
+
+    let count;
+    const visitor = (row, data, stop) => { if (++count > 1) stop(); };
+
+    count = 0;
+    table.scan(visitor, true);
+    assert.equal(count, 2, 'standard scan');
+
+    count = 0;
+    const filter = new BitSet(5);
+    [1, 2, 4].forEach(i => filter.set(i));
+    table.create({ filter }).scan(visitor, true);
+    assert.equal(count, 2, 'filtered scan');
+
+    count = 0;
+    const order = (u, v, { b }) => b.at(u) - b.at(v);
+    table.create({ order }).scan(visitor, true);
+    assert.equal(count, 2, 'ordered scan');
+  });
+
+  it('memoizes indices', () => {
+    const ut = new Table({ v: [1, 3, 2] });
+    const ui = ut.indices(false);
+    assert.equal(ut.indices(), ui, 'memoize unordered');
+
+    const ot = orderby(ut, 'v');
+    const of = ot.indices(false);
+    const oi = ot.indices();
+    assert.notEqual(of, oi, 'respect order flag');
+    assert.equal(ot.indices(), oi, 'memoize ordered');
+    assert.deepEqual(Array.from(oi), [0, 2, 1], 'indices ordered');
+  });
+
+  it('supports column values output', () => {
+    const t = new Table({
+      u: ['a', 'a', 'a', 'b', 'b'],
+      v: [2, 1, 4, 5, 3]
+    });
+    const dt = orderby(filter(t, d => d.v > 1), 'v');
+
+    assert.deepEqual(
+      Array.from(dt.values('u')),
+      ['a', 'b', 'a', 'b'],
+      'column values, strings'
+    );
+
+    assert.deepEqual(
+      Array.from(dt.values('v')),
+      [2, 3, 4, 5],
+      'column values, numbers'
+    );
+
+    assert.deepEqual(
+      Int32Array.from(dt.values('v')),
+      Int32Array.of(2, 3, 4, 5),
+      'column values, typed array'
+    );
+  });
+
+  it('supports column array output', () => {
+    const t = new Table({
+      u: ['a', 'a', 'a', 'b', 'b'],
+      v: [2, 1, 4, 5, 3]
+    });
+    const dt = orderby(filter(t, d => d.v > 1), 'v');
+
+    assert.deepEqual(
+      dt.array('u'),
+      ['a', 'b', 'a', 'b'],
+      'column array, strings'
+    );
+
+    assert.deepEqual(
+      dt.array('v'),
+      [2, 3, 4, 5],
+      'column array, numbers'
+    );
+
+    assert.deepEqual(
+      dt.array('v', Int32Array),
+      Int32Array.of(2, 3, 4, 5),
+      'column array, typed array'
+    );
+  });
+
+  it('supports object output', () => {
+    const output = [
+      { u: 'a', v: 1 },
+      { u: 'a', v: 2 },
+      { u: 'b', v: 3 },
+      { u: 'a', v: 4 },
+      { u: 'b', v: 5 }
+    ];
+
+    const dt = orderby(new Table({
+      u: ['a', 'a', 'a', 'b', 'b'],
+      v: [2, 1, 4, 5, 3]
+    }), 'v');
+
+    assert.deepEqual(dt.objects(), output, 'object data');
+
+    assert.deepEqual(
+      dt.objects({ limit: 3 }),
+      output.slice(0, 3),
+      'object data with limit'
+    );
+
+    assert.deepEqual(
+      dt.objects({ columns: not('v') }),
+      output.map(d => ({ u: d.u })),
+      'object data with column selection'
+    );
+
+    assert.deepEqual(
+      dt.objects({ columns: { u: 'a', v: 'b'} }),
+      output.map(d => ({ a: d.u, b: d.v })),
+      'object data with renaming column selection'
+    );
+
+    assert.deepEqual(
+      dt.object(),
+      output[0],
+      'single object, implicit row'
+    );
+
+    assert.deepEqual(
+      dt.object(0),
+      output[0],
+      'single object, explicit row'
+    );
+
+    assert.deepEqual(
+      dt.object(1),
+      output[1],
+      'single object, explicit row'
+    );
+  });
+
+  it('supports grouped object output', () => {
+    const dt = orderby(new Table({
+      u: ['a', 'a', 'a', 'b', 'b'],
+      v: [2, 1, 4, 5, 3]
+    }), 'v');
+
+    assert.deepEqual(
+      groupby(dt, 'u').objects({ grouped: 'object' }),
+      {
+        a: [
+          { u: 'a', v: 1 },
+          { u: 'a', v: 2 },
+          { u: 'a', v: 4 }
+        ],
+        b: [
+          { u: 'b', v: 3 },
+          { u: 'b', v: 5 }
+        ]
+      },
+      'grouped object output'
+    );
+
+    assert.deepEqual(
+      groupby(dt, 'u').objects({ grouped: 'entries' }),
+      [
+        ['a',[
+          { u: 'a', v: 1 },
+          { u: 'a', v: 2 },
+          { u: 'a', v: 4 }
+        ]],
+        ['b',[
+          { u: 'b', v: 3 },
+          { u: 'b', v: 5 }
+        ]]
+      ],
+      'grouped entries output'
+    );
+
+    assert.deepEqual(
+      groupby(dt, 'u').objects({ grouped: 'map' }),
+      new Map([
+        ['a',[
+          { u: 'a', v: 1 },
+          { u: 'a', v: 2 },
+          { u: 'a', v: 4 }
+        ]],
+        ['b',[
+          { u: 'b', v: 3 },
+          { u: 'b', v: 5 }
+        ]]
+      ]),
+      'grouped map output'
+    );
+
+    assert.deepEqual(
+      groupby(dt, 'u').objects({ grouped: true }),
+      new Map([
+        ['a',[
+          { u: 'a', v: 1 },
+          { u: 'a', v: 2 },
+          { u: 'a', v: 4 }
+        ]],
+        ['b',[
+          { u: 'b', v: 3 },
+          { u: 'b', v: 5 }
+        ]]
+      ]),
+      'grouped map output, using true'
+    );
+
+    assert.deepEqual(
+      groupby(filter(dt, d => d.v < 4), 'u')
+        .objects({ grouped: 'object' }),
+      {
+        a: [
+          { u: 'a', v: 1 },
+          { u: 'a', v: 2 }
+        ],
+        b: [
+          { u: 'b', v: 3 }
+        ]
+      },
+      'grouped object output, with filter'
+    );
+
+    assert.deepEqual(
+      groupby(dt, 'u').objects({ limit: 3, grouped: 'object' }),
+      {
+        a: [
+          { u: 'a', v: 1 },
+          { u: 'a', v: 2 }
+        ],
+        b: [
+          { u: 'b', v: 3 }
+        ]
+      },
+      'grouped object output, with limit'
+    );
+
+    assert.deepEqual(
+      groupby(dt, 'u').objects({ offset: 2, grouped: 'object' }),
+      {
+        a: [
+          { u: 'a', v: 4 }
+        ],
+        b: [
+          { u: 'b', v: 3 },
+          { u: 'b', v: 5 }
+        ]
+      },
+      'grouped object output, with offset'
+    );
+
+    const dt2 = orderby(new Table({
+      u: ['a', 'a', 'a', 'b', 'b'],
+      w: ['y', 'x', 'y', 'z', 'x'],
+      v: [2, 1, 4, 5, 3]
+    }), 'v');
+
+    assert.deepEqual(
+      groupby(dt2, ['u', 'w']).objects({ grouped: 'object' }),
+      {
+        a: {
+          x: [{ u: 'a', w: 'x', v: 1 }],
+          y: [{ u: 'a', w: 'y', v: 2 },{ u: 'a', w: 'y', v: 4 }]
+        },
+        b: {
+          x: [{ u: 'b', w: 'x', v: 3 }],
+          z: [{ u: 'b', w: 'z', v: 5 }]
+        }
+      },
+      'grouped nested object output'
+    );
+
+    assert.deepEqual(
+      groupby(dt2, ['u', 'w']).objects({ grouped: 'entries' }),
+      [
+        ['a', [
+          ['y', [{ u: 'a', w: 'y', v: 2 }, { u: 'a', w: 'y', v: 4 }]],
+          ['x', [{ u: 'a', w: 'x', v: 1 }]]
+        ]],
+        ['b', [
+          ['z', [{ u: 'b', w: 'z', v: 5 }]],
+          ['x', [{ u: 'b', w: 'x', v: 3 }]]
+        ]]
+      ],
+      'grouped nested entries output'
+    );
+
+    assert.deepEqual(
+      groupby(dt2, ['u', 'w']).objects({ grouped: 'map' }),
+      new Map([
+        ['a', new Map([
+          ['x', [{ u: 'a', w: 'x', v: 1 }]],
+          ['y', [{ u: 'a', w: 'y', v: 2 },{ u: 'a', w: 'y', v: 4 }]]
+        ])],
+        ['b', new Map([
+          ['x', [{ u: 'b', w: 'x', v: 3 }]],
+          ['z', [{ u: 'b', w: 'z', v: 5 }]]
+        ])]
+      ]),
+      'grouped nested map output'
+    );
+  });
+
+  it('supports iterator output', () => {
+    const output = [
+      { u: 'a', v: 2 },
+      { u: 'a', v: 1 },
+      { u: 'a', v: 4 },
+      { u: 'b', v: 5 },
+      { u: 'b', v: 3 }
+    ];
+
+    const dt = new Table({
+      u: ['a', 'a', 'a', 'b', 'b'],
+      v: [2, 1, 4, 5, 3]
+    });
+
+    assert.deepEqual([...dt], output, 'iterator data');
+    assert.deepEqual(
+      [...orderby(dt, 'v')],
+      output.slice().sort((a, b) => a.v - b.v),
+      'iterator data orderby'
+    );
+  });
+
+  it('toString shows table state', () => {
+    const dt = new Table({
+      a: ['a', 'a', 'a', 'b', 'b'],
+      b: [2, 1, 4, 5, 3]
+    });
+
+    assert.equal(
+      dt.toString(),
+      '[object Table: 2 cols x 5 rows]',
+      'table toString'
+    );
+
+    const filter = new BitSet(5);
+    [1, 2, 4].forEach(i => filter.set(i));
+    assert.equal(
+      dt.create({ filter }).toString(),
+      '[object Table: 2 cols x 3 rows (5 backing)]',
+      'filtered table toString'
+    );
+
+    const groups = { names: ['a'], get: [row => dt.get('a', row)], size: 2 };
+    assert.equal(
+      dt.create({ groups }).toString(),
+      '[object Table: 2 cols x 5 rows, 2 groups]',
+      'grouped table toString'
+    );
+
+    const order = (u, v, { b }) => b[u] - b[v];
+    assert.equal(
+      dt.create({ order }).toString(),
+      '[object Table: 2 cols x 5 rows, ordered]',
+      'ordered table toString'
+    );
+
+    assert.equal(
+      dt.create({ filter, order, groups }).toString(),
+      '[object Table: 2 cols x 3 rows (5 backing), 2 groups, ordered]',
+      'filtered, grouped, ordered table toString'
+    );
+  });
+});
diff --git a/test/types-test.ts b/test/types-test.ts
new file mode 100644
index 00000000..b6422555
--- /dev/null
+++ b/test/types-test.ts
@@ -0,0 +1,104 @@
+// Example code that should not cause any TypeScript errors
+import * as aq from '../src/api.js';
+const { op } = aq;
+
+const dt = aq.table({
+  x: [1, 2, 3],
+  y: ['a', 'b', 'c']
+});
+const other = aq.table({ u: [3, 2, 1 ] });
+const other2 = aq.table({ x: [4, 5, 6 ] });
+
+export const rt = dt
+  .antijoin(other)
+  .antijoin(other, ['keyL', 'keyR'])
+  .antijoin(other, (a, b) => op.equal(a.keyL, b.keyR))
+  .assign({ z: [4, 5, 6] }, other)
+  .concat(other)
+  .concat(other, other2)
+  .concat([other, other2])
+  .count()
+  .count({ as: 'foo' })
+  .cross(other)
+  .cross(other, [['leftKey', 'leftVal'], ['rightVal']])
+  .dedupe()
+  .dedupe('y')
+  .derive({
+    row1: op.row_number(),
+    lead1: op.lead('s'),
+    row2: () => op.row_number(),
+    lead2: (d: {s: string}) => op.lead(op.trim(d.s)),
+    z: (d: {x: number}) => (d.x - op.average(d.x)) / op.stdev(d.x),
+    avg: aq.rolling(
+      (d: {x: number}) => op.average(d.x),
+      [-5, 5]
+    ),
+    mix: (d: any) => d.x > 2 ? d.u : d.z
+  })
+  .except(other)
+  .except(other, other2)
+  .except([other, other2])
+  .filter((d: any) => d.x > 2 && d.s !== 'foo')
+  .filter((d: {x: number, s: string}) => d.x > 2 && d.s !== 'foo')
+  .fold('colA')
+  .fold(['colA', 'colB'], { as: ['k', 'v'] })
+  .groupby('y')
+  .ungroup()
+  .groupby({ g: 'y' })
+  .ungroup()
+  .impute({ v: () => 0 })
+  .impute({ v: (d: {v: number}) => op.mean(d.v) })
+  .impute({ v: () => 0 }, { expand: ['x', 'y'] })
+  .intersect(other)
+  .intersect(other, other2)
+  .intersect([other, other2])
+  .join(other, ['keyL', 'keyR'])
+  .join(other, (a, b) => op.equal(a.keyL, b.keyR))
+  .join_left(other, ['keyL', 'keyR'])
+  .join_left(other, (a, b) => op.equal(a.keyL, b.keyR))
+  .join_right(other, ['keyL', 'keyR'])
+  .join_right(other, (a, b) => op.equal(a.keyL, b.keyR))
+  .join_full(other, ['keyL', 'keyR'])
+  .join_full(other, (a, b) => op.equal(a.keyL, b.keyR))
+  .lookup(other, ['key1', 'key2'], 'value1', 'value2')
+  .orderby('x', aq.desc('u'))
+  .unorder()
+  .pivot('key', 'value')
+  .pivot(['keyA', 'keyB'], ['valueA', 'valueB'])
+  .pivot({ key: (d: any) => d.key }, { value: (d: any) => op.sum(d.value) })
+  .relocate(['x', 'y'], { after: 'z' })
+  .rename({ x: 'xx', y: 'yy' })
+  .rollup({
+    min1: op.min('x'),
+    max1: op.max('x'),
+    sum1: op.sum('x'),
+    mode1: op.mode('x'),
+    min2: (d: {x: number}) => op.min(d.x),
+    max2: (d: {s: string}) => op.max(d.s),
+    sum2: (d: {x: number}) => op.sum(d.x),
+    mode2: (d: {d: Date}) => op.mode(d.d),
+    mix: (d: {x: number, z: number}) => op.min(d.x) + op.sum(d.z)
+  })
+  .sample(100)
+  .sample(100, { replace: true })
+  .select('x')
+  .select({ x: 'xx' })
+  .select(aq.all(), aq.not('x'), aq.range(0, 5))
+  .semijoin(other)
+  .semijoin(other, ['keyL', 'keyR'])
+  .semijoin(other, (a, b) => op.equal(a.keyL, b.keyR))
+  .slice(1, -1)
+  .slice(2)
+  .spread({ a: (d: any) => op.split(d.y, '') })
+  .spread('arrayCol', { limit: 100 })
+  .union(other)
+  .union(other, other2)
+  .union([other, other2])
+  .unroll('arrayCol', { limit: 1000 });
+
+export const arrow : import('apache-arrow').Table = dt.toArrow();
+export const buf : Uint8Array = dt.toArrowIPC();
+export const csv : string = dt.toCSV({ delimiter: '\t' });
+export const json : string = dt.toJSON({ columns: ['x', 'y'] });
+export const html : string = dt.toHTML();
+export const md : string = dt.toMarkdown();
diff --git a/test/verbs/assign-test.js b/test/verbs/assign-test.js
new file mode 100644
index 00000000..c256f29f
--- /dev/null
+++ b/test/verbs/assign-test.js
@@ -0,0 +1,36 @@
+import assert from 'node:assert';
+import tableEqual from '../table-equal.js';
+import { table } from '../../src/index.js';
+
+describe('assign', () => {
+  it('assign merges tables', () => {
+    const t1 = table({ a: [1], b: [2], c: [3] });
+    const t2 = table({ b: [-2], d: [4] });
+    const t3 = table({ a: [-1], e: [5] });
+    const dt = t1.assign(t2, t3);
+
+    tableEqual(dt, {
+      a: [-1], b: [-2], c: [3], d: [4], e: [5]
+    }, 'assigned data');
+
+    assert.deepEqual(
+      dt.columnNames(),
+      ['a', 'b', 'c', 'd', 'e'],
+      'assigned names'
+    );
+
+    assert.throws(
+      () => t1.assign(table({ c: [1, 2, 3] })),
+      'throws on mismatched row counts'
+    );
+
+    tableEqual(t1.assign({ b: [-2], d: [4] }), {
+      a: [1], b: [-2], c: [3], d: [4]
+    }, 'assigned data from object');
+
+    assert.throws(
+      () => t1.assign({ c: [1, 2, 3] }),
+      'throws on mismatched row counts from object'
+    );
+  });
+});
diff --git a/test/verbs/groupby-test.js b/test/verbs/groupby-test.js
index 2dfe892a..601897ab 100644
--- a/test/verbs/groupby-test.js
+++ b/test/verbs/groupby-test.js
@@ -1,6 +1,5 @@
 import assert from 'node:assert';
-import fromArrow from '../../src/format/from-arrow.js';
-import { desc, op, table } from '../../src/index.js';
+import { desc, fromArrow, op, table, toArrow } from '../../src/index.js';
 
 describe('groupby', () => {
   it('computes groups based on field names', () => {
@@ -145,12 +144,12 @@ describe('groupby', () => {
   });
 
   it('optimizes Arrow dictionary columns', () => {
-    const dt = fromArrow(
+    const dt = fromArrow(toArrow(
       table({
         d: ['a', 'a', 'b', 'b'],
         v: [1, 2, 3, 4]
-      }).toArrow()
-    );
+      })
+    ));
 
     const gt = dt.groupby('d');
     assert.equal(
diff --git a/test/verbs/reduce-test.js b/test/verbs/reduce-test.js
index 277e8875..5e2531d4 100644
--- a/test/verbs/reduce-test.js
+++ b/test/verbs/reduce-test.js
@@ -1,7 +1,7 @@
 import assert from 'node:assert';
 import tableEqual from '../table-equal.js';
 import { table } from '../../src/index.js';
-import countPattern from '../../src/engine/reduce/count-pattern.js';
+import countPattern from '../../src/verbs/reduce/count-pattern.js';
 
 describe('reduce', () => {
   it('produces multiple aggregates', () => {
diff --git a/test/verbs/reify-test.js b/test/verbs/reify-test.js
index 62044e6e..ac11a16e 100644
--- a/test/verbs/reify-test.js
+++ b/test/verbs/reify-test.js
@@ -35,9 +35,9 @@ describe('reify', () => {
         b: ['a', 'b', 'd'],
         c: [[1], [2], [4]],
         d: [
-          +(new Date(2000, 0, 1, 1)),
-          +(new Date(2001, 1, 1, 2)),
-          +(new Date(2003, 3, 1, 4))
+          new Date(2000, 0, 1, 1),
+          new Date(2001, 1, 1, 2),
+          new Date(2003, 3, 1, 4)
         ]
       },
       'reify data'
diff --git a/test/verbs/rollup-test.js b/test/verbs/rollup-test.js
index f0dd79d8..3cd53a33 100644
--- a/test/verbs/rollup-test.js
+++ b/test/verbs/rollup-test.js
@@ -134,8 +134,7 @@ describe('rollup', () => {
     const dt = table(data).groupby('g');
 
     assert.deepEqual(
-      dt.rollup({ o: op.object_agg('k', 'v') })
-        .columnArray('o'),
+      dt.rollup({ o: op.object_agg('k', 'v') }).array('o'),
       [
         { a: 1, b: 2 },
         { a: 5, b: 4 }
@@ -144,8 +143,7 @@ describe('rollup', () => {
     );
 
     assert.deepEqual(
-      dt.rollup({ o: op.entries_agg('k', 'v') })
-        .columnArray('o'),
+      dt.rollup({ o: op.entries_agg('k', 'v') }).array('o'),
       [
         [['a', 1], ['b', 2]],
         [['a', 3], ['b', 4], ['a', 5]]
@@ -154,8 +152,7 @@ describe('rollup', () => {
     );
 
     assert.deepEqual(
-      dt.rollup({ o: op.map_agg('k', 'v') })
-        .columnArray('o'),
+      dt.rollup({ o: op.map_agg('k', 'v') }).array('o'),
       [
         new Map([['a', 1], ['b', 2]]),
         new Map([['a', 5], ['b', 4]])
diff --git a/test/verbs/sample-test.js b/test/verbs/sample-test.js
index 3c78c451..415916f7 100644
--- a/test/verbs/sample-test.js
+++ b/test/verbs/sample-test.js
@@ -6,7 +6,7 @@ function check(table, replace, prefix = '') {
   const vals = [];
   const cnts = {};
   table.scan((row, data) => {
-    const val = data.a.get(row);
+    const val = data.a.at(row);
     vals.push(val);
     cnts[val] = (cnts[val] || 0) + 1;
   });
@@ -141,7 +141,7 @@ describe('sample', () => {
     assert.equal(ft.numRows(), 2, 'num rows');
     assert.equal(ft.numCols(), 2, 'num cols');
     assert.deepEqual(
-      ft.column('b').data.sort((a, b) => a - b),
+      ft.column('b').sort((a, b) => a - b),
       [2, 4],
       'stratify keys'
     );
diff --git a/tsconfig.json b/tsconfig.json
index ee068bf1..b94ec148 100644
--- a/tsconfig.json
+++ b/tsconfig.json
@@ -1,10 +1,15 @@
 {
-  "include": ["src/**/*"],
+  "include": ["src/index-browser.js", "src/index.js"],
   "compilerOptions": {
     "allowJs": true,
+    "checkJs": true,
     "declaration": true,
     "emitDeclarationOnly": true,
+    "esModuleInterop": true,
+    "module": "node16",
+    "moduleResolution": "node16",
     "outDir": "dist/types",
+    "target": "es2022",
     "skipLibCheck": true
   }
 }