feat: Add collate helper for custom sort orders. #371

Merged
merged 1 commit into from
Sep 25, 2024
22 changes: 21 additions & 1 deletion docs/api/index.md
@@ -11,7 +11,7 @@ title: Arquero API Reference
* [load](#load), [loadArrow](#loadArrow), [loadCSV](#loadCSV), [loadFixed](#loadFixed), [loadJSON](#loadJSON)
* [Expression Helpers](#expression-helpers)
* [op](#op), [agg](#agg), [escape](#escape)
* [bin](#bin), [desc](#desc), [frac](#frac), [rolling](#rolling), [seed](#seed)
* [bin](#bin), [collate](#collate), [desc](#desc), [frac](#frac), [rolling](#rolling), [seed](#seed)
* [Selection Helpers](#selection-helpers)
* [all](#all), [not](#not), [range](#range)
* [matches](#matches), [startswith](#startswith), [endswith](#endswith)
@@ -491,6 +491,26 @@ Generate a table expression that performs uniform binning of number values. The
aq.bin('colA', { maxbins: 20 })
```

<hr/><a id="collate" href="#collate">#</a>
<em>aq</em>.<b>collate</b>(<i>expr</i>, <i>comparator</i>[, <i>options</i>]) · [Source](https://github.com/uwdata/arquero/blob/master/src/helpers/collate.js)

Annotate a table expression with collation metadata, indicating how expression values should be compared and sorted. The [orderby](verbs#orderby) verb uses collation metadata to determine sort order. The collate helper is particularly useful for locale-specific string comparisons. The collation information can take the form of either a standard two-argument comparator function, or locale and option arguments compatible with [`Intl.Collator`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Collator).

* *expr*: The table expression to annotate with collation metadata.
* *comparator*: A comparator function or the locale(s) to use. For locales, both strings (e.g., `'de'`, `'tr'`, etc.) and [`Intl.Locale`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Locale) objects (or an array of either) are supported.
* *options*: Collation options compatible with [`Intl.Collator`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Collator). This argument only applies if locales are provided as the second argument.

*Examples*

```js
// order colA using a German locale
aq.collate('colA', 'de')
```

```js
// order colA using a provided comparator function
aq.collate('colA', new Intl.Collator('de').compare)
```
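As a rough sketch of what the second argument accepts, the snippet below mirrors the branch in the helper's source: a function is used as-is, while anything else is passed as locale(s), plus options, to `Intl.Collator`. The name `toComparator` is hypothetical and not part of the Arquero API, and the locale-dependent results assume a runtime with full ICU data (the default in current Node.js releases).

```js
// Hypothetical helper (not Arquero API): resolve a `comparator` argument the
// way aq.collate is documented to. Functions pass through unchanged; anything
// else is treated as locale(s) for Intl.Collator.
function toComparator(comparator, options) {
  return typeof comparator === 'function'
    ? comparator
    : new Intl.Collator(comparator, options).compare;
}

const words = ['Z', 'a', 'z', 'ä'];

// German places 'ä' alongside 'a'; Swedish places it after 'z'.
const german = [...words].sort(toComparator('de'));
const swedish = [...words].sort(toComparator('sv'));

// A plain comparator function is used unchanged (here: UTF-16 code-unit order).
const codeUnit = [...words].sort(toComparator((a, b) => (a < b ? -1 : a > b ? 1 : 0)));

console.log(german);   // 'ä' sorts before 'z'
console.log(swedish);  // 'ä' sorts after 'z'
console.log(codeUnit); // [ 'Z', 'a', 'z', 'ä' ]
```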

<hr/><a id="desc" href="#desc">#</a>
<em>aq</em>.<b>desc</b>(<i>expr</i>) · [Source](https://github.com/uwdata/arquero/blob/master/src/helpers/desc.js)
14 changes: 12 additions & 2 deletions docs/api/verbs.md
@@ -120,7 +120,7 @@ table.ungroup()

Order table rows based on a set of column values. Subsequent operations sensitive to ordering (such as window functions) will operate over sorted values. The resulting table provides a view over the original data, without any copying. To create a table with sorted data copied to new data structures, call [reify](#reify) on the result of this method. To undo ordering, use [unorder](#unorder).

* *keys*: Key values to sort by, in precedence order. By default, sorting is done in ascending order. To sort in descending order, wrap values using [desc](./#desc). If a string, order by the column with that name. If a number, order by the column with that index. If a function, must be a valid table expression; aggregate functions are permitted, but window functions are not. If an object, object values must be valid values parameters with output column names for keys and table expressions for values (the output names will be ignored). If an array, array values must be valid key parameters.
* *keys*: Key values to sort by, in precedence order. By default, sorting is done in ascending order. To sort in descending order, wrap values using [desc](./#desc). To provide a custom sort order for a key (such as for locale-specific string comparison), wrap the key value using [collate](./#collate). If a key is a string, order by the column with that name. If a number, order by the column with that index. If a function, the key must be a valid table expression; aggregate functions are permitted, but window functions are not. If an object, object values must be valid values parameters with output column names for keys and table expressions for values (the output names will be ignored). If an array, array values must be valid key parameters.

*Examples*

@@ -135,9 +135,19 @@ table.orderby('a', aq.desc('b'))
table.orderby({ a: 'a', b: aq.desc('b') })
```

```js
// order by column 'a' according to German locale settings
table.orderby(aq.collate('a', 'de'))
```

```js
// orderby accepts table expressions as well as column names
table.orderby(aq.desc(d => d.a))
table.orderby(d => d.a)
```

```js
// the configurations above can be combined
table.orderby(aq.desc(aq.collate(d => d.a, 'de')))
```
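The key semantics described above (precedence order, `desc` reversing a key, `collate` swapping in a custom comparator, and the two wrappers composing) can be modeled as a plain comparator over row indices. This is an illustrative sketch under stated assumptions, not Arquero's implementation, which generates comparator source code in `compare.js`. The `orderIndices` name and the `{ values, compare, desc }` key shape are hypothetical, and null handling is omitted.

```js
// Hypothetical model of multi-key ordering; not Arquero's implementation.
// Each key has a column of values, an optional comparator (the role that
// `collate` plays) and an optional `desc` flag (the role that `desc` plays).
const defaultCompare = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

function orderIndices(keys, n) {
  const idx = Array.from({ length: n }, (_, i) => i);
  return idx.sort((i, j) => {
    // Consult keys in precedence order; a tie falls through to the next key.
    for (const { values, compare = defaultCompare, desc = false } of keys) {
      const c = compare(values[i], values[j]);
      if (c !== 0) return desc ? -c : c;
    }
    return 0;
  });
}

const fruit = ['Zitrone', 'Äpfel', 'Birne', 'Äpfel'];
const qty = [3, 1, 2, 5];
const de = new Intl.Collator('de').compare;

// Roughly orderby(collate('fruit', 'de'), desc('qty'))
const byLocale = orderIndices([{ values: fruit, compare: de }, { values: qty, desc: true }], 4);
// Roughly orderby(desc(collate('fruit', 'de'))): desc wraps the collated key
const byLocaleDesc = orderIndices([{ values: fruit, compare: de, desc: true }], 4);
// Without collate, the default comparison uses UTF-16 code units ('Ä' > 'Z')
const byCodeUnit = orderIndices([{ values: fruit }], 4);

console.log(byLocale);     // [ 3, 1, 2, 0 ]
console.log(byLocaleDesc); // [ 0, 2, 1, 3 ]
console.log(byCodeUnit);   // [ 2, 0, 1, 3 ]
```

The tie-break on `qty` only matters for the two `'Äpfel'` rows, and `Array.prototype.sort` is stable, so exact ties keep their input order.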

<hr/><a id="unorder" href="#unorder">#</a>
1 change: 1 addition & 0 deletions src/api.js
@@ -20,6 +20,7 @@ export { default as toJSON } from './format/to-json.js';
export { default as toMarkdown } from './format/to-markdown.js';
export { default as bin } from './helpers/bin.js';
export { default as escape } from './helpers/escape.js';
export { default as collate } from './helpers/collate.js';
export { default as desc } from './helpers/desc.js';
export { default as field } from './helpers/field.js';
export { default as frac } from './helpers/frac.js';
17 changes: 10 additions & 7 deletions src/expression/compare.js
@@ -3,11 +3,8 @@ import parse from './parse.js';
import { aggregate } from '../verbs/reduce/util.js';

// generate code to compare a single field
const _compare = (u, v, lt, gt) =>
`((u = ${u}) < (v = ${v}) || u == null) && v != null ? ${lt}
: (u > v || v == null) && u != null ? ${gt}
: ((v = v instanceof Date ? +v : v), (u = u instanceof Date ? +u : u)) !== u && v === v ? ${lt}
: v !== v && u === u ? ${gt} : `;
const _compare = (u, v, lt, gt) => `((u = ${u}) < (v = ${v}) || u == null) && v != null ? ${lt} : (u > v || v == null) && u != null ? ${gt} : ((v = v instanceof Date ? +v : v), (u = u instanceof Date ? +u : u)) !== u && v === v ? ${lt} : v !== v && u === u ? ${gt} : `;
const _collate = (u, v, lt, gt, f) => `(v = ${v}, (u = ${u}) == null && v == null) ? 0 : v == null ? ${gt} : u == null ? ${lt} : (u = ${f}(u,v)) ? u : `;

export default function(table, fields) {
// parse expressions, generate code for both a and b values
@@ -50,9 +47,15 @@ export default function(table, fields) {
+ (op && table.isGrouped() ? 'const ka = keys[a], kb = keys[b];' : '')
+ 'let u, v; return ';
for (let i = 0; i < n; ++i) {
const o = fields.get(names[i]).desc ? -1 : 1;
const field = fields.get(names[i]);
const o = field.desc ? -1 : 1;
const [u, v] = exprs[i];
code += _compare(u, v, -o, o);
if (field.collate) {
code += _collate(u, v, -o, o, `${o < 0 ? '-' : ''}fn[${fn.length}]`);
fn.push(field.collate);
} else {
code += _compare(u, v, -o, o);
}
}
code += '0;};';
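As read from the `_collate` template, a collated key first handles null-like values (`null` or `undefined`) using the same `lt`/`gt` results that `desc` flips, so they rank below all other values (first in ascending order, last in descending order). Only then does it call the comparator, negated for descending keys. The function below is a hypothetical, hand-written equivalent for a single key, written for illustration rather than taken from Arquero.

```js
// Hypothetical single-key equivalent of the generated `_collate` check.
// `o` is 1 for ascending and -1 for descending, as in compare.js.
function collateCompare(u, v, cmp, descending = false) {
  const o = descending ? -1 : 1;
  const lt = -o; // result when u ranks below v
  const gt = o;  // result when u ranks above v
  if (u == null && v == null) return 0;
  if (v == null) return gt; // null-like values rank below everything else
  if (u == null) return lt;
  return o * cmp(u, v);     // descending negates the comparator result
}

const cmp = new Intl.Collator('en').compare;
const values = ['b', null, 'a'];

const ascending = [...values].sort((u, v) => collateCompare(u, v, cmp));
const descending = [...values].sort((u, v) => collateCompare(u, v, cmp, true));

console.log(ascending);  // [ null, 'a', 'b' ]
console.log(descending); // [ 'b', 'a', null ]
```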

25 changes: 25 additions & 0 deletions src/helpers/collate.js
@@ -0,0 +1,25 @@
import isFunction from '../util/is-function.js';
import wrap from './wrap.js';

/**
* Annotate a table expression with collation metadata, indicating how
* expression values should be compared and sorted. The orderby verb uses
* collation metadata to determine sort order. The collation information can
* take the form of either a standard two-argument comparator function, or
* locale and option arguments compatible with `Intl.Collator`.
* @param {string|Function|object} expr The table expression to annotate
* with collation metadata.
* @param {Intl.LocalesArgument | ((a: any, b: any) => number)} comparator
* A comparator function or the locale(s) to collate by.
* @param {Intl.CollatorOptions} [options] Collation options, applicable
* with locales only.
* @return {object} A wrapper object representing the collated value.
* @example orderby(collate('colA', 'de'))
*/
export default function(expr, comparator, options) {
return wrap(expr, {
collate: isFunction(comparator)
? comparator
: new Intl.Collator(comparator, options).compare
});
}
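Because the optional third argument is passed straight to the `Intl.Collator` constructor, the standard collator options apply. For example, `numeric: true` compares runs of digits by numeric value, and `sensitivity: 'base'` treats case and accent variants as equal. The snippet below uses only the built-in `Intl.Collator`; in Arquero terms the first option would correspond to something like `collate('colA', 'en', { numeric: true })`.

```js
// Collator options change how strings compare, independent of locale.
const files = ['item10', 'item2', 'item1'];

// Default collation compares digits character by character.
const plain = [...files].sort(new Intl.Collator('en').compare);

// numeric: true compares runs of digits by their numeric value.
const numeric = [...files].sort(new Intl.Collator('en', { numeric: true }).compare);

// sensitivity: 'base' ignores case and accents, so these compare as equal.
const base = new Intl.Collator('en', { sensitivity: 'base' }).compare;

console.log(plain);                    // [ 'item1', 'item10', 'item2' ]
console.log(numeric);                  // [ 'item1', 'item2', 'item10' ]
console.log(base('resume', 'Résumé')); // 0
```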
3 changes: 3 additions & 0 deletions src/util/is-function.js
@@ -1,3 +1,6 @@
/**
* @returns {value is Function}
*/
export default function(value) {
return typeof value === 'function';
}
88 changes: 87 additions & 1 deletion test/verbs/orderby-test.js
@@ -1,6 +1,6 @@
import assert from 'node:assert';
import tableEqual from '../table-equal.js';
import { desc, op, table } from '../../src/index.js';
import { collate, desc, op, table } from '../../src/index.js';

describe('orderby', () => {
it('orders a table', () => {
@@ -23,6 +23,92 @@ describe('orderby', () => {
tableEqual(dt, ordered, 'orderby data');
});

it('orders a table with collate comparator', () => {
const cmp = new Intl.Collator('tr-TR').compare;

const data = {
a: ['çilek', 'şeftali', 'erik', 'armut', 'üzüm', 'erik'],
b: [1, 2, 1, 2, 1, 2]
};

const dt = table(data).orderby(collate('a', cmp), desc('b'));

const rows = [];
dt.scan(row => rows.push(row), true);
assert.deepEqual(rows, [3, 0, 5, 2, 1, 4], 'orderby scan');

tableEqual(
dt,
{
a: ['armut', 'çilek', 'erik', 'erik', 'şeftali', 'üzüm'],
b: [2, 1, 2, 1, 2, 1]
},
'orderby data'
);

tableEqual(
table(data).orderby(desc(collate('a', cmp)), desc('b')),
{
a: ['üzüm', 'şeftali', 'erik', 'erik', 'çilek', 'armut'],
b: [1, 2, 2, 1, 1, 2]
},
'orderby data'
);
});

it('orders a table with collate locale', () => {
const data = {
a: ['çilek', 'şeftali', 'erik', 'armut', 'üzüm', 'erik'],
b: [1, 2, 1, 2, 1, 2]
};

const dt = table(data).orderby(collate('a', 'tr-TR'), desc('b'));

const rows = [];
dt.scan(row => rows.push(row), true);
assert.deepEqual(rows, [3, 0, 5, 2, 1, 4], 'orderby scan');

tableEqual(
dt,
{
a: ['armut', 'çilek', 'erik', 'erik', 'şeftali', 'üzüm'],
b: [2, 1, 2, 1, 2, 1]
},
'orderby data'
);

tableEqual(
table(data).orderby(desc(collate('a', 'tr-TR')), desc('b')),
{
a: ['üzüm', 'şeftali', 'erik', 'erik', 'çilek', 'armut'],
b: [1, 2, 2, 1, 1, 2]
},
'orderby data'
);
});

it('orders a table with combined annotations', () => {
const data = {
a: ['çilek', 'şeftali', 'erik', 'armut', 'üzüm', 'erik'],
b: [1, 2, 1, 2, 1, 2]
};

const dt = table(data).orderby(desc(collate(d => d.a, 'tr-TR')), 'b');

const rows = [];
dt.scan(row => rows.push(row), true);
assert.deepEqual(rows, [4, 1, 2, 5, 0, 3], 'orderby scan');

tableEqual(
dt,
{
a: ['üzüm', 'şeftali', 'erik', 'erik', 'çilek', 'armut'],
b: [1, 2, 1, 2, 1, 2]
},
'orderby data'
);
});

it('supports aggregate functions', () => {
const data = {
a: [1, 2, 2, 3, 4, 5],
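The expected `rows` arrays in the new tests are row-index permutations of the input. One way to double-check them outside Arquero is to sort the indices with equivalent comparators. This sketch uses only `Intl.Collator` and `Array.prototype.sort` (stable since ES2019); it is an illustration, not part of the test suite.

```js
// Data from the orderby collate tests.
const a = ['çilek', 'şeftali', 'erik', 'armut', 'üzüm', 'erik'];
const b = [1, 2, 1, 2, 1, 2];
const tr = new Intl.Collator('tr-TR').compare;
const indices = () => a.map((_, i) => i);

// Equivalent of orderby(collate('a', 'tr-TR'), desc('b'))
const ascThenDesc = indices().sort((i, j) => tr(a[i], a[j]) || b[j] - b[i]);

// Equivalent of orderby(desc(collate(d => d.a, 'tr-TR')), 'b')
const descThenAsc = indices().sort((i, j) => tr(a[j], a[i]) || b[i] - b[j]);

// For contrast, code-unit order puts 'çilek', 'üzüm', 'şeftali' after 'erik'.
const codeUnit = indices().sort((i, j) => (a[i] < a[j] ? -1 : a[i] > a[j] ? 1 : 0));

console.log(ascThenDesc); // [ 3, 0, 5, 2, 1, 4 ]
console.log(descThenAsc); // [ 4, 1, 2, 5, 0, 3 ]
console.log(codeUnit);    // [ 3, 2, 5, 0, 4, 1 ]
```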