Releases: scikit-hep/awkward

Version 2.0.3

23 Dec 18:09
bd3efcc

Backward-incompatible changes

  • The flatten_records argument of all reducers (ak.all, ak.any, ..., ak.var) has effectively been removed: setting it now raises an error (PR #2020). This argument applied a reducer to all contents of a record, merging fields, and it had to be removed to implement axis=None properly. The old default, flatten_records=False, is now the only behavior. To get the equivalent of flatten_records=True, use ak.ravel:
ak.sum(array, flatten_records=True)

becomes

ak.sum(ak.ravel(array))

Note: yanked from PyPI in favor of 2.0.4.

New features

  • feat: add data-touch reporting to the type-tracer. by @jpivarski in #2027

Bug-fixes and performance

  • fix: extend TypeTracerArray with eq, ne, and array_ufunc. by @jpivarski in #2021
  • fix: add support for Long64_t by @ianna in #2023
  • fix: replace protocol with direct subclass by @agoose77 in #2029
  • fix: support UnknownLength in ak.types.ArrayType by @agoose77 in #2031
  • refactor!: use exclusively axis=-1 reduction for axis=None by @agoose77 in #2020

Other

Full Changelog: v2.0.2...v2.0.3

Version 2.0.2

16 Dec 18:54
7e6f504

New features

Bug-fixes and performance

(none!)

Other

(none!)

Full Changelog: v2.0.1...v2.0.2

Version 2.0.1

15 Dec 19:10
3fc4adb

New features

Bug-fixes and performance

Other

Full Changelog: v2.0.0...v2.0.1

Version 2.0.0

10 Dec 00:52

Version 2.0.0 of Awkward Array

The Awkward Array version 2 project started in June of 2021 and has been developed alongside version 1 updates. For most of that time, it was available as a submodule, awkward._v2, so that it could be tested with the same tests as version 1 and could be experimented upon by early adopters.

The usual list of pull request titles would not be useful as release notes because the changes from 1.10.2 to 2.0.0 are too extensive. But here's a list of their PR numbers:

#884, #895, #896, #914, #957, #958, #959, #962, #977, #1025, #1031, #1036, #1045, #1059, #1063, #1072, #1073, #1074, #1079, #1082, #1092, #1099, #1101, #1109, #1110, #1111, #1116, #1117, #1119, #1121, #1122, #1123, #1124, #1125, #1130, #1131, #1132, #1134, #1135, #1137, #1138, #1140, #1141, #1142, #1143, #1145, #1146, #1147, #1148, #1149, #1150, #1153, #1154, #1156, #1159, #1160, #1161, #1162, #1164, #1165, #1183, #1184, #1201, #1203, #1204, #1206, #1207, #1211, #1214, #1215, #1217, #1218, #1219, #1220, #1221, #1222, #1225, #1226, #1227, #1228, #1229, #1233, #1234, #1240, #1242, #1245, #1248, #1259, #1270, #1276, #1279, #1289, #1290, #1292, #1293, #1294, #1296, #1297, #1300, #1301, #1304, #1306, #1307, #1309, #1312, #1317, #1321, #1327, #1329, #1338, #1340, #1346, #1347, #1351, #1352, #1354, #1355, #1356, #1359, #1360, #1364, #1365, #1367, #1368, #1369, #1370, #1372, #1373, #1374, #1376, #1378, #1380, #1381, #1383, #1384, #1385, #1387, #1390, #1392, #1393, #1394, #1395, #1397, #1398, #1399, #1401, #1404, #1407, #1408, #1409, #1410, #1412, #1413, #1415, #1416, #1418, #1419, #1421, #1422, #1425, #1426, #1427, #1428, #1429, #1430, #1431, #1432, #1433, #1434, #1435, #1437, #1440, #1443, #1444, #1445, #1446, #1447, #1449, #1455, #1456, #1457, #1458, #1462, #1464, #1465, #1467, #1468, #1469, #1470, #1474, #1475, #1476, #1478, #1484, #1485, #1486, #1487, #1490, #1491, #1492, #1493, #1494, #1496, #1497, #1498, #1499, #1502, #1503, #1505, #1508, #1510, #1513, #1514, #1515, #1516, #1518, #1519, #1520, #1521, #1523, #1524, #1527, #1531, #1532, #1533, #1535, #1536, #1537, #1538, #1539, #1540, #1541, #1542, #1543, #1544, #1550, #1555, #1556, #1559, #1560, #1561, #1562, #1564, #1565, #1566, #1567, #1568, #1572, #1573, #1576, #1579, #1581, #1589, #1593, #1597, #1598, #1602, #1603, #1604, #1605, #1607, #1609, #1610, #1613, #1614, #1615, #1616, #1617, #1618, #1619, #1620, #1621, #1625, #1627, #1629, #1632, #1636, #1641, #1642, #1645, #1649, #1650, #1651, #1652, #1653, #1661, #1665, 
#1666, #1671, #1673, #1674, #1675, #1677, #1679, #1689, #1691, #1692, #1695, #1698, #1699, #1700, #1706, #1708, #1712, #1715, #1716, #1717, #1721, #1722, #1723, #1725, #1730, #1731, #1732, #1733, #1739, #1740, #1743, #1744, #1746, #1748, #1749, #1750, #1751, #1752, #1754, #1757, #1758, #1759, #1760, #1761, #1763, #1768, #1769, #1770, #1773, #1774, #1776, #1777, #1779, #1781, #1783, #1787, #1788, #1795, #1796, #1797, #1798, #1800, #1801, #1803, #1804, #1811, #1812, #1813, #1815, #1816, #1822, #1825, #1826, #1827, #1829, #1830, #1831, #1832, #1835, #1836, #1837, #1838, #1841, #1844, #1845, #1848, #1851, #1852, #1853, #1854, #1856, #1857, #1858, #1859, #1860, #1861, #1863, #1867, #1869, #1871, #1873, #1876, #1877, #1878, #1880, #1881, #1891, #1892, #1894, #1895, #1897, #1898, #1900, #1905, #1907, #1908, #1911, #1912, #1913, #1915, #1919, #1920, #1921, #1922, #1928, #1930, #1934, #1938, #1939, #1940, #1942, #1943, #1946, #1948, #1949, #1950, #1951, #1952, #1953, #1954, #1955, #1956, #1959, #1960, #1962, #1965, #1966, #1968, #1970, #1971, #1972, #1974, #1976, #1977, #1979, #1981, #1982, #1983, #1985, #1986

Full Changelog: v1.10.2...v2.0.0

Despite the long list of PRs, the high-level interface changes from version 1 to version 2 were kept to a minimum. For the most part, the Awkward 1.x API is fine, but the internal implementation needed an overhaul to prevent technical debt.

The work was done by the Awkward Array developers:

In particular, most of the translation from version 1 to version 2 was the work of @ioanaif, the build/deployment was from @henryiii and @agoose77, the Awkward-RDataFrame bridge and other C++ interface from @ianna, GrowableBuffer/LayoutBuilder from @ManasviGoyal, and the CUDA and JAX foundations were laid by @swishdiff.

Additionally, we had help from:

Summary of changes

Nearly all of the code is written in Python now. Exceptions are the "kernel" functions, GrowableBuffer, LayoutBuilder, ArrayBuilder, AwkwardForth, and dynamically generated C++ code for RDataFrame.

Performance is maintained because any algorithm that scales with the size of an array is implemented in a compiled "kernel" function.

Split into two packages: awkward-cpp for the C++ part (infrequently updated, binary distribution for most platforms and Python versions) and awkward, the Python part (frequently updated).

Virtual arrays and Partitions (collectively, "lazy arrays") have been removed in favor of dask-awkward.

Awkward Arrays can be converted to and from RDataFrame, generating C++ for ROOT to JIT-compile so that iteration over Awkward Array input is fast (adapted from the Numba implementation).

Auto-differentiation of functions on Awkward Arrays using JAX. (But not JAX JIT-compilation.)

Suite of header-only C++ that does not depend on Awkward Arrays, but can be used to produce them and quickly get them from C++ to Python. The header-only suite includes GrowableBuffer and LayoutBuilder.

New documentation website (https://awkward-array.org/), based on JupyterBooks, following the NumPy/SciPy/Pandas style and organization, and including a notebook that can be executed in your web browser.

More expressive error-messages, highlighting the ak.* function that was in progress when the error occurred, with its arguments. (That is, highlighting ak.* functions as the granularity of feedback to users of Awkward Array, rather than making you search through the stack trace to the hand-off from your code to ours.)

Brackets are always balanced in the console representation of an array:

>>> ak.Array([
...     [{"x": 1.1, "y": [1]}, {"x": 2.2, "y": [1, 2]}],
...     [],
...     [{"x": 3.3, "y": [1, 2, 3]}],
... ])
<Array [[{x: 1.1, y: [1]}, {...}], ...] type='3 * var * {x: float64, y: var...'>

as opposed to

<Array [[{x: 1.1, y: [1]}, ... y: [1, 2, 3]}]] type='3 * var * {"x": float64, "y...'>

in version 1. Also, show methods for values

[[{x: 1.1, y: [1]}, {x: 2.2, y: [1, 2]}],
 [],
 [{x: 3.3, y: [1, 2, 3]}]]

and types

3 * var * {
    x: float64,
    y: var * int64
}

This extended show output is the default representation in Jupyter.

Round-trip fidelity in ak.to_arrow/ak.from_arrow: no Awkward Array metadata is lost. Same for ak.to_parquet/ak.from_parquet, to the extent that pyarrow can read and write Parquet.

Parquet column selection using wildcards.

Data exported with version 1 ak.to_buffers can be imported by version 2 ak.from_buffers, with custom buffer_keys.

The majority of version 1 tests have been ported to version 2, to ensure that the interface and functionality don't change, except where intended (e.g. organizing naming conventions).

Consistent handling of date-time and time-delta types (matches NumPy's system).

Improved ak.to_json/ak.from_json arguments (for converting non-JSON types NaN, infinity, complex numbers) and using a known JSONSchema to accelerate ak.from_json. Removed ambiguities about newline-delimited JSON (requires explicit argument).

The world's fastest Avro file reader in Python, ak.from_avro_file (uses AwkwardForth).

"nan" versions of NumPy functions, such as np.nansum, np.nanmean, np.nanstd.

Renamed ak.to_pandas to ak.to_dataframe, to clarify the distinction from awkward-pandas.

Organized Type and Form objects better and made them more consistent.

Clear specification of NumPy dtypes that can be used in Awkward Arrays (bool, numbers, including complex, and date-time/time-delta).

Organized naming conventions throughout the codebase, such as keys versus fields versus recordlookup.

Carefully examined the public API (all modules, functions, classes, and methods that don't start with an underscore) to be sure that we can support it going forward. Any removal or change of an interface will require a minor version number increase and a deprecation cycle, on the order of months. (New features and bug-fixes can be immediate, on patch releases.)

Flags and "configuration" function arguments are now keyword-only (order independent).
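
To illustrate what keyword-only means in practice, here is a plain-Python sketch. The function total and its parameters are hypothetical and are not part of Awkward Array; they only mimic the shape of a reducer signature in which flags follow a bare `*`.

```python
def total(values, axis=None, *, keepdims=False):
    """Hypothetical reducer over a flat list; axis is accepted but unused here."""
    result = sum(values)
    # keepdims is a flag: it can only be passed by name.
    return [result] if keepdims else result


# Passing the flag by keyword works.
by_keyword = total([1, 2, 3], keepdims=True)

# Passing the flag positionally is rejected by Python itself.
try:
    total([1, 2, 3], None, True)
    positional_rejected = False
except TypeError:
    positional_rejected = True

print(by_keyword, positional_rejected)
```

Because flags cannot be passed positionally, a flag can be added to such a signature or the flags reordered without changing the meaning of existing calls.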

Started adding Python type hints (nowhere near complete, but started).

Removed the Identities from array nodes. They were never fully implemented—a placeholder for a feature that won't be developed within Awkward Array (SQL-style JOINs).

TypeTracerArray does a "dry run" of a calculation to predict its type at the end. Used to build a computation graph for dask-awkward.

Equivalent but ungainly type combinations, such as "option-type of option-type of X" or "union-type containing union-types," have been outlawed, and there are tools to squash them into a canonical layout. Operations on the data now have fewer possibilities to worry about.

Simplified the semantics of nbytes.

Clarified ak.ravel and ak.flatten's treatments of missing data.

Added missing ArrayBuilder methods in Numba.

Set up framework for performing ak.* operations i...

Version 2.0.0rc8

09 Dec 16:00
d4fa292
Pre-release

This will very likely be the last pre-release before the final 2.0.0 release. If all goes well, that will be six hours after now. ("Now" is 16:00 UTC, December 9, 2022, so the final release will likely be at 22:00 UTC.)

New features

(none!)

Bug-fixes and performance

  • fix!: always broadcast with_field assignments against existing array by @agoose77 in #1962
  • fix: replace axis_wrap_if_negative with maybe_posaxis, simpler and more correct by @jpivarski in #1986

Other

Full Changelog: v2.0.0rc7...v2.0.0rc8

Version 2.0.0rc7

08 Dec 23:04
bbc24fa
Pre-release

New features

(none!)

Bug-fixes and performance

Other

  • refactor: rename Form to form_cls by @agoose77 in #1976
  • refactor: hide Content recursion entry points in ak._do submodule. by @jpivarski in #1972
  • docs: Jim's documentation touch-ups (API sidebar, obsolete kernels intro) by @jpivarski in #1982
  • ci: build(deps): bump pypa/gh-action-pypi-publish from 1.5.1 to 1.6.1 by @dependabot in #1948
  • ci: build(deps): bump pypa/cibuildwheel from 2.11.2 to 2.11.3 by @dependabot in #1965
  • ci: build(deps): bump pypa/gh-action-pypi-publish from 1.6.1 to 1.6.4 by @dependabot in #1970
  • chore: fix RTD configuration by @agoose77 in #1979

Full Changelog: v2.0.0rc6...v2.0.0rc7

Version 2.0.0rc6

06 Dec 20:01
17a1bad
Pre-release

The main purpose of this release is to fix Uproot.

New features

  • feat: ak.from_rdataframe should accept a single string 'columns'. by @jpivarski in #1956

Bug-fixes and performance

  • fix: __array__ = 'sorted_map' should only be allowed on RecordArrays, not lists. by @jpivarski in #1959
  • fix: unknown type column by @ianna in #1960
  • fix: add IPython as Sphinx extension by @agoose77 in #1966

Other

(none!)

Full Changelog: v2.0.0rc5...v2.0.0rc6

Version 2.0.0rc5

06 Dec 01:36
Pre-release

This is one of the last pre-releases before 2.0.0. Most of the focus now is on last-minute API changes; the API can't change without a deprecation cycle after 2.0.0.

New features

Bug-fixes and performance

  • fix: refactor '_nextcarry-outindex' to have the same signature everywhere by @ioanaif in #1911
  • fix: ignore .nox by @ianna in #1912
  • refactor!: make Content initialisers take nplike, parameters as keyword by @agoose77 in #1921
  • fix: backends should have defaults for user-facing operations by @agoose77 in #1940
  • fix: consolidate regular indexing by @agoose77 in #1943
  • fix: IndexedArray.project() preserves parameters. by @jpivarski in #1949
  • fix: preserve strings in ak.ravel by @agoose77 in #1934
  • fix: UnionArray.simplified preserves parameters. by @jpivarski in #1950

Other

Full Changelog: v2.0.0rc4...v2.0.0rc5

Version 2.0.0rc4

19 Nov 19:12
bacbab5
Pre-release

This is the first release in which the C++ code has been moved into a separate package, awkward-cpp. The pure Python awkward package is version-locked to version 1 of awkward-cpp.

New features

  • feat: better mask_identity defaults for reducer-like functions. by @jpivarski in #1873
  • feat: np.matmul should raise NotImplementedError until it gets implemented by @ioanaif in #1877
  • feat: more concise pretty-print. by @jpivarski in #1861

Bug-fixes and performance

  • fix: ensure that behaviors are propagated through ak.XXX operations by @agoose77 in #1869
  • fix: try to fix long long to double by @ianna in #1860
  • fix: set appropriate error message for decimal types in arrow by @agoose77 in #1871

Other

New Contributors

Full Changelog: v2.0.0rc3...v2.0.0rc4

Version 1.10.2

08 Nov 17:28
80bbef0

New features

  • feat: add RegularArray._reduce_next implementation (backport) by @agoose77 in #1813

Bug-fixes and performance

  • fix: don't assume trailing . for module name in is_XXX_buffer (backport) by @agoose77 in #1746
  • fix: use proper lengths in ByteMaskedArray.mergemany (backport) by @agoose77 in #1750
  • fix: simplify ListOffsetArray_reduce_nonlocal_outstartsstops (backport) by @agoose77 in #1797

Other

  • chore: remove v2 Python highlevel LayoutBuilder from main-v1. by @jpivarski in #1863

Full Changelog: v1.10.1...v1.10.2