feat: named axis for ak.Array #3238

Merged
merged 57 commits into from
Oct 11, 2024
eb5c188
start implementing named axis for awkward array
pfackeldey Sep 12, 2024
3978f43
style: pre-commit fixes
pre-commit-ci[bot] Sep 12, 2024
556362e
add support for named axis for first batch of highlevel functions
pfackeldey Sep 12, 2024
16a119b
formatting
pfackeldey Sep 12, 2024
c96f61e
next batch of high-level functions
pfackeldey Sep 13, 2024
7f7db85
fix type hints & safer control flow when checking for named axis
pfackeldey Sep 13, 2024
572051e
style: pre-commit fixes
pre-commit-ci[bot] Sep 13, 2024
e57e2c3
(hopefully) fix old (<3.10) python type annotation syntax
pfackeldey Sep 13, 2024
e338332
(hopefully) fix old (<3.10) python type annotation syntax
pfackeldey Sep 13, 2024
a07adfa
next batch of highlevel functions
pfackeldey Sep 16, 2024
4c51e6b
next batch of highlevel functions
pfackeldey Sep 17, 2024
d05517a
style: pre-commit fixes
pre-commit-ci[bot] Sep 17, 2024
a705981
update named axis implementation to not use tuples at all; start inde…
pfackeldey Sep 19, 2024
1ded71b
add named axis propagation for binary ops, some highlevel ops, and fi…
pfackeldey Sep 20, 2024
7158d07
Merge remote-tracking branch 'upstream/main' into feat/named_axes
pfackeldey Sep 20, 2024
00669a3
fix keepdims in ak.covar & ak.corr; properly propagate named axis in …
pfackeldey Sep 20, 2024
bb97999
add ak.std & ak.var; fix bug in indexing where == comparisons against…
pfackeldey Sep 20, 2024
8720a05
add ak.(arg)combinations and ak.(arg)cartesian; make named axis compa…
pfackeldey Sep 23, 2024
1af4376
avoid touching shape too much for purelist_depth, minmax_depth, and b…
pfackeldey Sep 23, 2024
01b459c
ak.without_named_axis: allow ak.Records
pfackeldey Sep 23, 2024
c970d06
ak.with_named_axis: add check to validate the given named axis mapping
pfackeldey Sep 23, 2024
f5f9495
fix doc strings and remove obsolete functions in _namedaxis.py module
pfackeldey Sep 23, 2024
51bc6d6
update Slicer doc string
pfackeldey Sep 23, 2024
3bb8efa
docs: add documentation page for named axes
pfackeldey Sep 25, 2024
5774280
Merge branch 'main' into feat/named_axes
pfackeldey Sep 27, 2024
a63c4c3
propagate named axis through broadcasting; add more highlevel ops (th…
pfackeldey Oct 1, 2024
f2febd9
add test for ak.where with named axis
pfackeldey Oct 1, 2024
f00aac5
fix test using pyarrow
pfackeldey Oct 1, 2024
36fd4f5
add test case for ak.broadcast_fields
pfackeldey Oct 1, 2024
2c103ef
streamline code
pfackeldey Oct 2, 2024
0d81859
add named axis to constructor, repr and .show(...) of highlevel ak.Ar…
pfackeldey Oct 2, 2024
7affb94
satisfy pylint
pfackeldey Oct 2, 2024
050579b
improve docs, comments, and add named/positional axis property to Rec…
pfackeldey Oct 3, 2024
96fe89a
remove ak.Slicer as numpy provides np.s_ and is a strict dependency
pfackeldey Oct 3, 2024
247c1e1
fix docs
pfackeldey Oct 3, 2024
fd3c166
Merge branch 'main' into feat/named_axes
pfackeldey Oct 3, 2024
c82a3d9
Merge branch 'main' into feat/named_axes
pfackeldey Oct 3, 2024
13700d8
mark right-broadcasting test with xfail for windows 32-bit
pfackeldey Oct 3, 2024
d3dc258
add tests & proper support for negative named axes
pfackeldey Oct 7, 2024
c469000
satisfy pylint
pfackeldey Oct 7, 2024
6c376f0
Merge branch 'main' into feat/named_axes
pfackeldey Oct 7, 2024
a2f6fcb
make xfail marker not strict
pfackeldey Oct 7, 2024
72565a9
chore: enable mypy on namedaxis
agoose77 Oct 8, 2024
e0fa11b
fix: use attribute name in type hints
agoose77 Oct 8, 2024
74f6950
refactor: rename _NamedAxisKey
agoose77 Oct 8, 2024
edc8290
fix: define NAMED_AXIS_KEY as a literal
agoose77 Oct 8, 2024
197de34
fix: appease mypy
agoose77 Oct 8, 2024
053f018
highlevel getitem: less (un)wrapping
pfackeldey Oct 8, 2024
4bbd5b1
highlevel: improve repr and .show for named_axis
pfackeldey Oct 8, 2024
f515ea7
named_axis: fix typo and avoid dictionary copies where possible
pfackeldey Oct 8, 2024
413c86c
named_axis: improve doc string of _prettify_named_axes
pfackeldey Oct 8, 2024
e3fc685
highlevel: fix instance check for attrs in __init__
pfackeldey Oct 9, 2024
894d1d0
add named_axis to jupyter repr
pfackeldey Oct 9, 2024
fb001b4
fix docs
pfackeldey Oct 9, 2024
df924c0
connect _neg2pos_axis with maybe_posaxis
pfackeldey Oct 9, 2024
595b240
named_axis: simplify type hint for named axis in attrs mapping
pfackeldey Oct 9, 2024
83c0aa6
regularize axis makes sure now that its type is either int or None
pfackeldey Oct 9, 2024
1 change: 1 addition & 0 deletions .gitignore
@@ -1,4 +1,5 @@
studies/**/sample-*
studies/named_axis.*
docs/demos/countries.geojson
docs/demos/test-program
docs/demos/test-program.cpp
12 changes: 9 additions & 3 deletions docs/_toc.yml
@@ -4,12 +4,11 @@ title: "Awkward Array"
defaults:
titlesonly: True


subtrees:
- entries:
- file: getting-started/index
subtrees:
- entries:
- entries:
- file: getting-started/what-is-an-awkward-array
- file: getting-started/10-minutes-to-awkward-array
- file: getting-started/uproot-awkward-columnar-hats
Expand All @@ -18,7 +17,7 @@ subtrees:
- file: getting-started/papers-and-talks
- file: user-guide/index
subtrees:
- entries:
- entries:
- file: user-guide/how-to-convert
title: "Converting arrays"
subtrees:
Expand Down Expand Up @@ -74,6 +73,13 @@ subtrees:
- file: user-guide/how-to-examine-checking-validity
title: "Checking validity"

- file: user-guide/how-to-array-properties
title: "Array properties"
subtrees:
- entries:
- file: user-guide/how-to-array-properties-named-axis
title: "Named axes"

- file: user-guide/how-to-math
title: "Numerical math"
subtrees:
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -140,7 +140,7 @@
html_js_files = ["js/awkward.js"]

# MyST settings
myst_enable_extensions = ["colon_fence"]
myst_enable_extensions = ["colon_fence", "deflist"]

nb_execution_mode = "cache"
nb_execution_raise_on_error = True
304 changes: 304 additions & 0 deletions docs/user-guide/how-to-array-properties-named-axis.md
@@ -0,0 +1,304 @@
---
jupytext:
text_representation:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.14.1
kernelspec:
display_name: Python 3 (ipykernel)
language: python
name: python3
---

Named axes
==========

Named axes are a feature in Awkward Array that allows you to give names to the axes of an array.
This can be useful for documentation, debugging, and for writing code that is more robust to changes in the structure of the data.
As argued at [PyHEP.dev 2023](https://indico.cern.ch/event/1234156/) and by the Harvard NLP group in their ["Tensor Considered Harmful"](https://nlp.seas.harvard.edu/NamedTensor.html) write-up, named axes can be a powerful tool to make code more readable and less error-prone.

Awkward Array ensures that named axes are properly propagated to the result of an operation.
All high-level, indexing, and broadcasting operations in Awkward Array support named axes.

Other libraries that support named axes include:
- [hist](https://hist.readthedocs.io/en/latest/)
- [haliax](https://github.com/stanford-crfm/haliax)
- [Tensor Considered Harmful](https://nlp.seas.harvard.edu/NamedTensor.html)
- [PyTorch Named Tensors](https://pytorch.org/docs/stable/name_inference.html#name-inference-reference-doc)
- [Penzai Named Axis](https://penzai.readthedocs.io/en/stable/notebooks/named_axes.html)
- [xarray Named Axis](https://docs.xarray.dev/en/stable/user-guide/indexing.html#)

Named axes in Awkward Array are inspired primarily by `hist` and `PyTorch Named Tensors`.

+++

How to (de-)attach named axes?
-------------------------

Named axes can be attached to an array using the high-level {func}`ak.with_named_axis` function.
Awkward Array allows strings as named axes and integers as positional axes.

The `named_axis` argument of {func}`ak.with_named_axis` accepts either a `tuple` or `dict`:
- `tuple`:
  - `named axis`: item
  - `positional axis`: index of the item
  - _additional_: `None` is a wildcard that leaves an axis unnamed, e.g. `("x", None)` means that the first axis is named "x" and the second is not named.
- `dict`:
  - `named axis`: key
  - `positional axis`: value
  - _additional_: `None` is not allowed as a name; axes that are not listed stay unnamed, e.g. `{"x": 0}` means that the first axis is named "x" and all other existing dimensions are unnamed. The `dict` option also allows naming axes by negative position, e.g. `{"x": -1}` means that the last axis is named "x".
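
The two forms describe the same kind of mapping. As a minimal pure-Python sketch of how the `tuple` form relates to the `dict` form (`tuple_to_mapping` is a hypothetical helper written for illustration, not part of the Awkward Array API):

```python
from __future__ import annotations


def tuple_to_mapping(names: tuple[str | None, ...]) -> dict[str, int]:
    """Translate the tuple form into the dict form (illustrative helper)."""
    # The position of each item is its positional axis; None is a wildcard
    # that leaves the axis unnamed, so it produces no entry.
    return {name: pos for pos, name in enumerate(names) if name is not None}


print(tuple_to_mapping(("x", "y")))   # {'x': 0, 'y': 1}
print(tuple_to_mapping(("x", None)))  # {'x': 0}
```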


```{code-cell}
import awkward as ak
import numpy as np
```

The axis names of an array can be attached through the constructor:
```{code-cell}
named_array = ak.Array([[1, 2], [3], [], [4, 5, 6]], named_axis=("x", "y"))
# or
named_array = ak.Array([[1, 2], [3], [], [4, 5, 6]], named_axis={"x": 0, "y": 1})
```

... or through `ak.with_named_axis`:
```{code-cell}
array = ak.Array([[1, 2], [3], [], [4, 5, 6]])
named_array = ak.with_named_axis(array, named_axis=("x", "y"))
# or
named_array = ak.with_named_axis(array, named_axis={"x": 0, "y": 1})
```

After attaching named axes, you can see them, comma-separated, in the array's representation and in `.show(named_axis=True)`:

```{code-cell}
ak.Array([[1, 2], [3], [], [4, 5, 6]], named_axis=("x", "y"))
```

```{code-cell}
ak.Array([[1, 2], [3], [], [4, 5, 6]], named_axis=("x", "y")).show(named_axis=True)
```

The mapping from named axes to positional axes is available through the `named_axis` and `positional_axis` properties:

```{code-cell}
named_array.named_axis
```

```{code-cell}
named_array.positional_axis
```

If you want to remove the named axes from an array, you can use the {func}`ak.without_named_axis` function:

```{code-cell}
array = ak.without_named_axis(named_array)
array.named_axis
```


Indexing with Named Axes
------------------------

Named axes can be used for indexing operations.
This is enabled through a special syntax that allows you to index with a dictionary whose keys refer to named (or positional) axes and whose values are the slice or index to apply.

Simple examples:

```{code-cell}
array = ak.Array([[[1, 2]], [[3]], [[4]], [[5, 6], [7]]])
named_array = ak.with_named_axis(array, named_axis=("x", "y", "z"))

# named axes
named_array[{"x": 0}] # array[0, :, :]
named_array[{"z": 0}] # array[:, :, 0]

named_array[{"x": 0, "y": 0}] # array[0, 0, :]
named_array[{"x": slice(0, 1), "y": 0}] # array[0:1, 0, :]

named_array[named_array > 3] # array[array > 3]


# positional axes
named_array[{0: 0}] # array[0, :, :]
named_array[{2: 0}] # array[:, :, 0]

named_array[{-3: 0}] # array[0, :, :]
named_array[{-1: 0}] # array[:, :, 0]
None
```

If several keys point to the same positional axis, the last key is used and all others are ignored:

```{code-cell}
array = ak.Array([[[1, 2]], [[3]], [[4]], [[5, 6], [7]]])
named_array = ak.with_named_axis(array, named_axis=("x", "y", "z"))

assert ak.all(named_array[{0: 0, "x": slice(0, 2)}] == named_array[0:2])
assert ak.all(named_array[{"x": slice(0, 2), 0: 0}] == named_array[0])
```
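
One way to picture the last-key-wins rule is to resolve every key to a non-negative positional axis before building the slice tuple. The following is a pure-Python sketch under that assumption; `resolve_index_keys` is a hypothetical helper and not Awkward Array's internal implementation:

```python
from __future__ import annotations


def resolve_index_keys(where: dict, named_axis: dict[str, int], ndim: int) -> tuple:
    """Turn an {axis: index} dict into a positional slice tuple (illustrative)."""
    resolved = {}
    for key, item in where.items():
        # Strings are looked up in the name-to-position mapping; integers are
        # already positional axes.
        pos = named_axis[key] if isinstance(key, str) else key
        if pos < 0:
            pos += ndim  # normalize negative axes, e.g. -1 -> ndim - 1
        resolved[pos] = item  # a later key for the same axis overwrites an earlier one
    # Axes that were not mentioned are left untouched with a full slice.
    return tuple(resolved.get(axis, slice(None)) for axis in range(ndim))


names = {"x": 0, "y": 1, "z": 2}
print(resolve_index_keys({0: 0, "x": slice(0, 2)}, names, ndim=3))
print(resolve_index_keys({"x": slice(0, 2), 0: 0}, names, ndim=3))
```

Because Python dictionaries preserve insertion order, the order in which the keys are written decides which one takes effect.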


More detailed example:

```{code-cell}
# create a Record Array that represents four events with a variable number of jets
events = ak.zip({
"event_no": np.arange(4),
"jetpt": ak.Array([[50, 60], [45], [], [80, 30, 50]]),
})
named_events = ak.with_named_axis(events, ("events", "jets"))

print("classic indexing:", named_events[0, 0:1])
print("named indexing :", named_events[{"events": 0, "jets": slice(0, 1)}])
```

As syntactic sugar, use `np.s_` to define slices more easily:

```{code-cell}
array = ak.Array([[[1, 2]], [[3]], [[4]], [[5, 6], [7]]])
named_array = ak.with_named_axis(array, named_axis=("x", "y", "z"))

assert ak.all(named_array[{"x": np.s_[0:2]}] == named_array[{"x": slice(0, 2)}])
```
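
`np.s_` builds the same `slice` objects (or tuples of them) that subscript syntax produces, which is why the two spellings are interchangeable. A short check that uses only NumPy:

```python
import numpy as np

# np.s_[...] hands back exactly what Python passes to __getitem__.
first_two = np.s_[0:2]
every_other = np.s_[::2]
mixed = np.s_[1:, 0]

print(first_two == slice(0, 2))
print(every_other == slice(None, None, 2))
print(mixed == (slice(1, None), 0))
```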

Highlevel Operations with Named Axes
------------------------------------

Named axes can be used to specify the axis of a high-level operation, provided that the array the operation is performed on carries that named axis.

For example, the `ak.sum` operation can be performed on an array with named axes:

```{code-cell}
array = ak.Array([[[1, 2]], [[3]], [[4]], [[5, 6], [7]]])
named_array = ak.with_named_axis(array, named_axis=("x", "y", "z"))

print("Sum over axis 'x':", ak.sum(named_array, axis="x")) # ak.sum(array, axis=0)
print("Sum over axis 'y':", ak.sum(named_array, axis="y")) # ak.sum(array, axis=1)
print("Sum over axis 'z':", ak.sum(named_array, axis="z")) # ak.sum(array, axis=2)
```


Named Axes Propagation Strategies
---------------------------------


Named axes are propagated through all operations in Awkward Array.
For this, specific strategies are defined for each operation to ensure that the named axes are properly propagated to the result.

The possible strategies are:
- `keep all`: keep all named axes
- `keep one`: keep one named axis
- `keep up to`: keep all named axes up to a certain positional axis
- `remove all`: remove all named axes
- `remove one`: remove one named axis
- `add one`: add a new axis
- `unify`: unify named axes of two arrays. The named axes are unifiable if they have the same name (or either is `None`) and point to the same positional axis.
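
To make some of these strategies concrete, here is a pure-Python sketch that treats the named axes as a plain name-to-position mapping with non-negative positions. The function names are hypothetical and chosen for illustration; this is not Awkward Array's internal implementation, and the inclusive bound in `keep_up_to` is an assumption:

```python
from __future__ import annotations


def remove_one(named_axis: dict[str, int], axis: int) -> dict[str, int]:
    """Drop the name at `axis` and shift every later axis down by one."""
    return {
        name: (pos - 1 if pos > axis else pos)
        for name, pos in named_axis.items()
        if pos != axis
    }


def add_one(named_axis: dict[str, int], axis: int) -> dict[str, int]:
    """Make room for a new, unnamed axis at `axis` by shifting later axes up."""
    return {name: (pos + 1 if pos >= axis else pos) for name, pos in named_axis.items()}


def keep_up_to(named_axis: dict[str, int], axis: int) -> dict[str, int]:
    """Keep only the names at positions 0..axis (assumed inclusive)."""
    return {name: pos for name, pos in named_axis.items() if pos <= axis}


names = {"x": 0, "y": 1}
print(remove_one(names, 0))  # {'y': 0}
print(add_one(names, 0))     # {'x': 1, 'y': 2}
```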

Indexing operations
: The following table shows the strategy for indexing operations:

| Operation | Strategy |
|----------------------|--------------|
| `array[:]` | `keep all` |
| `array[...]` | `keep all` |
| `array[()]` | `keep all` |
| `array[0:1]` | `keep all` |
| `array[[0, 1]]` | `keep all` |
| `array[array % 2]` | `keep all` |
| `array[0]` | `remove one` |
| `array[np.array(0)]` | `remove one` |
| `array[None]` | `add one` |
| `array[np.newaxis]` | `add one` |
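
The rows of this table can be read as a per-item rule applied along the index tuple. The sketch below is a simplified pure-Python model that only handles integers, slices, and `None`; it ignores `...` and array indices, and it assumes that a newly added axis is unnamed. `propagate_names` is a hypothetical helper, not an Awkward Array function:

```python
from __future__ import annotations


def propagate_names(names: tuple, where: tuple) -> tuple:
    """Propagate per-axis names through a basic index tuple (illustrative)."""
    result = []
    axis = 0
    for item in where:
        if item is None:
            # `add one`: np.newaxis is None and inserts a new axis (assumed unnamed).
            result.append(None)
        elif isinstance(item, int):
            # `remove one`: an integer index consumes the axis and its name.
            axis += 1
        else:
            # `keep all`: a slice keeps the axis and therefore its name.
            result.append(names[axis])
            axis += 1
    # Axes not mentioned in the index are kept unchanged.
    result.extend(names[axis:])
    return tuple(result)


print(propagate_names(("x", "y", "z"), (0,)))              # ('y', 'z')
print(propagate_names(("x", "y", "z"), (slice(0, 1), 0)))  # ('x', 'z')
print(propagate_names(("x", "y", "z"), (None,)))           # (None, 'x', 'y', 'z')
```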

Universal functions (`ufuncs`)
: `ufuncs` with a single argument (i.e. unary operations, such as `__abs__`, `__neg__`, `__invert__`, ...) do not modify named axes (strategy: `keep all`).
: `ufuncs` with two arguments (i.e. binary operations, such as `__add__`, `__sub__`, `__mul__`, ...) try to merge the named axes of the given arrays (strategy: `unify`).
This means that the named axes of the two arrays are merged if they have the same name (or either is `None`) and point to the same positional axis.
If the named axes do not match, e.g. the same positional axis has different names or the same name points to different positional axes, an exception is raised.

```{code-cell}
array = ak.Array([[1, 2], [3], [], [4, 5, 6]])
named_array = ak.with_named_axis(array, named_axis=("x", "y"))

# unary operations with named axes
assert (-named_array).named_axis == {"x": 0, "y": 1}
assert (+named_array).named_axis == {"x": 0, "y": 1}
assert (~named_array).named_axis == {"x": 0, "y": 1}
assert abs(named_array).named_axis == {"x": 0, "y": 1}

# binary operations with named axes
named_array1 = ak.with_named_axis(array, named_axis=(None, "y"))
named_array2 = ak.with_named_axis(array, named_axis=("x", None))
named_array3 = ak.with_named_axis(array, named_axis=("x", "y"))

assert (array + array).named_axis == {}
assert (named_array1 + array).named_axis == {"y": 1}
assert (named_array2 + array).named_axis == {"x": 0}
assert (named_array3 + array).named_axis == {"x": 0, "y": 1}

assert (named_array1 + named_array2).named_axis == {"x": 0, "y": 1}
assert (named_array3 + named_array3).named_axis == {"x": 0, "y": 1}
```
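
The `unify` rule can be modelled position by position. The following pure-Python sketch works on tuples of per-axis names (with `None` for unnamed axes) and assumes both arrays have the same depth; `unify` here is a hypothetical stand-alone function written for illustration, not the library's internal routine:

```python
from __future__ import annotations


def unify(names1: tuple, names2: tuple) -> tuple:
    """Unify two per-axis name tuples of equal length (illustrative)."""
    if len(names1) != len(names2):
        raise ValueError("this sketch only handles arrays of equal depth")
    unified = []
    for axis, (a, b) in enumerate(zip(names1, names2)):
        # Two different names on the same positional axis cannot be merged.
        if a is not None and b is not None and a != b:
            raise ValueError(f"axis {axis} is named {a!r} in one array and {b!r} in the other")
        # None acts as a wildcard: take whichever side provides a name.
        unified.append(a if a is not None else b)
    # The same name must not end up on two different positional axes.
    named = [name for name in unified if name is not None]
    if len(named) != len(set(named)):
        raise ValueError("the same name points to different positional axes")
    return tuple(unified)


print(unify((None, "y"), ("x", None)))  # ('x', 'y')
print(unify(("x", "y"), (None, None)))  # ('x', 'y')
```

With this model, `unify(("x", "y"), ("a", "y"))` and `unify(("x", None), (None, "x"))` both raise a `ValueError`, mirroring the two mismatch cases described above.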

Reducers (`ak.sum`, `ak.any`, ...)
: If `axis=int` and `keepdims=False` (the typical use case), the named axis that is reduced is removed (strategy: `remove one`).
: If `keepdims=True` is set, all named axes are kept (strategy: `keep all`).
: If `axis=None` is set, all named axes are removed (strategy: `remove all`).

```{code-cell}
array = ak.Array([[1, 2], [3], [], [4, 5, 6]])
named_array = ak.with_named_axis(array, ("x", "y"))

assert ak.sum(named_array, axis="x", keepdims=False).named_axis == {"y": 0}
assert ak.sum(named_array, axis="x", keepdims=True).named_axis == {"x": 0, "y": 1}
```

---
A full list of operations and their strategies can be found in the following table.
If an operation is not listed, its strategy is either `keep all` or automatically inferred from the operations listed below.


| Operation | Strategy |
|-----------------------------------------------------|--------------------|
| `ak.all(..., axis=None)` | `remove all` |
| `ak.all(..., axis=int, keepdims=False)` | `remove one` |
| `ak.all(..., axis=int, keepdims=True)` | `keep all` |
| `ak.any(..., axis=None)` | `remove all` |
| `ak.any(..., axis=int, keepdims=False)` | `remove one` |
| `ak.any(..., axis=int, keepdims=True)` | `keep all` |
| `ak.[arg]cartesian` | `unify` |
| `ak.[arg]combinations` | `keep all` |
| `ak.[arg]max(..., axis=None)` | `remove all` |
| `ak.[arg]max(..., axis=int, keepdims=False)` | `remove one` |
| `ak.[arg]max(..., axis=int, keepdims=True)` | `keep all` |
| `ak.[arg]min(..., axis=None)` | `remove all` |
| `ak.[arg]min(..., axis=int, keepdims=False)` | `remove one` |
| `ak.[arg]min(..., axis=int, keepdims=True)` | `keep all` |
| `ak.[arg]sort` | `keep all` |
| `ak.broadcast_arrays` | `unify`, `add one` |
| `ak.broadcast_fields` | `unify`, `add one` |
| `ak.categories` | `remove all` |
| `ak.concatenate` | `unify` |
| `ak.count[_nonzero](..., axis=None)` | `remove all` |
| `ak.count[_nonzero](..., axis=int, keepdims=False)` | `remove one` |
| `ak.count[_nonzero](..., axis=int, keepdims=True)` | `keep all` |
| `ak.firsts` | `remove one` |
| `ak.flatten(..., axis=None)` | `remove all` |
| `ak.flatten(..., axis=0)` | `keep all` |
| `ak.flatten(..., axis=(!=0))` | `remove one` |
| `ak.local_index` | `keep up to` |
| `ak.num` | `keep one` |
| `ak.prod(..., axis=None)` | `remove all` |
| `ak.prod(..., axis=int, keepdims=False)` | `remove one` |
| `ak.prod(..., axis=int, keepdims=True)` | `keep all` |
| `ak.ravel` | `remove all` |
| `ak.singletons` | `add one` |
| `ak.sum(..., axis=None)` | `remove all` |
| `ak.sum(..., axis=int, keepdims=False)` | `remove one` |
| `ak.sum(..., axis=int, keepdims=True)` | `keep all` |
| `ak.unflatten` | `remove all` |
| `ak.where` | `unify`, `add one` |
| `ak.with_field` | `unify`, `add one` |
| `ak.zip` | `unify`, `add one` |
23 changes: 23 additions & 0 deletions docs/user-guide/how-to-array-properties.md
@@ -0,0 +1,23 @@
---
jupytext:
text_representation:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.10.3
kernelspec:
display_name: Python 3
language: python
name: python3
---

Array properties
================

The user guide is a collection of "how to..." guides for common tasks. See the left side-bar (or bring it into view by clicking on the upper-left `≡`) to access the guides, grouped by topic.

If you're looking for documentation on a specific function, see the API reference instead.

You can test any examples in a new window/tab by clicking on [![Try It! ⭷](https://img.shields.io/badge/-Try%20It%21%20%E2%86%97-orange?style=for-the-badge)](https://awkward-array.org/doc/main/_static/try-it.html).

<br><br><br><br><br>
1 change: 1 addition & 0 deletions pyproject.toml
@@ -232,6 +232,7 @@ ignore_missing_imports = true
[[tool.mypy.overrides]]
module = [
'awkward._nplikes.*',
'awkward._namedaxis',
'awkward._behavior.*',
'awkward._backends.*',
'awkward._meta.*',
1 change: 1 addition & 0 deletions src/awkward/__init__.py
@@ -24,6 +24,7 @@
import awkward._errors
import awkward._lookup
import awkward._ext # strictly for unpickling from Awkward 1
import awkward._namedaxis

# third-party connectors
import awkward._connect.numpy