Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't concatenate arrays of records with no fields #3258

Open
jpivarski opened this issue Sep 26, 2024 · 2 comments
Open

Can't concatenate arrays of records with no fields #3258

jpivarski opened this issue Sep 26, 2024 · 2 comments
Labels
bug The problem described is something that must be fixed

Comments

@jpivarski
Copy link
Member

Version of Awkward Array

HEAD

Description and code to reproduce

This line:

first_record = next(c for c in contents if c.is_record)

doesn't handle the possibility that the RecordArray has no fields (len(contents) == 0).

Very, very few functions actually broadcast through fields; the ufuncs raise an error if they haven't been overloaded (and if they have been overloaded, the use the overload instead), ak.broadcast_arrays stops at records by a deliberate decision:

x.purelist_depth == 1 and not (x.is_option or x.is_indexed) for x in inputs

In our test suite, the only examples that reach the record-broadcasting code (broadcast_any_record, past the part that errors-out when options["allow_records"] is false) are ak.transform and ak.concatenate.

I tried to trigger this error and got a different one:

ak.concatenate([ak.Array([{}, {}, {}]), ak.Array([{}, {}])], axis=0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jpivarski/irishep/awkward/src/awkward/_dispatch.py", line 64, in dispatch
    next(gen_or_result)
  File "/home/jpivarski/irishep/awkward/src/awkward/operations/ak_concatenate.py", line 64, in concatenate
    return _impl(arrays, axis, mergebool, highlevel, behavior, attrs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jpivarski/irishep/awkward/src/awkward/operations/ak_concatenate.py", line 160, in _impl
    contents = [ak._do.mergemany(b) for b in batches]
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jpivarski/irishep/awkward/src/awkward/operations/ak_concatenate.py", line 160, in <listcomp>
    contents = [ak._do.mergemany(b) for b in batches]
                ^^^^^^^^^^^^^^^^^^^
  File "/home/jpivarski/irishep/awkward/src/awkward/_do.py", line 218, in mergemany
    return contents[0]._mergemany(contents[1:])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jpivarski/irishep/awkward/src/awkward/contents/recordarray.py", line 760, in _mergemany
    next = RecordArray(
           ^^^^^^^^^^^^
  File "/home/jpivarski/irishep/awkward/src/awkward/contents/recordarray.py", line 162, in __init__
    raise TypeError(
TypeError: RecordArray 'length' must be a non-negative integer or None, not awkward._util.UNSET

This error occurred while calling

    ak.concatenate(
        [<Array [{}, {}, {}] type='3 * {}'>, <Array [{}, {}] type='2 * {}'>]
        axis = 0
    )

It's not a high priority because it's such a weird case, but this really should result in

ak.Array([{}, {}, {}, {}, {}], type='5 * {}'>
@jpivarski jpivarski added bug (unverified) The problem described would be a bug, but needs to be triaged bug The problem described is something that must be fixed and removed bug (unverified) The problem described would be a bug, but needs to be triaged labels Sep 26, 2024
@agoose77
Copy link
Collaborator

The line linked above should never fail; it finds the first record array content in the broadcasted contents?

@jpivarski
Copy link
Member Author

Ah, that's right! I thought that contents was the contents of one of the RecordArrays:

>>> no_fields_record = ak.Array([{}, {}, {}, {}, {}])
>>> no_fields_record.layout
<RecordArray is_tuple='false' len='5'>
</RecordArray>
>>> no_fields_record.layout.contents
[]
>>> next(c for c in no_fields_record.layout.contents)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

But this contents is the subset of inputs that are Content instances (which enters this function as a closure):

contents = [x for x in inputs if isinstance(x, Content)]

Okay, this line should not fail. But in trying to make this line fail, something else went wrong in recordarray.py line 760 _mergemany.

So you're right, it was not the error I was looking for, and through some coincidence, I found a different error.

No-field records are not sufficiently tested, perhaps.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug The problem described is something that must be fixed
Projects
None yet
Development

No branches or pull requests

2 participants