-
Notifications
You must be signed in to change notification settings - Fork 35
feat: improve errors for invalid combinations of arguments in vector constructor methods #659
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Saransh-cpp
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, @ikrommyd! Thanks for working on this!!
Please see my comments below:
| if "pz" in coordinates: | ||
| is_momentum = True | ||
| generic_coordinates["z"] = coordinates.pop("pz") | ||
| if "E" in coordinates: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this also have the and "t" not in generic_coordinates condition?
| if "energy" in coordinates and "t" not in generic_coordinates: | ||
| is_momentum = True | ||
| generic_coordinates["t"] = coordinates.pop("energy") | ||
| if "M" in coordinates: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto, for tau
| # Validate coordinates using dimension-guard pattern (same as awkward _check_names) | ||
| _validate_numpy_coordinates(names) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vector.array is just a wrapper around individual constructors of Vector/MomentumNumpy*D, which can be used to construct vectors (unlike the Awkward backend). Hence, it would be better if we move this check to the __array_finalize__ method of each class:
vector/src/vector/backends/numpy.py
Line 1168 in e374915
| def __array_finalize__(self, obj: typing.Any) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I vaguely remember that I had some issue with __array_finalize__ regarding when is it being ran when I was making these edits but I will take a look again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that these tests are related to constructing vectors, we should move them to the appropriate test backend files. The object ones can go in:
vector/tests/backends/test_object.py
Line 14 in e374915
| def test_constructors_2D(): |
The NumPy ones (we don't have much constructor tests for NumPy at the moment, so this is perfect 😅):
vector/tests/backends/test_numpy.py
Line 85 in e374915
| def test_xy(): |
The Awkward ones (same as NumPy):
vector/tests/backends/test_awkward.py
Line 161 in e374915
| def test_basic(backend): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I just thought that maybe it should be a new test because it's like 1K lines but yeah I can move them under their respective backend tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see, it might make sense to create a new function under test_issues.py for this then 🤔
|
Thanks @Saransh-cpp for the feedback. I'll tackle it as soon as find a time slot. There is one more case that I'd like to solve which was the original inspiration for this. That's when people do In [12]: x = ak.zip({"pt": events.Electron.pt, "eta": events.Electron.eta, "phi": events.Electron.phi, "mass": events.Electron.mass, "energy": -999, "rho": 1001}, with_name="Momentum4D")
In [13]: x.pt
Out[13]: <Array [[1001], [], [], ..., [1001, 1001], [1001]] type='55342 * var * int64'>
In [14]: x.mass
Out[14]: <Array [[-3.98e+03], [], ..., [...], [-1.53e+03]] type='55342 * var * float64'>People can assign nonsense fields together like |
|
I think @pfackeldey might be able to help with this - can we add constructor checks to our custom Awkward behaviors? |
I'm not sure about the following: If my understanding here is wrong, you can do the following: class MyBehavior:
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
# check
if set(self._layout.fields) != {"pt", "eta", "phi", "mass"}:
raise ValueError('got different fields than this behavior expected')However, @ikrommyd and me talked about this and these constructor checks are indeed helpful, but do not prevent to break the behaviors. What would instead prevent this thoroughly would be:
(1.) and (3.) solves this issue entirely I think, (2.) is in addition to make people aware of when they invoke the This is of course major break in API, so I'm not sure if we can/want to do this. I'm adding this here to write down the fix that solves this issue thoroughly. I could add a way in awkward-array to make them truly immutable that you can define through behaviors in vector. And in theory I could also think of a way to make post init constructor checks in awkward arrays possible for behaviors. If you want @ikrommyd & @Saransh-cpp I can prepare a PR and you can have a look? Or what do you think? |
|
Sounds good to me, @pfackeldey! I'll be happy to review the PR in awkward and subsequently propagate the new immutability way to vector 🙂 |
|
Hi all. The subject sounded interesting/perplexing hence I had a read. Indeed good catches, @ikrommyd 👍. Now, there are 2 different matters being mixed together, as it were, it seems to me: The real problem of one being able to create a vector with some nonsensical combination is the main thing (to be) addressed in this PR, which is important to get fixed while providing relevant error messages to users so that they can see the issues in the way they wrote things. All good here and I won't comment on the details as @Saransh-cpp knows them infinitely better than me, The other issue mentioned relates to constructions such as |
|
Hi @eduardo-rodrigues, it's not trivial to catch when you do |
|
You are emphasising of my comments in some way, meaning that it is hard and/or dangerous to couple the 2 packages, but if you have no knowledge, then you can easily get into trouble. That's why I mentioned the possibility of having an explicit coupling using a class rather than a string, so with_name=vector.MomentumObject4D rather than with_name="Momentum4D". At least then somebody subclassing would be on its own, taking responsibility, and for other standard use cases, there would be some checks implemented. But I'm still likely being naive here and am showing how little I know about the internals of awkward, sorry if this is not helpful. |
|
Did a full write-up here of the issue: #660 |
| Raises TypeError if duplicate or conflicting coordinates are detected. | ||
| """ | ||
| complaint1 = "duplicate coordinates (through momentum-aliases): " + ", ".join( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a picky comment but it's a bit sub-optimal that we duplicate these "complaint strings" in several submodules. For better maintainability it's probably best to move them to a trival submodule and import the strings in the several places, such as here but also in awkward_constructions.py for example.
|
Linking my proposed solution here as well: #660 (comment) |
I noticed that the vector constructor methods do not raise consistent errors with each other when invalid combinations of arguments are given or duplicate arguments that map to the same coordinate.
For example,
errors but the following does not.
The same case for the awkward constructor does error though
What I tried to do here is add checks for the numpy backend in the same way that take place for the awkward backend.
I also think I found a couple of cases (even for
vector.obj) where I think an error should be raised but it's not.I also noticed that invalid constructors are not tested so I added a big test file that tests such cases.
This is my first time even looking at the vector codebase help is appreciated.