enh!: Respect input order of collection in summary #1364
This is a breaking change because the keyword arguments that produce the resulting array's values are no longer sorted, which affects the array's depth ordering.
With the current implementation, the depth order is decided by the sorted names of the input variables to `ds.summary`; this PR makes it follow the input order itself. Nothing is inherently wrong with the current implementation, but it makes it cumbersome to define a custom ordering of the output array and can be hard to wrap your head around.

For example, `ds.summary(val=ds.min("s"), a=ds.where(ds.min("s")))` and `ds.summary(val=ds.min("s"), z=ds.where(ds.min("s")))` currently give different output arrays. With `a`, the first layer is `a` and the second is `val`; with `z`, the first layer is `val` and the second is `z`. In a more real-world use case, you could imagine adding an extra input, `ds.summary(..., b=...)`, which will not be appended to the end of the dataset but put into the middle.

This is hard to see if you do not know it is happening. One example is in HoloViews, if you pass a summary as an aggregator. The only reason this can be "easily" seen in the following example is that `a`/`z` is an `int` layer and will therefore show `-1` (the int NaN value) as light blue.

I need to think about whether there are consequences that I currently don't see. One thing could be that `__hash__` is right now sensitive to input order, even though the data could be the same.
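
To make the ordering difference concrete, here is a minimal sketch in plain Python. It does not use datashader, and it is not the actual implementation: `layer_order_sorted` and `layer_order_input` are hypothetical helpers, and the string values are placeholders standing in for the reductions. It only assumes that the current behaviour amounts to sorting the keyword names and that the proposed behaviour keeps them in call order.

```python
def layer_order_sorted(**kwargs):
    """Hypothetical helper: layer names sorted alphabetically by keyword."""
    return sorted(kwargs)


def layer_order_input(**kwargs):
    """Hypothetical helper: layer names in the order the keywords were passed."""
    # Python preserves keyword-argument order in **kwargs (PEP 468).
    return list(kwargs)


# Placeholder strings stand in for ds.min("s") and ds.where(ds.min("s")).
print(layer_order_sorted(val="min", a="where"))  # ['a', 'val']
print(layer_order_sorted(val="min", z="where"))  # ['val', 'z']

# With input order, renaming the second keyword no longer moves the layers.
print(layer_order_input(val="min", a="where"))   # ['val', 'a']
print(layer_order_input(val="min", z="where"))   # ['val', 'z']

# Adding an extra input `b` lands in the middle when names are sorted,
# but at the end when input order is respected.
print(layer_order_sorted(a="where", val="min", b="extra"))  # ['a', 'b', 'val']
print(layer_order_input(a="where", val="min", b="extra"))   # ['a', 'val', 'b']
```

Under these assumptions, only the sorted variant changes layer positions when a keyword is renamed or a new one is added.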