Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: consolidate ibis.case(), Value.case(), Value.cases(), Value.substitute() #7280

Closed
1 task done
NickCrews opened this issue Oct 2, 2023 · 9 comments
Closed
1 task done
Labels
feature Features or general enhancements

Comments

@NickCrews
Copy link
Contributor

NickCrews commented Oct 2, 2023

Is your feature request related to a problem?

All 4 functions are basically doing the same thing. It feels to me like we should be able to merge them all into once function with the same API. It is confusing to users (including me!) which they should use, and is extra burden keeping the docstrings etc all correct.

Also, a long-running grossness of .case() is that it returns a CaseBuilder object, which is undocumented. I don't think we want to add this CaseBuilder object to the public API, I think it would be totally doable and much simpler if we could keep it to a single function call.

Describe the solution you'd like

I propose:

  1. removing ibis.case() and Value.case()
  2. keeping Value.cases() and Value.substitute(), and possible adding ibis.cases() (but I'm +0 on this). All of these have an API of
def cases(value: Value, mapping: Mapping[Any, Any] | Iterable[tuple[Any, Any]], default: Any = None):
   ...

(not sure about default vs else_, and if mapping is the best name)

where the keys can be

  • literals in which case they are compared by equality eg "foo" or 5,
  • Values, in which case they are compared by equality, eg value.upper()
  • Deferreds, in which case they are bound to value and evaluated eg _.upper() == _
  • None, ibis.null(), etc. this case currently isn't handled at all, this should be supported, even if we don't do the rest of these things.
  • unary functions that take value and return one of the above eg lambda val: val.upper() == "FOO" or lambda val: val.upper()

The values can be

  • literals
  • deferred projections eg _.lower()
  • unary functions that take value and return one of the above eg lambda val: val.lower()

I'm trying to think if there is some ambiguity if we accept both Values and BooleanValues similar to the ambiguity with selection vs filtering in t[_.x.isnull()], but I can't think of it.

I want to support the usecase of chained comparing one column to another:

(t.x == t.y).ifelse("same", (t.x > t.y).ifelse("greater", "less"))

I think this would be done with

t.x.case({lambda x: x == t.y: "same", lambda x: x > t.y: "greater"}, default="less")

but this feels a little gross. Makes me think either a top level or a table base API would be better:

t.cases({_.x == _.y: "same", _.x > _.y: "greater"}, default="less")
ibis.cases(t, {_.x == _.y: "same", _.x > _.y: "greater"}, default="less")

I would want to make this

  • breaking for ibis.case() and Value.case() (obviously, thye are getting removed)
  • non-breaking for Value.cases() and Value.substitute() (except for changing the name of that argument perhaps?)

What version of ibis are you running?

main

What backend(s) are you using, if any?

all

Code of Conduct

  • I agree to follow this project's Code of Conduct
@NickCrews NickCrews added the feature Features or general enhancements label Oct 2, 2023
@cpcloud
Copy link
Member

cpcloud commented Oct 3, 2023

+1 to consolidating cases and substitute.

removing ibis.case() and Value.case()

I don't think we should remove these. They are extremely familiar for users coming from SQL. We can likely consolidate all the various APIs under the Case ops though.

Also, a long-running grossness of .case() is that it returns a CaseBuilder object

The builder pattern is probably going to become more used in the codebase, especially where there's state that needs to be tracked. For example, I'm working on a way to better support chaining joins that uses this pattern. We're also thinking about how this might help clean up the GroupedTable object.

Agree that we should document any of these that are user-facing.

@jcrist
Copy link
Member

jcrist commented Oct 3, 2023

I actually kinda like the idea of replacing the case builders with cases operations instead.

Say we had a top-level ibis.cases function to replace the ibis.case() builder with the following implementation:

def cases(*branches: tuple[ir.BooleanValue, ir.Value], default=None):
    cases, results = zip(*branches)
    return ops.SearchedCase(cases, results, default).to_expr()

Compare the following examples:

t = ibis.table({"sym": "str", "left": "int", "right": "int"})

# Existing `ibis.case`
expr1 = t.mutate(
    result=(
        ibis.case()
        .when(_.sym == "+", _.left + _.right)
        .when(_.sym == "-", _.left - _.right)
        .when(_.sym == "*", _.left * _.right)
        .when(_.sym == "/", _.left / _.right)
        .else_(0)
        .end()
    )
)

# New `ibis.cases`
expr2 = t.mutate(
    result=ibis.cases(
        (_.sym == "+", _.left + _.right),
        (_.sym == "-", _.left - _.right),
        (_.sym == "*", _.left * _.right),
        (_.sym == "/", _.left / _.right),
        default=0,
    )
)

I personally like the ibis.cases example and implementation better:

  • The implementation is much simpler. No builder object to keep track of, no constructing a new builder every .when call. It's just a function that returns an op.to_expr(), same as most of ibis.
  • From a documentation perspective, documenting a single function is easier than documenting a function that returns a builder as well as the builder itself. Fewer conceptual things for a user to wrap their head around.
  • Since it's not a builder, there's no .end() method the user needs to remember to call. Currently if you forget to call .end() you get a pretty confusing error message.
  • I personally find the .cases() example easier to read. There's less visual noise without all the .when( and .else_( calls going on.

IMO I think we should deprecate the builders in favor of ibis.cases and Value.cases.

@cpcloud
Copy link
Member

cpcloud commented Oct 3, 2023

cases is probably similar enough to case that coming from SQL wouldn't be a huge lift +1

@NickCrews
Copy link
Contributor Author

NickCrews commented Oct 3, 2023

What do you both think about my proposal of supporting both boolean conditions and equality values?

Passing single values already works with Value.cases:

t.x.cases([
   (t.y, "equal to y"),
   (t.z, "equal to z"),
])

vs the more precise/verbose

t.x.cases([
   (_ == t.y, "equal to y"),
   (_ == t.z, "equal to z"),
])

Should this feature get propagated, or should we deprecate and drop it?

Similarly, what about passing in lambdas for the predicates?

Similarly, what about supporting Mapping in addition to the exisiting Iterable[tuple]?

@jcrist
Copy link
Member

jcrist commented Oct 3, 2023

Should this feature get propagated, or should we deprecate and drop it?

I think that makes sense for Value.cases (where there's a concrete thing to compare against) but not for ibis.cases which is intended to be used inside a larger expression like Table.mutate. I definitely don't think any api should special case on boolean expressions, where if it isn't a bool already it's compared against a thing - that sounds like madness.

Similarly, what about passing in lambdas for the predicates?

Seems fine to me, although given how good deferred expressions have gotten I'm tempted to avoid introducing lambda support into new apis. 🤷 could go either way here.

Similarly, what about supporting Mapping in addition to the exisiting Iterable[tuple]?

Also seems fine. Only argument I'd have to avoid it is it's a bit weird to type/document/support variadic *branches as well as mapping/iterable-of-tuples. I think if we supported a mapping we should maybe ditch the variadic case? Something like:

def cases(branches: Mapping[Any, Any] | Iterable[tuple[Any, Any]], *, default: Any = None):
    ...

Again, no strong thoughts 🤷.

@NickCrews
Copy link
Contributor Author

NickCrews commented Oct 3, 2023

I think that makes sense for Value.cases (where there's a concrete thing to compare against) but not for ibis.cases

Ok thanks for pointing that out, I felt there was something assymmetrical there but I couldn't put my finger on it. Annoying can't-have-it-all here: I would really like both flavors to have the exact same API, but at the same time I also want it to be a drop-in replacement for .substitute(), which implies that we need to support the non-boolean keys...Not sure how to resolve this.

that sounds like madness

I had this vague intuition there was some madness here, but I haven't been able to come up with the actual case. do you have an example that you want to avoid?

lambdas

I think this can get added as a followup without breaking anything, so maybe we leave it out for now.

I think if we supported a mapping we should maybe ditch the variadic case?

Oh I didn't think of a variadic *branches API! I like that better. I would vote for just that, and not supporting mappings. I also just remembered this other issue I filed, where Mappings come with the problem that keys need to be hashable, so you can't use plenty of Ibis types such as Deferreds as keys.

@contang0
Copy link

will .case continue to work for a while after .cases is implemented? from user perspective I would like to have a bit of time to migrate the code to .cases

@jcrist
Copy link
Member

jcrist commented Oct 11, 2023

yes, any work done here will have a deprecation period.

NickCrews added a commit that referenced this issue Apr 24, 2024
NickCrews added a commit that referenced this issue Apr 24, 2024
NickCrews added a commit that referenced this issue Apr 24, 2024
NickCrews added a commit to NickCrews/ibis that referenced this issue May 1, 2024
NickCrews added a commit to NickCrews/ibis that referenced this issue May 1, 2024
NickCrews added a commit to NickCrews/ibis that referenced this issue May 1, 2024
NickCrews added a commit to NickCrews/ibis that referenced this issue May 1, 2024
NickCrews added a commit to NickCrews/ibis that referenced this issue May 1, 2024
NickCrews added a commit to NickCrews/ibis that referenced this issue May 1, 2024
NickCrews added a commit to NickCrews/ibis that referenced this issue May 2, 2024
NickCrews added a commit to NickCrews/ibis that referenced this issue May 2, 2024
NickCrews added a commit to NickCrews/ibis that referenced this issue May 2, 2024
NickCrews added a commit to NickCrews/ibis that referenced this issue May 7, 2024
NickCrews added a commit to NickCrews/ibis that referenced this issue May 7, 2024
NickCrews added a commit to NickCrews/ibis that referenced this issue May 7, 2024
NickCrews added a commit to NickCrews/ibis that referenced this issue May 8, 2024
NickCrews added a commit to NickCrews/ibis that referenced this issue Jul 12, 2024
NickCrews added a commit to NickCrews/ibis that referenced this issue Jul 12, 2024
NickCrews added a commit to NickCrews/ibis that referenced this issue Jul 24, 2024
NickCrews added a commit to NickCrews/ibis that referenced this issue Jul 24, 2024
NickCrews added a commit to NickCrews/ibis that referenced this issue Sep 11, 2024
NickCrews added a commit to NickCrews/ibis that referenced this issue Sep 11, 2024
NickCrews added a commit to NickCrews/ibis that referenced this issue Sep 11, 2024
NickCrews added a commit to NickCrews/ibis that referenced this issue Oct 7, 2024
NickCrews added a commit to NickCrews/ibis that referenced this issue Oct 7, 2024
NickCrews added a commit to NickCrews/ibis that referenced this issue Oct 8, 2024
NickCrews added a commit to NickCrews/ibis that referenced this issue Oct 8, 2024
cpcloud pushed a commit to NickCrews/ibis that referenced this issue Oct 9, 2024
cpcloud pushed a commit to NickCrews/ibis that referenced this issue Oct 9, 2024
cpcloud pushed a commit to NickCrews/ibis that referenced this issue Oct 9, 2024
cpcloud pushed a commit to NickCrews/ibis that referenced this issue Oct 9, 2024
cpcloud pushed a commit to NickCrews/ibis that referenced this issue Oct 9, 2024
cpcloud pushed a commit to NickCrews/ibis that referenced this issue Oct 9, 2024
@NickCrews
Copy link
Contributor Author

This was finally addressed in #9096

If there are further changes desired, please start a new issue instead of reopening this one (since the "current behavior" is now different from what it was when I originally opened this issue)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements
Projects
Status: done
4 participants