RFC Perfect backwards compatibility policy #259
Comments
That can totally work for me, but I am not 100% sure if my tools would really suffer that much from the problems that you describe specifically. A lot of my transformations are stateless things.
I think it might still affect even stateless things - suppose for example that Polars renamed [...] If, however, you pinned your Narwhals API version, then we'd take care of the renamed method within Narwhals, and your code would just keep working without you needing to update
From a narwhals user perspective I think this is the ideal scenario, leading to minimal/no changes. I am mostly thinking and concerned about all the issues with narwhals internals rather than the benefits of this multiple polars api version support. Here are a few that come to mind:
On a similar consideration, how are pandas-like changes dealt with internally to be mapped to the same result? Is there any such situation?
Yeah we already have to do this for some older pandas versions: narwhals/narwhals/_pandas_like/group_by.py Lines 164 to 167 in fcd7292
Most of the Polars renamings and deprecations happened for methods a bit on the fringe rather than the fundamental ones - but still, even then, we have to deal with it sometimes: narwhals/narwhals/dataframe.py Lines 93 to 102 in 509c370
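As an illustration of the kind of workaround described here, the following is a minimal sketch, not the code referenced in the permalinks above. `OldFrame`, `NewFrame` and `call_first_available` are hypothetical names invented for this example; the idea is only that a compatibility layer can try the current method name and fall back to an older one.

```python
# Hypothetical stand-ins for two releases of a backend library:
# the older one spells the method `groupby`, the newer one `group_by`.
class OldFrame:
    def groupby(self, key):
        return f"grouped by {key}"


class NewFrame:
    def group_by(self, key):
        return f"grouped by {key}"


def call_first_available(obj, names, *args, **kwargs):
    """Call the first method in `names` that `obj` actually provides."""
    for name in names:
        method = getattr(obj, name, None)
        if callable(method):
            return method(*args, **kwargs)
    raise AttributeError(f"{type(obj).__name__} has none of {names}")


def group_by(frame, key):
    # Prefer the current spelling and fall back to the older one.
    return call_first_available(frame, ("group_by", "groupby"), key)


result_old = group_by(OldFrame(), "a")
result_new = group_by(NewFrame(), "a")
```

Callers only ever use the wrapper `group_by`, so a rename in the backend is absorbed in one place.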
true, but I think we can manage
glad to read this 😄 going to see how hard it is to make this happen then, because API changes in pandas/polars do seem to be a major source of headaches for people trying to build tools on top of them
note on the [...]
Fair! But still, the proposed solution sounds totally fine to me. The only awkward thing that comes to mind is if there is a conflicting dependency. Say, scikit-playtime uses v20 and scikit-lego uses v21. I don't know how realistic this scenario is, but I might imagine a headache happening here. I don't think there's a way to have two narwhals versions in a single venv.
I think that could still be fine - for code called from scikit-playtime, Narwhals would use the v20 API, and for code called from scikit-lego, it would use the v21 API (side-note, and I know it's not the point of your comment, but there won't be a v0.21, because the next Polars version is 1.0! 🥳 🍾 🌶️ )
After having tried this out, I'm not even sure it's possible, at least not without extra perf overhead... an alternative would be to let people specify the API version when they call:

    df = nw.from_native(df_any, api_version='0.20')
    answer = df.with_columns(nw.col(col).shift(-lag).alias(col + str(lag)) for col in cols for lag in lags)
    return nw.to_native(answer)

🤔 not very user-friendly to have to pass [...]

Another suggestion: consumers could do:

    from narwhals import StableNarwhals

    nw = StableNarwhals('0.20.0')

    def func_1(df_any):
        df = nw.from_native(df_any)
        df = df.group_by(...).agg(...)
        return nw.to_native(df)

    def func_2(df_any):
        df = nw.from_native(df_any)
        df = df.with_columns(...)
        return nw.to_native(df)

Like this, they can choose to only define their Narwhals API version once, and then they reuse the [...]

I think this addresses all the issues noted above? Well, other than the maintenance complexity in Narwhals, but that's exactly what this project exists to address 🤠
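To picture how two libraries could pin different API versions inside one process, here is a minimal sketch under stated assumptions. `StableNamespace` is a hypothetical stand-in for the proposed `StableNarwhals` object, the join runs over plain lists of dicts rather than real dataframes, and the `<key>_right` column name is an assumption made for the example. The version-dependent default follows the left-join example given in the issue description.

```python
def _parse(version: str) -> tuple[int, ...]:
    """Turn '0.20' into (0, 20) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


class StableNamespace:
    """Hypothetical stand-in for the proposed `StableNarwhals` object."""

    def __init__(self, api_version: str) -> None:
        self.api_version = _parse(api_version)

    def left_join(self, left, right, on, coalesce=None):
        """Toy left join over lists of dicts (not the Narwhals implementation)."""
        if coalesce is None:
            # Assumption taken from the issue text: API versions before 1.0
            # coalesce the key column by default, later ones do not.
            coalesce = self.api_version < (1, 0)
        right_columns = list(right[0]) if right else []
        joined = []
        for left_row in left:
            matches = [r for r in right if r[on] == left_row[on]] or [None]
            for right_row in matches:
                row = dict(left_row)
                for column in right_columns:
                    if column == on:
                        if coalesce:
                            continue  # keep only the left key column
                        name = f"{on}_right"
                    else:
                        name = column
                    row[name] = None if right_row is None else right_row[column]
                joined.append(row)
        return joined


# Each library binds its own namespace once and reuses it everywhere.
nw_playtime = StableNamespace("0.20")
nw_lego = StableNamespace("1.0")

left = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}]
right = [{"id": 1, "y": 10}]

old_style = nw_playtime.left_join(left, right, on="id")
new_style = nw_lego.left_join(left, right, on="id")
```

Because the version lives on the namespace object rather than in global state, the two libraries do not interfere with each other even though only one copy of the package is installed.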
Just jumping in the discussion ;). Does having the info only in the
So here the solution is more internal to the source code and thus more explicit. In scikit-learn, it makes me think about the [...]

    with nw.config_context(api_version="0.20"):
        def func_1(df_any):
            ...

or without a context manager:

    nw.set_config(api_version="0.20")

    def func_1(df_any):
        ...

I can imagine a third-party library to call the [...]
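For readers unfamiliar with this pattern, here is a minimal sketch of how such a configuration could be built on the standard library's `contextvars`. The function names mirror the ones in the comment, but the implementation, the `current_api_version` helper and the default of `"1.0"` are assumptions made for this example, not an existing Narwhals API.

```python
from contextlib import contextmanager
from contextvars import ContextVar

# Assumed default: callers that configure nothing get the newest API.
_API_VERSION: ContextVar[str] = ContextVar("api_version", default="1.0")


def set_config(api_version: str) -> None:
    """Set the API version for the current context until changed again."""
    _API_VERSION.set(api_version)


@contextmanager
def config_context(api_version: str):
    """Temporarily set the API version, restoring the old value on exit."""
    token = _API_VERSION.set(api_version)
    try:
        yield
    finally:
        _API_VERSION.reset(token)


def current_api_version() -> str:
    return _API_VERSION.get()


with config_context(api_version="0.20"):
    def func_1():
        # The lookup happens when func_1 is *called*, not when it is defined.
        return current_api_version()

    inside = func_1()

outside = func_1()
```

In this sketch `inside` is `"0.20"` but `outside` falls back to the default, because defining a function inside the `with` block does not bind the version to it. A design along these lines would need the library to enter the context on every call, or to capture the version at definition time.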
Thanks @glemaitre for your input! Agree that [...] But, I think we're getting there 💪 :
I'm liking the look of this more and more... BTW, if anyone here's interested, we have our first Narwhals community call today #264 📆
I'm late to the party! I basically agree with Guillaume. While it's cool to have the [...]
Thanks @baggiponte ! I also like the look of just putting nw.set_config(api_version='0.20') in [...] It needs to be possible that:
If both [...]
That's something I haven't considered! @glemaitre how does it work with sklearn?
We don't implement such a thing so I did not think about the problem :)
So somehow, you want to register the [...] But the devil is in the details, maybe implementing this is just too much spaghetti code.
Thanks everyone for your inputs 🙏 I've tried implementing this in #331. It's not finished, but it works well enough that I can test it out in scikit-lego and see what the impact is.
TL;DR: If you use [...]
Here is what the diff looks like (all tests pass, and this is backwards-compatible - you have the option to use [...]
diff --git a/sklego/_narwhals.py b/sklego/_narwhals.py
new file mode 100644
index 0000000..5286f14
--- /dev/null
+++ b/sklego/_narwhals.py
@@ -0,0 +1,3 @@
+from narwhals import StableAPI
+
+nw = StableAPI('0.20')
diff --git a/sklego/linear_model.py b/sklego/linear_model.py
index 99f689b..cc896cc 100644
--- a/sklego/linear_model.py
+++ b/sklego/linear_model.py
@@ -9,7 +9,7 @@ from abc import ABC, abstractmethod
from inspect import signature
from warnings import warn
-import narwhals as nw
+from sklego._narwhals import nw
import numpy as np
from scipy.optimize import minimize
from scipy.special._ufuncs import expit
[...]
I think this is fairly minimal, and gives a way to promise consumers: code you write today will keep working.
This is inspired by Rust editions, so there is a working precedent for doing this, just not (as far as I'm aware) in the Python space.
Any feedback on the design? Would you use this? Does any part of it risk being confusing? 🙏
@MarcoGorelli Have you considered having different classes for the different editions? Without that, I don't know how static types can reflect things like Expr.cum_sum() in 1.0 becoming Expr.cumulative_sum() in 2.0. (Also, hello from the Shiny team--we're thinking about using Narwhals for our various dataframe related features!)
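A minimal sketch of what "different classes for the different editions" could look like, assuming the hypothetical rename from the comment above (`cum_sum` in one edition, `cumulative_sum` in the next). The class names `ExprV1`, `ExprV2` and `_ExprImpl` are invented for this example and operate on plain lists rather than dataframe columns.

```python
from itertools import accumulate


class _ExprImpl:
    """Shared implementation that every edition delegates to."""

    def __init__(self, values):
        self._values = list(values)

    def _running_total(self):
        return list(accumulate(self._values))


class ExprV1(_ExprImpl):
    """Edition 1 exposes the old method name only."""

    def cum_sum(self) -> list:
        return self._running_total()


class ExprV2(_ExprImpl):
    """Edition 2 exposes the new method name only."""

    def cumulative_sum(self) -> list:
        return self._running_total()


v1_result = ExprV1([1, 2, 3]).cum_sum()
v2_result = ExprV2([1, 2, 3]).cumulative_sum()
```

Since each edition is its own class, a type checker sees exactly the methods that edition promises: `ExprV2([1]).cum_sum()` would be reported as an error statically, while the shared base keeps the logic in one place.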
Thanks @jcheng5 for your input! Indeed, static types are an issue in the proposal I'd written previously.
Something I'm trying out is

    import narwhals.stable.v0_20 as nw
    # import narwhals.stable.v1_0 as nw

in which things seem to be behaving a lot more nicely.
Also, that's amazing that you're considering Narwhals for use in Shiny 🙏! Please do let me know if there's anything I can do on the Narwhals side that would make it easier for you (e.g. any missing features you need?), and feel free to book time on https://calendly.com/marcogorelli if that would be easier over a call
Alright, this took some effort, but [...] There are several checks to ensure it stays up-to-date with the main namespace. I've also sneakily added [...]
Curious to hear thoughts on this one - especially from @baggiponte
Narwhals follows a strict subset of the Polars API. What happens if/when some part of this subset gets changed in Polars itself?
For example:
So, what do we do in Narwhals? I'd suggest allowing people to specify which API version they want to follow in pyproject.toml, e.g.: [...]
Then:
With api_version = "0.20" in pyproject.toml, we promise to keep giving them the pre 1.0 behaviour indefinitely (coalesce by default in left joins).
With api_version = "1.0" in pyproject.toml (or if they don't specify it), we give them the post 1.0 Polars behaviour (don't coalesce by default in left joins).
The idea is that if you implement your library using Narwhals, then so long as you specify an api_version in pyproject.toml, your code should preserve the same behaviour indefinitely. You may occasionally need to bump your minimum Narwhals package version (so we can deal with working around upstream changes in pandas/Polars/etc), but you should never need to rewrite your dataframe logic: it'll be perfectly backwards-compatible.
Very keen to also get @FBruzzesi and @koaning's take on this, as consumers. Would you appreciate / be open to specifying a Narwhals api_version in your pyproject.toml, in exchange for the promise that the Narwhals API you use won't change?
So long as we stick to fundamental dataframe operations, and a strict subset of Polars (rather than trying to reimplement everything...) then I think this should be doable!