feat: Implement partial "lazy" support for DuckDB (even with this PR, DuckDB support is work-in-progress!) #1725
can't wait for this to be merged 💪 thanks @MarcoGorelli, that's gonna provide a great DX
Exciting times 🥰
just added a couple of small questions and (hopefully useful) comments
```python
if subset is not None and any(x not in self.columns for x in subset):
    msg = f"Column(s) {subset} not found in {self.columns}"
    raise ColumnNotFoundError(msg)
```
should we create a `check_columns_exist` function in `narwhals.utils` so we can reuse it everywhere else? :)
yeah sure!
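A minimal sketch of the helper proposed above. The name `check_columns_exist`, its signature, and the local `ColumnNotFoundError` stand-in are assumptions for illustration, not narwhals' actual API; the membership check is the same one used in the snippet under review.

```python
from __future__ import annotations

from typing import Sequence


class ColumnNotFoundError(Exception):
    """Local stand-in for narwhals' ColumnNotFoundError (assumption)."""


def check_columns_exist(available: Sequence[str], subset: Sequence[str] | None) -> None:
    """Hypothetical helper: raise if any name in `subset` is missing from `available`."""
    if subset is None:
        return
    # Report only the missing names, which is easier to read than echoing all of `subset`.
    missing = [name for name in subset if name not in available]
    if missing:
        msg = f"Column(s) {missing} not found in {list(available)}"
        raise ColumnNotFoundError(msg)
```

Each backend method that accepts a `subset` could then call `check_columns_exist(self.columns, subset)` instead of repeating the check inline.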
```diff
@@ -38,6 +38,8 @@ def test_arithmetic_expr(
     constructor: Constructor,
     request: pytest.FixtureRequest,
 ) -> None:
+    if "duckdb" in str(constructor) and attr == "__floordiv__":
```
`__floordiv__` should be implemented, or am I looking at the wrong thing?
it behaves differently though, so i think we need a separate discussion for how to deal with it, e.g.
```
In [3]: duckdb.sql('select 1.5 // 2.5')
Out[3]:
┌──────────────┐
│ (1.5 // 2.5) │
│    double    │
├──────────────┤
│          0.6 │
└──────────────┘

In [4]: 1.5 // 2.5
Out[4]: 0.0
```
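To make the mismatch concrete: Python's `//` floors the quotient, whereas DuckDB's `//` on doubles returned the plain quotient (0.6) in the session above. The sketch below assumes one wanted to translate `__floordiv__` for DuckDB as `floor(lhs / rhs)`; `sql_floordiv` is a hypothetical name, and as the last asserts show, flooring the true quotient is still not bit-for-bit identical to Python's float `//`.

```python
import math


def sql_floordiv(lhs: str, rhs: str) -> str:
    # Hypothetical helper: build a SQL expression that floors the true quotient,
    # instead of relying on DuckDB's `//` (which gave 0.6 for 1.5 // 2.5 above).
    return f"floor({lhs} / {rhs})"


# Python's `//` floors the quotient, so negative operands round towards -inf:
assert 1.5 // 2.5 == 0.0
assert -1.5 // 2.5 == -1.0  # floor(-0.6) == -1, not 0
assert math.floor(-1.5 / 2.5) == -1

# Floor-of-true-quotient is close to, but not identical to, Python's float `//`:
assert 1 // 0.1 == 9.0
assert math.floor(1 / 0.1) == 10

print(sql_floordiv("a", "b"))  # floor(a / b)
```

This is one reason the semantics deserve a separate discussion rather than a one-line mapping.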
```diff
@@ -109,18 +112,18 @@ def test_cast(


 def test_cast_series(
-    constructor: Constructor,
+    constructor_eager: ConstructorEager,
```
nice!
I will need to take a crash course in DuckDB. I tried to leave a couple of comments. The main point is how we want to design the `collect` method in the main `LazyFrame` class.
```python
return ArrowDataFrame(
    native_dataframe=self._native_frame.arrow(),
    backend_version=parse_version(pa.__version__),
    version=self._version,
)
```
My understanding was that duckdb is dependency-free.
Should we jump to the discussion in #1479 before deciding how to collect for duckdb?
I think even in that one the likely default for duckdb would still be pyarrow, though, right? I've added a try-except anyway to show a less surprising error message.
I will try to make a draft/RFC tomorrow to follow up on my comment in the thread ;)
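The try-except mentioned above could look something like the following sketch. `import_optional` is a hypothetical helper name and the error wording is invented; the PR's actual code may differ.

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, purpose: str) -> ModuleType:
    """Import an optional dependency, raising a clearer error if it is missing.

    Hypothetical helper illustrating the try-except idea, not the PR's code.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        msg = (
            f"{purpose} requires `{module_name}`, which is not installed. "
            f"Please install it, e.g. `pip install {module_name}`."
        )
        # Chain the original exception so the real import failure stays visible.
        raise ModuleNotFoundError(msg) from exc


# Usage sketch (assumes `relation` is a DuckDB relation, as in `self._native_frame`):
# pa = import_optional("pyarrow", "Collecting a DuckDB LazyFrame")
# table = relation.arrow()
```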
`narwhals/_duckdb/dataframe.py` (outdated)

```python
def to_pandas(self: Self) -> pd.DataFrame:
    # only is version if v1, keep around for backcompat
```
TODO: implement version check? Same for `to_arrow`?

```diff
-    # only is version if v1, keep around for backcompat
+    # only if version is v1, keep around for backcompat
```
`to_pandas` wouldn't be available on `nw.LazyFrame` anyway, so this wouldn't be reachable for non-v1.
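For completeness, the defensive version check floated in the TODO might look like this. `Version` here is a local stand-in enum and `DuckDBFrameSketch` a hypothetical class; as noted in the reply, the guard is arguably redundant because `to_pandas` is not exposed on `nw.LazyFrame` outside v1.

```python
from enum import Enum, auto


class Version(Enum):
    """Local stand-in for narwhals' internal API-version marker (assumption)."""

    V1 = auto()
    MAIN = auto()


class DuckDBFrameSketch:
    """Hypothetical compliant-frame stub, only used to illustrate the guard."""

    def __init__(self, version: Version) -> None:
        self._version = version

    def to_pandas(self) -> str:
        # only if version is v1, keep around for backcompat
        if self._version is not Version.V1:
            msg = "`to_pandas` is only available via the v1 API"
            raise NotImplementedError(msg)
        return "converted"  # placeholder for the real pandas conversion
```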
```python
assert left_on is not None  # noqa: S101
assert right_on is not None  # noqa: S101
```
This should already never be the case after all the checks in `BaseFrame.join`, considering `how in {"inner", "left"}`.
true, but then mypy complains
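Why mypy needs the asserts: validation done in another function (here, the public `join`) does not narrow the types seen inside the backend method, so `left_on` stays `Sequence[str] | None`. A minimal sketch, where `pair_join_keys` is a hypothetical function standing in for the backend join logic:

```python
from __future__ import annotations

from typing import Sequence


def pair_join_keys(
    how: str,
    left_on: Sequence[str] | None,
    right_on: Sequence[str] | None,
) -> list[tuple[str, str]]:
    """Hypothetical helper: pair up join keys for an equi-join."""
    if how in {"inner", "left"}:
        # Assume the caller already validated that both are set; mypy cannot see
        # that here, so narrow `Sequence[str] | None` to `Sequence[str]` explicitly.
        assert left_on is not None  # noqa: S101
        assert right_on is not None  # noqa: S101
        return list(zip(left_on, right_on))
    return []
```

The `# noqa: S101` silences ruff's flake8-bandit rule against `assert`, which exists because asserts are stripped under `python -O`; `typing.cast` is an alternative that satisfies mypy without any runtime check.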
thanks both for your reviews and comments, much appreciated! Any objections to merging as-is before we run into merge conflicts, and then iterating on it until it's complete?
thanks all for the comments! It doesn't look like there have been any objections, and this PR is quite self-contained (it doesn't affect existing core backends), so I'll go ahead and ship it so we can release. Then we can fill it out bit by bit, and it can become truly incredible.

@choucavalier thanks for your interest. Please note that this is really work-in-progress, so you'll likely run into quite a few missing methods if you try it out. Nonetheless, I'd be curious to hear how you find it if you do.
thanks @MarcoGorelli, I'll try it out and report any issues with proper failing tests :) Thanks for your amazing work, you're a MACHINE. Like @EdAbati said: exciting times!!