-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(rust,python): Add a config option to specify the default engine to use during lazyframe collect
calls
#20717
base: main
Are you sure you want to change the base?
Conversation
Wouldn't it be more practical if you can set any engine then as default? |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #20717 +/- ##
=======================================
Coverage 79.27% 79.27%
=======================================
Files 1585 1586 +1
Lines 226132 226169 +37
Branches 2592 2594 +2
=======================================
+ Hits 179258 179288 +30
- Misses 46279 46285 +6
- Partials 595 596 +1 ☔ View full report in Codecov by Sentry. |
I think this isn't as simple as it seems, because we do also call Polars methods in the Python code internally which are only supported on, for example, the eager engine. |
…-gpu-default-engine
Can you expand on what you mean here? Effectively what this is trying to do is add a global config option for the default value in Why would hooking this up via a config option not work? |
@wence- I have no idea what the full scope of such examples are throughout the codebase, but one example is |
OK, thanks, I think I see. I think there's some amount of tension here between making this "easy" to configure globally and I guess some things breaking at a distance from where the user might expect. |
Ok tests are passing now. Gentle ping for reviews from @ritchie46 @orlp et al |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is useful, but I think we should do it a bit differently.
We don't have to check the flag in python, but rather, check in rust in the main optimizer loop. I also think we should call it `"engine affinity" because we don't guarantee anything.
py-polars/polars/lazyframe/frame.py
Outdated
@@ -2003,6 +2003,8 @@ def collect( | |||
if not (is_config_obj or engine in ("cpu", "gpu")): | |||
msg = f"Invalid engine argument {engine=}" | |||
raise ValueError(msg) | |||
if get_default_engine() == "gpu": # pragma: no cover |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this should dispatch here, but rather in fn optimize
.
I also think we should do all engines at once. Where we call it "engine_affinity" as we cannot guarantee it will on the preferred engine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so that would be moving the current python-based engine-selection logic into optimize
? Note how below we need to know here whether or not we have a "gpu" engine to deliver the appropriate post_opt_callback
, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened up #21102 to track dispatching in the main optimizer loop. We can move the discussion over there. I don't think it should block this PR though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, right this only works for streaming/in-memory engine. For the post-opt-callback
this indeed needs to be in python. I will take a better look to better understand where we can have that logic.
collect
calls
Closes #19797.