feat(rust,python): Add a config option to specify the default engine to use during lazyframe collect calls #20717

Status: Open (13 commits to merge into base: main)

@Matt711 Matt711 commented Jan 14, 2025

Closes #19797.

@github-actions github-actions bot added the enhancement, python, and rust labels Jan 14, 2025
@ion-elgreco
Contributor

Wouldn't it be more practical if you could set any engine as the default?

codecov bot commented Jan 14, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.27%. Comparing base (ebb513c) to head (2922103).
Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #20717   +/-   ##
=======================================
  Coverage   79.27%   79.27%           
=======================================
  Files        1585     1586    +1     
  Lines      226132   226169   +37     
  Branches     2592     2594    +2     
=======================================
+ Hits       179258   179288   +30     
- Misses      46279    46285    +6     
- Partials      595      596    +1     


py-polars/polars/config.py (outdated, resolved)
crates/polars-core/src/config.rs (outdated, resolved)
@orlp
Collaborator

orlp commented Jan 15, 2025

I think this isn't as simple as it seems, because we do also call Polars methods in the Python code internally which are only supported on, for example, the eager engine.

@wence-
Collaborator

wence- commented Jan 15, 2025

> I think this isn't as simple as it seems, because we do also call Polars methods in the Python code internally which are only supported on, for example, the eager engine.

Can you expand on what you mean here? Effectively what this is trying to do is add a global config option for the default value in LazyFrame.collect(..., engine="cpu") (to make that literal "cpu" string optionally "gpu" instead). The Python-side logic that then perhaps nonetheless turns off the gpu engine still exists.

Why would hooking this up via a config option not work?
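
For readers following along, the proposal described above can be sketched roughly as follows. This is a toy model, not the PR's code: `set_default_engine`, the `_CONFIG` dict, and this `collect` function are hypothetical stand-ins, and only the name `get_default_engine` is borrowed from the diff under review.

```python
from typing import Optional

# Hypothetical stand-in for a global config store; not the real Polars API.
_CONFIG = {"default_engine": "cpu"}
VALID_ENGINES = ("cpu", "gpu")


def set_default_engine(engine: str) -> None:
    """Record the engine that collect() uses when none is passed explicitly."""
    if engine not in VALID_ENGINES:
        raise ValueError(f"Invalid engine argument {engine=}")
    _CONFIG["default_engine"] = engine


def get_default_engine() -> str:
    """Return the currently configured default engine."""
    return _CONFIG["default_engine"]


def collect(plan: str, engine: Optional[str] = None) -> str:
    """Toy collect(): an explicit engine wins, otherwise the configured default."""
    chosen = engine if engine is not None else get_default_engine()
    if chosen not in VALID_ENGINES:
        raise ValueError(f"Invalid engine argument {chosen=}")
    return f"{plan} executed on {chosen}"
```

In this sketch an explicit `engine=` argument always overrides the global setting, so the config option only changes what an unqualified `collect()` call does.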

@Matt711 Matt711 marked this pull request as ready for review January 15, 2025 20:34
@orlp
Collaborator

orlp commented Jan 15, 2025

@wence- I have no idea what the full scope of such examples is throughout the codebase, but one example is pl.align_frames dispatching to other calls. Such a function might suddenly stop working if you change the default engine, because not all engines have the exact same supported feature set.
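
A minimal illustration of the failure mode described above, under invented assumptions: the feature sets in `SUPPORTED_OPS`, the `SETTINGS` dict, and `internal_align_helper` are all hypothetical and do not reflect what pl.align_frames or any real engine actually supports.

```python
# Hypothetical feature sets; the real engines differ in ways not modeled here.
SUPPORTED_OPS = {
    "cpu": {"join", "sort", "unique"},
    "gpu": {"join", "sort"},
}

# Hypothetical global setting standing in for a default-engine config option.
SETTINGS = {"default_engine": "cpu"}


def run(op, engine=None):
    """Run one operation on the given engine, or on the global default."""
    chosen = engine or SETTINGS["default_engine"]
    if op not in SUPPORTED_OPS[chosen]:
        raise NotImplementedError(f"{op!r} is not supported on {chosen!r}")
    return f"{op} on {chosen}"


def internal_align_helper():
    """Library-internal helper that silently relies on the default engine."""
    return [run("unique"), run("join")]
```

With the default left at `"cpu"` the helper works. After `SETTINGS["default_engine"] = "gpu"` the same call raises `NotImplementedError`, even though the user never asked this helper to use a different engine.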

@wence-
Collaborator

wence- commented Jan 16, 2025

OK, thanks, I think I see. There's some tension here between making this "easy" to configure globally and having things break at a distance from where the user might expect.

@Matt711 Matt711 changed the title from "feat: Add config to specify GPU polars as the default engine" to "feat(rust,python): Add config to specify GPU polars as the default engine" Jan 16, 2025
@Matt711
Author

Matt711 commented Jan 23, 2025

Ok tests are passing now. Gentle ping for reviews from @ritchie46 @orlp et al

Member

@ritchie46 ritchie46 left a comment

I think this is useful, but I think we should do it a bit differently.

We don't have to check the flag in Python, but rather check in Rust in the main optimizer loop. I also think we should call it `"engine affinity"` because we don't guarantee anything.

@@ -2003,6 +2003,8 @@ def collect(
        if not (is_config_obj or engine in ("cpu", "gpu")):
            msg = f"Invalid engine argument {engine=}"
            raise ValueError(msg)
        if get_default_engine() == "gpu":  # pragma: no cover
Member

I don't think this should dispatch here, but rather in fn optimize.

I also think we should do all engines at once, and call it "engine_affinity", as we cannot guarantee it will run on the preferred engine.
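
One way to read "affinity" here is as a preference with a fallback rather than a hard requirement. The sketch below is a Python toy under that assumption. The suggestion above is to do this in Rust inside `fn optimize`, which is not shown in this thread, and `ENGINE_OPS` and `resolve_engine` are hypothetical names.

```python
# Hypothetical per-engine feature sets, for illustration only.
ENGINE_OPS = {
    "cpu": {"join", "sort", "unique"},
    "gpu": {"join", "sort"},
}


def resolve_engine(plan_ops, affinity="cpu", fallback="cpu"):
    """Return the preferred engine if it supports every op, else the fallback."""
    supported = ENGINE_OPS.get(affinity, set())
    if all(op in supported for op in plan_ops):
        return affinity
    return fallback
```

For example, `resolve_engine(["join", "sort"], affinity="gpu")` keeps the preferred engine, while a plan containing `"unique"` falls back to `"cpu"` in this toy model instead of raising.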

Collaborator

OK, so that would be moving the current python-based engine-selection logic into optimize? Note how below we need to know here whether or not we have a "gpu" engine to deliver the appropriate post_opt_callback, I think.

Author

I opened up #21102 to track dispatching in the main optimizer loop. We can move the discussion over there. I don't think it should block this PR though.

Member

Hmm, right, this only works for the streaming/in-memory engines. For the post-opt callback this indeed needs to be in Python. I will take a closer look to better understand where we can put that logic.
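
The constraint discussed in this thread can be sketched as follows: the Python caller has to know the engine before optimization so it can hand over the matching callback. Everything here is a hypothetical toy. `gpu_post_opt_callback`, this `optimize`, and this `collect` only mimic the shape of the interaction, and only the parameter name `post_opt_callback` comes from the discussion.

```python
def gpu_post_opt_callback(plan):
    """Hypothetical callback: tag each optimized node for GPU execution."""
    return [f"gpu:{node}" for node in plan]


def optimize(plan, post_opt_callback=None):
    """Toy optimizer: drop no-op nodes, then let the callback rewrite the plan."""
    optimized = [node for node in plan if node != "noop"]
    if post_opt_callback is not None:
        optimized = post_opt_callback(optimized)
    return optimized


def collect(plan, engine="cpu"):
    """Pick the callback on the caller side, because only it knows the engine."""
    callback = gpu_post_opt_callback if engine == "gpu" else None
    return optimize(plan, post_opt_callback=callback)
```

In this model, moving engine selection entirely inside `optimize` would leave `collect` unable to decide which callback to pass, which is the concern raised above.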

py-polars/polars/config.py (outdated, resolved)
py-polars/polars/config.py (outdated, resolved)
crates/polars-python/src/functions/utils.rs (outdated, resolved)
@Matt711 Matt711 changed the title from "feat(rust,python): Add config to specify GPU polars as the default engine" to "feat(rust,python): Add a config option to specify the default engine to use during lazyframe collect calls" Feb 5, 2025
Labels
enhancement (New feature or an improvement of an existing feature), python (Related to Python Polars), rust (Related to Rust Polars)
Development

Successfully merging this pull request may close these issues.

[FEA/Discussion]: allow specification of default engine via config options
6 participants