Skip to content

TPCH Queries 9 and 10 #407

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jul 5, 2024
Merged

TPCH Queries 9 and 10 #407

merged 7 commits into from
Jul 5, 2024

Conversation

ugohuche
Copy link
Contributor

@ugohuche ugohuche commented Jul 4, 2024

What type of PR is this? (check all applicable)

  • πŸ’Ύ Refactor
  • ✨ Feature
  • πŸ› Bug Fix
  • πŸ”§ Optimization
  • πŸ“ Documentation
  • βœ… Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below.

TPCH Benchmarking Queries 9 and 10

Copy link
Collaborator

@EdAbati EdAbati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your PR :)

It looks good, I have just added a couple of comments. Not sure if they were discussed somewhere else.

Copy link
Collaborator

@EdAbati EdAbati Jul 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparing to this, the .round(2) is missing after the sum in line 40 of the 3rd cell

In case it wasn't previously discussed to omit it, I think it should be added.

Copy link
Collaborator

@EdAbati EdAbati Jul 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in the other comment, I'd add the round(2) after nw.sum.

Out of curiosity, do we get an error if we don't create the revenue column before the group by?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially, the revenue column was created inside the agg with the round(2), but it gave a UserWarning of a "complex group-by expression" from the pandas API and it slows it down significantly for pandas.

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @ugohuche , well done! and thanks @EdAbati for the review - I think it's fine to skip round as it would otherwise give the userwarning for the pandas-like dataframes

@MarcoGorelli MarcoGorelli merged commit c997006 into narwhals-dev:main Jul 5, 2024
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants