Implement all TPC-H benchmarks in https://github.com/narwhals-dev/narwhals/tree/main/tpch #863

Closed
MarcoGorelli opened this issue Aug 24, 2024 · 11 comments
Labels
good first issue: Good for newcomers, but anyone is welcome to submit a pull request!
help wanted: Extra attention is needed

Comments

@MarcoGorelli
Member

MarcoGorelli commented Aug 24, 2024

In https://github.com/narwhals-dev/narwhals/tree/main/tpch/notebooks, we have notebooks with which we can run the TPC-H queries via Narwhals for different backends.

We're a bit inconsistent with their format, and are missing a few queries (because when we started, some functionality in Narwhals was missing).

I've made https://github.com/narwhals-dev/narwhals/tree/main/tpch for this purpose. Let's define the queries there, and then we can just import those into the notebooks.

The task is:


Example pull request: narwhals-dev/narwhals-tpch#1

@MarcoGorelli MarcoGorelli added the help wanted, good first issue, and good-for-euroscipy-2024-sprint labels Aug 24, 2024
@MarcoGorelli

This comment was marked as outdated.

@FBruzzesi
Member

> Maybe we should make a separate repo for these, like narwhals-tpch

I like the idea, maybe even benchmarking gets easier. #819 is a bit pointless with the tiny tiny subset of data we have in narwhals itself.

@MarcoGorelli
Member Author

MarcoGorelli commented Aug 26, 2024

Yup! Cool, I've made a subfolder, with a generate_data.py script https://github.com/narwhals-dev/narwhals/tree/main/tpch

We can use this to store queries and check that they run. Then in Kaggle I just clone this repo, import the query functions, and time them (taking the best of 7).
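
A minimal sketch of the "best of 7" timing mentioned above, assuming each query is exposed as a plain callable. The helper `best_of` and the stand-in `fake_query` are hypothetical names used only for illustration; they are not part of the Narwhals repo, and a real run would pass in one of the imported TPC-H query functions instead.

```python
import time
from typing import Callable


def best_of(fn: Callable[[], object], repeats: int = 7) -> float:
    """Run `fn` `repeats` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    # Taking the minimum reduces noise from other processes on the machine.
    return min(timings)


def fake_query() -> int:
    # Hypothetical stand-in for an imported TPC-H query function.
    return sum(i * i for i in range(10_000))


fastest = best_of(fake_query)
print(f"best of 7: {fastest:.6f}s")
```

Reporting the minimum rather than the mean is a common choice for benchmarks like this, since the fastest run is the one least disturbed by unrelated system activity.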

@MarcoGorelli MarcoGorelli changed the title from "Update notebooks so we can run all TPC-H benchmarks via Narwhals" to "Implement all TPC-H benchmarks in https://github.com/narwhals-dev/narwhals-tpch" Aug 26, 2024
@marenwestermann
Contributor

Working on q10.

@MarcoGorelli MarcoGorelli changed the title from "Implement all TPC-H benchmarks in https://github.com/narwhals-dev/narwhals-tpch" to "Implement all TPC-H benchmarks in https://github.com/narwhals-dev/narwhals/tree/main/tpch" Aug 30, 2024
@montanarograziano
Contributor

Working on query11

@MarcoGorelli
Member Author

yay thanks! as a heads-up, I've moved the queries to the Narwhals repo itself (https://github.com/narwhals-dev/narwhals/tree/main/tpch) so we have everything together - I've excluded that folder from the build anyway

@montanarograziano
Contributor

> yay thanks! as a heads-up, I've moved the queries to the Narwhals repo itself (https://github.com/narwhals-dev/narwhals/tree/main/tpch) so we have everything together - I've excluded that folder from the build anyway

I just read this, after making a PR in the other repo 😅.
Should I make a PR here with the same changes too?

@MarcoGorelli
Member Author

😆 sorry about that, my fault

sure, if it's not too much trouble (this repo will just be a bit pickier about linting, but if you look at the other queries in this repo that should help), else we can merge it there and I'll port it over here tomorrow

@MarcoGorelli
Member Author

looks like q8 is the only one left, exciting!

@IsaiasGutierrezCruz
Contributor

I'll take the q8 c:

@FBruzzesi
Member

All implemented 🙌🏼🚀
