test: Support for Multi-Level Partition Tables #88
Conversation
Could you explain in the PR README how you set up the Postgres tables to match the Parquet partitions? It looks like this is something we'll need to document in the main repo to show users how to do.
Hi @rebasedming, I've put together a detailed README for the PR, which you can find here. I aimed to be thorough in capturing all the necessary details. If you find it too lengthy or in need of restructuring, please let me know and I'll make the adjustments.
I've put up #91, which brings all our test fixtures into this crate. If possible, I'd appreciate it if you could align your PR with it once it's merged. For instance, you can now add your fixtures as a new file in the `fixtures` crate, and `auto_sales` can go under `tables`.
tests/common/print_utils.rs (Outdated)

```rust
// Implement Printable for tuples up to 12 elements
impl_printable_for_tuple!(T1);
```
Let's not bundle this into this PR, it's a separate thing that we can discuss whether it's worth bringing into the repo.
Sure, done. I've removed the macros. Let me know if you want `tests/common/print_utils.rs` removed as well.
Nice, thank you for the extremely thorough writeup. What I was most interested in is your approach of putting partitioned heap tables in front of foreign tables to pass partition keys into the Parquet file path. I wasn't aware this was possible and was hoping you could elaborate on it.
Thanks for pointing out the missing critical piece. I've added a section, "Partitioned Table Structure and S3 Integration," with the necessary details, and I've also created a TL;DR quick overview.
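As a rough sketch of the approach under discussion (a partitioned heap table whose leaf partitions are foreign tables over Parquet files on S3): all table, column, server, and bucket names below are illustrative assumptions, not taken from the PR, and the `SERVER`/`files` options follow a generic Parquet-FDW pattern that may not match the extension's exact option names.

```sql
-- Sketch only: names and FDW options are assumptions, not from the PR.
-- Level 1: partition the heap parent by year.
CREATE TABLE auto_sales (
    sale_id      BIGINT,
    sale_year    INT,
    manufacturer TEXT,
    price        NUMERIC
) PARTITION BY LIST (sale_year);

-- Level 2: each year is further partitioned by manufacturer.
CREATE TABLE auto_sales_2024 PARTITION OF auto_sales
    FOR VALUES IN (2024)
    PARTITION BY LIST (manufacturer);

-- Leaf partitions are foreign tables; the partition keys are encoded in the
-- S3 path (Hive-style year=/manufacturer= prefixes), so each leaf maps to
-- exactly one Parquet prefix.
CREATE FOREIGN TABLE auto_sales_2024_toyota
    PARTITION OF auto_sales_2024 FOR VALUES IN ('Toyota')
    SERVER parquet_server
    OPTIONS (files 's3://demo-bucket/auto_sales/year=2024/manufacturer=Toyota/*.parquet');

-- Partition pruning routes this query to auto_sales_2024_toyota only,
-- so only the matching Parquet prefix is read.
SELECT sum(price)
FROM auto_sales
WHERE sale_year = 2024 AND manufacturer = 'Toyota';
```

PostgreSQL has allowed foreign tables as partitions of a partitioned table since version 10, which is what makes this layering possible at all.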
Force-pushed from ffc9515 to 50cdc5b
Force-pushed from b8c27f9 to 9491b82
Hi @rebasedming, I've made some changes to address the Clippy warnings. These changes were necessary to get the CI build passing. I know this might be outside the scope of the current PR, so let me know if you'd prefer I revert them. Thanks for your input on this.
Can you explain which Clippy warnings you needed a new crate for? I don't think we should complicate the project with an extra crate. We can simply ignore some Clippy warnings in the test files if they're causing issues.
I also took a look at your CI error, and I'm a bit confused by it. I suspect it has to do with the newly introduced crate; if you keep a single crate, it shouldn't find two different PG versions.
Hi @philippemnoel, I've identified an issue with commit
The problem:
I tried several solutions (e.g., using
This code is added by your PR, right? Can we instead fix the import path and keep things in the same crate?
Force-pushed from e947e42 to 06f0b36
Hi @philippemnoel, the PR is ready for the next round of review. I've removed the duplicate code, performed a cleanup, and ensured proper integration with PR #91. Additionally,
Hi @shamb0. This looks super clean! Thank you for integrating everything properly; I'm very excited about this PR. I believe it should also have documentation, so that users know partitions are supported and how to use them. Our documentation is stored in https://github.com/paradedb/paradedb/tree/dev/docs. Would you be willing to submit a PR with documentation to that repository as well? Then I think everything will be complete :). I'll let Ming do a more thorough review.
Hi @philippemnoel, I wanted to update you on PR #1568, which includes the recent documentation changes. Currently, I've placed the new topic under
Hi @shamb0! Thank you for doing this. We appreciate the docs PRs. We'll probably push a commit to it to make it more similar to our existing documentation, and then this should be good. I'll let @rebasedming do a proper review
Sorry, we had to merge a few other PRs to get
Signed-off-by: shamb0 <r.raajey@gmail.com>
Force-pushed from 06f0b36 to 3ea0fc0
Hi @philippemnoel, the rebase is complete, and the PR should now be ready for intake review. Please let me know if you encounter any issues. Thanks again!
Thank you! Could you please take a look at the failing test?
Hi @shamb0, I've spent some time testing this PR and I have some bad news. While the strategy you used of setting partitions to foreign tables works, it comes at a significant performance penalty. In
Hi @rebasedming, thank you for the insightful feedback and for highlighting the performance issue with the strategy used in the PR. Based on your comments, I understand that the current approach of putting partitioned heap tables in front of foreign tables bypasses the executor hook that normally pushes queries down to DuckDB. This results in slower query execution through PostgreSQL's FDW API, which ends up performing a sequential scan of the underlying Parquet files. I will investigate this further and work on an improved strategy that preserves the performance benefits of DuckDB. I'll get back to you soon with a better solution. Thanks for your patience!
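One hedged way to observe the failure mode described above is to compare query plans. The table names here are hypothetical (not from the PR), and the exact plan nodes depend on the extension version: a query against a foreign table directly can be intercepted by the executor hook and pushed to DuckDB, while a query routed through the partitioned heap parent is planned by PostgreSQL and executed as ordinary per-partition foreign scans.

```sql
-- Hypothetical names; exact plan output depends on the extension version.
-- Direct query against a foreign table: eligible for DuckDB pushdown.
EXPLAIN ANALYZE SELECT count(*) FROM auto_sales_2024_toyota;

-- Query through the partitioned heap parent: planned by PostgreSQL and
-- executed as one ForeignScan per matching leaf partition, i.e. the slow
-- FDW path that sequentially scans the underlying Parquet files.
EXPLAIN ANALYZE SELECT count(*) FROM auto_sales WHERE sale_year = 2024;
```

If the second plan shows ForeignScan nodes under an Append rather than a single pushed-down scan, the query is taking the slow path.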
We're excited to see the next iteration :) |
Closing this now as per the above discussion.
Closes #56

What
Implements a demonstration test for multi-level partition tables, addressing issue #56.

Why
This demo showcases the pg_analytics extension's capability to support multi-level partitioned tables. The implementation organizes data hierarchically, enabling efficient access to context-relevant information.

How
Sets up the pg_analytics Foreign Data Wrapper (FDW) in PostgreSQL using S3 data.

Tests
To run the demonstration test:
Test traces are available in the attached log file: https://gist.githubusercontent.com/shamb0/2ed909ac9604c610af1d7fa0e87f9a82/raw/02a4203cdc1d675181d9f9700c578c81405becdb/wk2434-pg_analytics-mlp-demo.md