-
Couldn't load subscription status.
- Fork 13
docs: Document that the write_parquet methods use mkdir in polars
#142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #142 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 49 50 +1
Lines 2818 2851 +33
=========================================
+ Hits 2818 2851 +33 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@AndreasAlbertQC I did not consider delta storage in this PR. If there is a more elegant way to fix this, feel free to amend/close this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mh, I thought that write_parquet takes care of this, unfortunately not... 😅 I think we should be consistent with write_parquet from polars though and not create parents (and document it accordingly).
dataframely/_storage/parquet.py
Outdated
| ) -> None: | ||
| file = kwargs.pop("file") | ||
| metadata = kwargs.pop("metadata", {}) | ||
| file.parent.mkdir(parents=True, exist_ok=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we shouldn't do this here but rather in the specialized functions for collections. Also, I wouldn't set parents=True (see main comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm fair point. I looked into this more. pl.write_parquet has a parameter mkdir (default False). So to get the exact same behavior, I guess we would have to add this. But I see that the additional complexity is not worth it and we can just always not do it.
We currently rely on the fact that write_parquet sets mkdir to True for partitioned writes and creates all parent directories here because we add an extra layer of directories that is then going to be created by polars.
I think if we do not want an mkdir argument, we should not create anything anywhere, except for partitioned writes. That would not require any code changes, just updates to the docs. Is this what you have in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to just delegate this to the mkdir parameter in polars.
@borchero I just realized that the mkdir parameter is only exposed from polars 1.33 onwards. So I think we should wait with this PR until we bump there anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will bump the version requirement as part of #139, do you want to wait until then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that works for me. I will redraft this for now then.
write_parquet methods use mkdir in polars
Motivation
We would like to clarify in the docs that the
write_parquetmethods use themkdirkwarg to control whether directories should be created.Changes
mkdir)