Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Parquet I/O #122

Merged
merged 15 commits into from
Jun 26, 2024
Merged

Add Parquet I/O #122

merged 15 commits into from
Jun 26, 2024

Conversation

JSchlensok
Copy link
Contributor

PR Checklist

  • This comment contains a description of changes (with reason)
  • Referenced issue is linked
  • If you've fixed a bug or added code that should be tested, add tests!
  • Documentation in docs is updated

Description of changes

Added module for reading/writing Pandas dataframes from/to Parquet files and partitioned datasets.

Technical details

Pretty straightforward. Deviated from the existing functionality for HDF5 slightly to enable support for multiple datasets indexed by name similar to HDF5's key-value structure through Parquet partitioned datasets.

Additional context

@JSchlensok JSchlensok requested a review from picciama June 4, 2024 16:26
@github-actions github-actions bot added the enhancement New feature or request label Jun 4, 2024
@JSchlensok
Copy link
Contributor Author

Will investigate failing checks, some of them are definitely on me for forgetting to run pre-commit hooks 🌚

@JSchlensok
Copy link
Contributor Author

@picciama All tests for Python 3.10 except safety (which reports vulnerabilities in tqdm and jinja2, which are not affected by this PR) run on my device - should I worry about the failing 3.8 test cases?

@picciama
Copy link
Contributor

picciama commented Jun 6, 2024

Oktoberfest will require at least python 3.9 so you can drop support for 3.8. Please change the requirements and classifiers in pyproject.toml accordingly, and update the matrix in the workflow file in .github/workflows/run_tests.yml so that the tests run using python 3.9. This will hopefully fix your issues. Then, please check mypy, there is also some issue there because you don't have a capital L for a list type somewhere I think. Safety issues may disappear when using a newer version of the packages that may not be available in python 3.8, hence it works on your machine with python 3.10 but not with 3.8 here. If safety is still failing, consider updating the dependencies that cause the issue.

@JSchlensok
Copy link
Contributor Author

safety still reports https://data.safetycli.com/v/70612/97c/, for which there is no fix available currently and that is reported to be believed to be invalid by its maintainer. What is the best practice to go about that?

Copy link
Contributor

@picciama picciama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few things require a second thought. Please fix. I fixed all remaining failing tests already.

spectrum_io/file/parquet.py Outdated Show resolved Hide resolved
.flake8 Outdated Show resolved Hide resolved
tests/unit_tests/test_parquet.py Show resolved Hide resolved
tests/unit_tests/test_parquet.py Show resolved Hide resolved
tests/unit_tests/test_parquet.py Show resolved Hide resolved
tests/unit_tests/test_parquet.py Show resolved Hide resolved
tests/unit_tests/test_parquet.py Show resolved Hide resolved
tests/unit_tests/test_parquet.py Show resolved Hide resolved
tests/unit_tests/test_parquet.py Show resolved Hide resolved
tests/unit_tests/test_parquet.py Show resolved Hide resolved
@JSchlensok JSchlensok merged commit 1a9f28d into development Jun 26, 2024
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants