Add Parquet I/O #122

JSchlensok · 2024-06-04T16:26:13Z

PR Checklist

This comment contains a description of changes (with reason)
Referenced issue is linked
If you've fixed a bug or added code that should be tested, add tests!
Documentation in docs is updated

Description of changes

Added a module for reading/writing pandas DataFrames from/to Parquet files and partitioned datasets.

Technical details

Pretty straightforward. The module deviates slightly from the existing HDF5 functionality: to support multiple datasets indexed by name, similar to HDF5's key-value structure, it uses Parquet partitioned datasets.

Additional context

JSchlensok · 2024-06-04T16:28:25Z

Will investigate the failing checks; some of them are definitely on me for forgetting to run the pre-commit hooks 🌚

JSchlensok · 2024-06-05T14:16:53Z

@picciama All tests for Python 3.10 except safety (which reports vulnerabilities in tqdm and jinja2, neither of which is affected by this PR) pass on my machine. Should I worry about the failing 3.8 test cases?

picciama · 2024-06-06T16:37:14Z

Oktoberfest will require at least Python 3.9, so you can drop support for 3.8. Please change the requirements and classifiers in pyproject.toml accordingly, and update the matrix in the workflow file .github/workflows/run_tests.yml so that the tests run on Python 3.9. This will hopefully fix your issues. Then please check mypy; I think there is an issue because a list type somewhere is missing its capital L. The safety issues may disappear with newer package versions that are not available for Python 3.8, which would explain why it works on your machine with Python 3.10 but not with 3.8 here. If safety still fails, consider updating the dependencies that cause the issue.
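For context on the "capital L" remark: on Python 3.8 the built-in `list` cannot be subscripted in annotations that are evaluated at runtime, so `typing.List` is needed there; from Python 3.9 on, `list[str]` is also valid (PEP 585). A small sketch with a hypothetical helper, not code from this PR:

```python
from typing import Dict, List


def dataset_names(datasets: Dict[str, object]) -> List[str]:
    """Return the dataset names in sorted order.

    typing.List / typing.Dict work on Python 3.8 and later; writing
    list[str] / dict[str, object] here would raise a TypeError at
    function definition time on Python 3.8.
    """
    return sorted(datasets)


print(dataset_names({"test": None, "train": None}))
```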

JSchlensok · 2024-06-11T20:36:56Z

safety still reports https://data.safetycli.com/v/70612/97c/, for which no fix is currently available and which its maintainer reportedly considers invalid. What is the best practice for handling that?


picciama

A few things require a second thought; please fix them. I have already fixed all remaining failing tests.

spectrum_io/file/parquet.py

.flake8

tests/unit_tests/test_parquet.py

Add Parquet I/O similar to HDF5 support

98f4f30

JSchlensok requested a review from picciama June 4, 2024 16:26

github-actions bot added the enhancement New feature or request label Jun 4, 2024

Fix mypy and pre-commit tests failing

29f39c0

JSchlensok added 2 commits June 11, 2024 20:24

chore: upgrade Python version from 3.8 to 3.9

b883563

chore: Add exceptions for overly specific flake8 rules

ece1b57

JSchlensok and others added 8 commits June 13, 2024 12:27

chore: suppress typeguard checking on tests intentionally using incompatible inputs

9210c32

chore: fix pytest failing due to typeguard package import

51ebbb7

Only import and use the suppress_type_checks context decorator if a typeguard session is run by nox as it is not installed for other sessions

chore: fix typeguard version

0582608

chore: formatting

53c65d0

updated hash

d55c808

ignored 70612 vulnerability in safety checks

2634852

updated workflow python versions to 3.9

282f31c

fix version number interpretation by quotation

702cf43
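The typeguard-related commits above (9210c32, 51ebbb7) can be pictured roughly as follows. This is only a sketch: how the PR detects the nox typeguard session is not shown in this conversation, so the `try`/`except ImportError` fallback is an assumption, and `parse_charge` and the check function are hypothetical names.

```python
try:
    # Importable when a typeguard release providing suppress_type_checks is
    # installed, e.g. in the nox typeguard session.
    from typeguard import suppress_type_checks
except ImportError:
    # Assumption: without typeguard no runtime type checking is active,
    # so a no-op context manager is a sufficient stand-in.
    from contextlib import nullcontext as suppress_type_checks


def parse_charge(charge: int) -> int:
    """Hypothetical function under test that validates its own input."""
    if not isinstance(charge, int):
        raise ValueError(f"charge must be an int, got {type(charge).__name__}")
    return charge


def check_parse_charge_rejects_string() -> bool:
    """Call parse_charge with a deliberately incompatible argument."""
    # Suppress typeguard so the function's own validation is what gets tested.
    with suppress_type_checks():
        try:
            parse_charge("2")  # type: ignore[arg-type]
        except ValueError:
            return True
    return False


print(check_parse_charge_rejects_string())
```

Note that the fallback only covers use as a context manager; `contextlib.nullcontext` cannot stand in for `suppress_type_checks` when it is applied as a decorator.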

picciama requested changes Jun 24, 2024


JSchlensok added 3 commits June 25, 2024 09:52

chore: Remove generic typing

c59dbfc

chore: fix flake8 rule B026

b306a8e

chore(parquet): delete output paths after writing

76819fb

picciama approved these changes Jun 26, 2024


JSchlensok merged commit 1a9f28d into development Jun 26, 2024
26 checks passed
