tap-parquet
tap-parquet is a Singer tap for Parquet.
Built with the Meltano Tap SDK for Singer Taps.
This is a fork of the ae-nv variant, rebuilt on v0.39.1 of the Meltano SDK cookiecutter template.
Parquet is a portable, type-aware, columnar, compressed, splittable, and cloud-friendly format.
For more information on why Parquet is increasingly used in big data applications, see this comparison.
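Before listing a file under `paths`, a quick sanity check can confirm that it really is Parquet. An unencrypted Parquet file begins with the 4-byte magic `PAR1` and ends with a 4-byte footer length followed by `PAR1` again. The sketch below checks only those magic bytes. It is a cheap filter, not a full validation, and the helper name `looks_like_parquet` is hypothetical (it is not part of tap-parquet).

```python
import os
from pathlib import Path

PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path: str) -> bool:
    """Cheap check: an unencrypted Parquet file starts and ends with b"PAR1".

    Hypothetical helper, not part of tap-parquet. It does not parse the footer.
    """
    file_path = Path(path)
    # Header magic (4) + footer length (4) + footer magic (4) = 12 bytes minimum.
    if not file_path.is_file() or file_path.stat().st_size < 12:
        return False
    with file_path.open("rb") as fh:
        head = fh.read(4)
        fh.seek(-4, os.SEEK_END)  # jump to the last four bytes
        tail = fh.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```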
| Setting | Required | Default | Description |
|---|---|---|---|
| paths | True | None | Paths to Parquet datasets |
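A minimal sketch of producing a config file to pass via `--config`. It assumes that `paths` takes a JSON array of strings; the table above only says "Paths to Parquet datasets", so confirm the actual type with `tap-parquet --about`. The helper `write_config` and the default file name `config.json` are illustrative choices, not part of the tap.

```python
import json
from pathlib import Path


def write_config(paths: list[str], out_file: str = "config.json") -> Path:
    """Write a tap-parquet config file containing the required `paths` setting.

    Assumption: `paths` is an array of strings; adjust if `--about` says otherwise.
    """
    if not paths:
        raise ValueError("`paths` is required and must not be empty")
    target = Path(out_file)
    target.write_text(json.dumps({"paths": paths}, indent=2) + "\n", encoding="utf-8")
    return target
```

For example, `write_config(["data/events.parquet"])` (a hypothetical path) writes `{"paths": ["data/events.parquet"]}`, after which `tap-parquet --config config.json --discover` can be run against it.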
A full list of supported settings and capabilities for this tap is available by running:
tap-parquet --about
This Singer tap automatically imports any environment variables from the working directory's .env file if --config=ENV is provided, so a config value is picked up whenever a matching environment variable is set, either in the terminal context or in the .env file.
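The sketch below approximates how environment variables could map onto settings. It is a simplified illustration, not the SDK's actual code. It assumes the usual Meltano SDK naming convention (tap name upper-cased with `-` replaced by `_`, giving a prefix such as `TAP_PARQUET_`) and assumes that array settings like `paths` are given as comma-separated values. `config_from_env`, `PREFIX` and `ARRAY_SETTINGS` are hypothetical names.

```python
import os
from typing import Mapping

# Hypothetical approximation of --config=ENV; not the SDK's implementation.
# Assumption: variables are named PREFIX + SETTING (upper-cased, "-" -> "_").
PREFIX = "TAP_PARQUET_"
# Assumption: array-typed settings are passed as comma-separated strings.
ARRAY_SETTINGS = {"paths"}


def config_from_env(settings: list[str], environ: Mapping[str, str] = os.environ) -> dict:
    """Collect config values from environment variables named PREFIX + SETTING."""
    config: dict = {}
    for setting in settings:
        var_name = PREFIX + setting.upper().replace("-", "_")
        if var_name not in environ:
            continue  # unset variables leave the setting out of the config
        value = environ[var_name]
        if setting in ARRAY_SETTINGS:
            # Split "a.parquet,b.parquet" into a list, dropping blanks and spaces.
            config[setting] = [item.strip() for item in value.split(",") if item.strip()]
        else:
            config[setting] = value
    return config
```

Under these assumptions, a .env line such as `TAP_PARQUET_PATHS=data/events.parquet,data/users.parquet` would yield `{"paths": ["data/events.parquet", "data/users.parquet"]}`.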
You can easily run tap-parquet by itself or in a pipeline using Meltano.
tap-parquet --version
tap-parquet --help
tap-parquet --config CONFIG --discover > ./catalog.json
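The catalog written by `--discover` follows the Singer catalog format: a `streams` list where each stream carries a `tap_stream_id` and a `metadata` list, and stream-level metadata sits in the entry with an empty `breadcrumb`. A minimal sketch for choosing which streams to sync, assuming that layout, marks streams `selected` in place. The function names and the stream id `events` are hypothetical. Stream ids depend on your datasets.

```python
import json
from pathlib import Path


def select_streams(catalog: dict, wanted: set[str]) -> dict:
    """Mark each stream in a Singer catalog as selected or not, by tap_stream_id."""
    for stream in catalog.get("streams", []):
        is_wanted = stream["tap_stream_id"] in wanted
        metadata = stream.setdefault("metadata", [])
        # Stream-level metadata lives in the entry whose breadcrumb is empty.
        root = next((m for m in metadata if m.get("breadcrumb") == []), None)
        if root is None:
            root = {"breadcrumb": [], "metadata": {}}
            metadata.append(root)
        root.setdefault("metadata", {})["selected"] = is_wanted
    return catalog


def select_streams_in_file(path: str, wanted: set[str]) -> None:
    """Rewrite a catalog.json produced by --discover with the given selection."""
    catalog_file = Path(path)
    catalog = json.loads(catalog_file.read_text(encoding="utf-8"))
    updated = select_streams(catalog, wanted)
    catalog_file.write_text(json.dumps(updated, indent=2), encoding="utf-8")
```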
Follow these instructions to contribute to this project.
pipx install poetry
poetry install
Create tests within the tests subfolder and then run:
poetry run pytest
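As an illustration, the pytest-style file below could sit in the tests subfolder (for example as a hypothetical `tests/test_singer_output.py`). It checks the Singer convention that a stream's SCHEMA message arrives before any of its RECORD messages. The sample lines are hand-written; in practice you would capture the tap's stdout instead. `check_singer_messages` is a hypothetical helper, not part of the SDK.

```python
import json


def check_singer_messages(lines):
    """Return the set of streams that sent a SCHEMA; fail if a RECORD precedes it."""
    schemas = set()
    for raw in lines:
        if not raw.strip():
            continue  # ignore blank lines in captured output
        message = json.loads(raw)
        kind = message["type"]
        if kind == "SCHEMA":
            schemas.add(message["stream"])
        elif kind == "RECORD":
            stream = message["stream"]
            assert stream in schemas, f"RECORD for {stream!r} before its SCHEMA"
    return schemas


def test_record_follows_schema():
    # Hand-written sample; replace with real tap output when available.
    sample = [
        '{"type": "SCHEMA", "stream": "events", "schema": {"properties": {}}, "key_properties": []}',
        '{"type": "RECORD", "stream": "events", "record": {}}',
        '{"type": "STATE", "value": {}}',
    ]
    assert check_singer_messages(sample) == {"events"}
```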
You can also test the tap-parquet CLI directly using poetry run:
poetry run tap-parquet --help
Testing with Meltano
Note: This tap will work in any Singer environment and does not require Meltano. Examples here are for convenience and to streamline end-to-end orchestration scenarios.
Next, install Meltano (if you haven't already) and any needed plugins:
# Install meltano
pipx install meltano
# Initialize meltano within this directory
cd tap-parquet
meltano install
Now you can test and orchestrate using Meltano:
# Test invocation:
meltano invoke tap-parquet --version
# OR run a test `elt` pipeline:
meltano elt tap-parquet target-jsonl
See the dev guide for more instructions on how to use the SDK to develop your own taps and targets.