CSV is a ubiquitous format for data transfer, and CSV files are commonly riddled with issues. Most CSV libraries abort with an unhelpful error when they hit one. CSV GP lets you pinpoint these common issues in a CSV file, and can also export just the parsable lines from it.
CSV GP can be used in three ways.
- Install Rust
- Clone the repo and navigate into it
- Run cargo install --path csv_gp
- The csv-gp command will now be available to run; see csv-gp --help for usage
Add the following to your Cargo.toml:
csv-gp = { git = "https://github.com/xelixdev/csv-gp", rev = "<optional git tag>" }
The library is available on PyPI at https://pypi.org/project/csv-gp/, so you can install it with:
pip install csv-gp
- Install Rust
- Install maturin (pip install maturin)
- Clone the repo
- Run
make all
cd csv_gp_python && maturin develop
After installing the binary, the default usage is running csv-gp $FILE. This will print a diagnosis of the file. The command provides options to change the delimiter and the encoding of the file. See csv-gp -h for details.
Another option provided is --correct-rows-path, which will export only the correct rows to the provided path.
The Python library exposes two main functions, check_file and get_rows.
The check_file function takes a path to a file, the delimiter and the encoding (see https://github.com/xelixdev/csv-gp/blob/0f77c62841509c134a3bbe06ec178426e9c5aa10/csv_gp_python/csv_gp.pyi) and returns an instance of the class CSVDetails, which provides details about the file. See the same file for all the available attributes and their names and types.
If the valid_rows_output_path argument is provided to the function, only the correct rows will be exported to that path.
The get_rows function also takes a path to a file, the delimiter and the encoding, plus a list of row numbers. It returns the parsed cells for the given rows. See the stub file linked above for the exact types of the parameters and return values.
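As a rough illustration of the two functions described above, here is a minimal sketch. It assumes the module is importable as csv_gp (matching the csv_gp.pyi stub), that the arguments can be passed positionally in the order the text lists them (path, delimiter, encoding), and that "utf-8" is an accepted encoding label. The sample file, the write_sample and main helpers, and the chosen row numbers are hypothetical; check the stub file for the real signatures and for whether row numbers start at 0 or 1.

```python
import tempfile
from pathlib import Path


def write_sample(directory: Path) -> Path:
    """Write a small CSV whose last line has one column too many."""
    path = directory / "sample.csv"
    path.write_text("id,name\n1,alice\n2,bob,extra\n", encoding="utf-8")
    return path


def main() -> None:
    # Imported here so the helper above works without the library installed.
    import csv_gp  # pip install csv-gp

    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        source = write_sample(tmp_dir)
        valid_only = tmp_dir / "valid_rows.csv"

        # Assumed argument order: path, delimiter, encoding.
        # valid_rows_output_path is the optional argument named in the text.
        details = csv_gp.check_file(
            str(source), ",", "utf-8", valid_rows_output_path=str(valid_only)
        )
        print(details)  # a CSVDetails instance; see csv_gp.pyi for its attributes

        # Assumed: the row numbers are passed as a list of ints.
        rows = csv_gp.get_rows(str(source), ",", "utf-8", [1, 2])
        print(rows)
```

Calling main() runs the sketch against a temporary file, so nothing is left on disk afterwards.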
- Update version numbers in csv_gp_python/Cargo.toml and csv_gp/Cargo.toml
- Run cargo check to update the lock files with new versions
- Merge this change into main
- Create a new release on GitHub, creating a tag in the form vX.Y.Z
- The 'Publish' pipeline should begin running, and the new version will be published
Run cargo test.
Follow the instructions on compiling from source. Then you can run pytest.