A command-line tool for comparing two data sources (CSV/JSONL) and identifying differences.
- Compare two data files (supports CSV and JSONL formats):
uv run data_diff source.csv target.jsonl
- Compare specific columns using a mapping file:
uv run data_diff --mapping column-map.json source.csv target.csv
- Specify ID columns for matching records:
uv run data_diff --id-columns id,email source.csv target.csv
- Output differences to a file:
uv run data_diff --output diff-report.txt source.csv target.csv
- Show all available options:
uv run data_diff --help
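Since the commands above accept both CSV and JSONL inputs, it may help to see how such files could be read into a common shape. The following is a minimal sketch, not the tool's actual implementation: `load_records` is a hypothetical helper that picks a parser from the file extension and returns a list of dictionaries.

```python
import csv
import json
from pathlib import Path


def load_records(path):
    """Load a CSV or JSONL file into a list of dicts (hypothetical helper)."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        # csv.DictReader uses the header row as keys; all values are strings.
        with path.open(newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))
    if suffix == ".jsonl":
        # One JSON object per line; blank lines are skipped.
        with path.open(encoding="utf-8") as handle:
            return [json.loads(line) for line in handle if line.strip()]
    raise ValueError(f"Unsupported file type: {suffix}")
```

Note that CSV values always load as strings while JSONL values keep their JSON types, so a cross-format comparison would likely need to normalize values before comparing them. Whether `data_diff` does this is not stated here.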
The mapping file (JSON) specifies how columns correspond between files:
{
  "source_column1": "target_column1",
  "source_column2": "target_column2"
}
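To illustrate how a mapping like the one above could be applied, here is a minimal sketch. It assumes that only mapped columns take part in the comparison and that source columns are renamed to their target names first. The real tool may behave differently, and `project_source` and `project_target` are hypothetical names.

```python
import json

# Same shape as the mapping file shown above.
mapping = json.loads(
    '{"source_column1": "target_column1", "source_column2": "target_column2"}'
)


def project_source(record, mapping):
    """Keep only mapped source columns, renamed to their target names."""
    return {
        target: record[source]
        for source, target in mapping.items()
        if source in record
    }


def project_target(record, mapping):
    """Keep only the target columns that appear in the mapping."""
    return {
        target: record[target]
        for target in mapping.values()
        if target in record
    }
```

After both projections, a source record and a target record share the same keys and can be compared directly.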
The tool will show:
- Added records (in target but not source)
- Removed records (in source but not target)
- Modified records (matching IDs but different values)
- Summary statistics of differences found
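The categories above can be sketched as a comparison keyed on the ID columns. This is an illustrative sketch under stated assumptions, not the tool's actual algorithm: `diff_records` is a hypothetical function, records are plain dictionaries, and ID values are compared as strings so that CSV and JSONL inputs can match.

```python
def diff_records(source, target, id_columns):
    """Classify records as added, removed, or modified by their ID columns."""

    def key(record):
        # Composite key built from the ID columns, e.g. ("1", "a@example.com").
        return tuple(str(record.get(column)) for column in id_columns)

    # Assumption: IDs are unique per file; on duplicates the last record wins.
    source_by_id = {key(record): record for record in source}
    target_by_id = {key(record): record for record in target}

    added = [rec for k, rec in target_by_id.items() if k not in source_by_id]
    removed = [rec for k, rec in source_by_id.items() if k not in target_by_id]
    modified = [
        (rec, target_by_id[k])
        for k, rec in source_by_id.items()
        if k in target_by_id and rec != target_by_id[k]
    ]
    summary = {
        "added": len(added),
        "removed": len(removed),
        "modified": len(modified),
    }
    return {
        "added": added,
        "removed": removed,
        "modified": modified,
        "summary": summary,
    }
```

Indexing both sides by key keeps the comparison roughly linear in the number of records, at the cost of holding both files in memory.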
Run tests:
uv run pytest