
bluedrop-learning-networks/data_diff


data_diff

A command-line tool that compares two data files (CSV or JSONL) and reports which records were added, removed, or modified.

Dependencies

Usage

Compare two data files. Each file may be CSV or JSONL, and the two formats can be mixed:

uv run data_diff source.csv target.jsonl
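Because the two inputs can be in different formats, they presumably have to be read into a common row representation before they can be compared. The sketch below shows one way to do that in Python. `load_records` and `_normalize` are hypothetical names, not data_diff's actual code. Choosing the parser by file extension and converting every value to a string are also assumptions.

```python
import csv
import json
from pathlib import Path
from typing import Any, Dict, List

# Hypothetical record type: column name -> value, with every value kept as a string.
Record = Dict[str, str]


def _normalize(row: Dict[Any, Any]) -> Record:
    # CSV cells are already strings; JSONL values may be numbers, booleans or null.
    # Converting everything to str (and null to "") lets a CSV "1" match a JSONL 1.
    # Assumption: note that str(True) gives "True", not "true".
    # Keys of None (extra CSV cells collected by DictReader) are dropped.
    return {str(k): "" if v is None else str(v) for k, v in row.items() if k is not None}


def load_records(path: str) -> List[Record]:
    """Load a .csv or .jsonl file into a list of normalized records (hypothetical helper)."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix not in (".csv", ".jsonl"):
        raise ValueError(f"unsupported file type: {suffix!r}")
    with p.open(newline="", encoding="utf-8") as fh:
        if suffix == ".csv":
            # DictReader uses the header row as the column names.
            rows = list(csv.DictReader(fh))
        else:
            # One JSON object per line; blank lines are skipped.
            rows = [json.loads(line) for line in fh if line.strip()]
    return [_normalize(row) for row in rows]
```

Normalizing to strings keeps the comparison simple across formats, at the cost of treating values such as `1` and `1.0` as different.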

Options

  • Compare specific columns, using a mapping file from source column names to target column names:
    uv run data_diff --mapping column-map.json source.csv target.csv
  • Specify the ID column(s) used to match records between the two files (comma-separated):
    uv run data_diff --id-columns id,email source.csv target.csv
  • Write the differences to a file:
    uv run data_diff --output diff-report.txt source.csv target.csv
  • Show all available options:
    uv run data_diff --help
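The ID columns determine which source record is compared with which target record. A minimal sketch, assuming that several columns (as in `id,email`) together form one composite key and that each key appears at most once per file. `parse_id_columns` and `index_by_id` are hypothetical helpers, not part of data_diff's documented interface.

```python
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]
Key = Tuple[str, ...]


def parse_id_columns(arg: str) -> List[str]:
    """Split a comma-separated --id-columns value such as "id,email"."""
    return [col.strip() for col in arg.split(",") if col.strip()]


def index_by_id(records: List[Record], id_columns: Sequence[str]) -> Dict[Key, Record]:
    """Index records by the tuple of their ID-column values (hypothetical helper).

    Assumes the ID columns form a composite key and that keys are unique.
    """
    index: Dict[Key, Record] = {}
    for rec in records:
        # A missing ID column is treated as an empty string.
        key = tuple(rec.get(col, "") for col in id_columns)
        if key in index:
            # Assumption: duplicates are reported as an error rather than merged.
            raise ValueError(f"duplicate key {key!r}")
        index[key] = rec
    return index
```

Using a tuple as the key means two rows with the same `id` but different `email` values are treated as distinct records.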

Column Mapping Format

The mapping file is a JSON object whose keys are source column names and whose values are the corresponding target column names:

{
    "source_column1": "target_column1",
    "source_column2": "target_column2"
}
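A sketch of how such a mapping might be applied: each source record is rewritten under the target column names so the two files can be compared column by column. `load_mapping` and `apply_mapping` are hypothetical, and dropping columns that are not listed in the mapping is an assumption about what "compare specific columns" means.

```python
import json
from typing import Dict, List

Record = Dict[str, str]


def load_mapping(text: str) -> Dict[str, str]:
    """Parse the mapping JSON and check that it is an object of string -> string."""
    mapping = json.loads(text)
    if not isinstance(mapping, dict) or not all(
        isinstance(k, str) and isinstance(v, str) for k, v in mapping.items()
    ):
        raise ValueError("mapping must be a JSON object of string -> string")
    return mapping


def apply_mapping(source_rows: List[Record], mapping: Dict[str, str]) -> List[Record]:
    """Rename source columns to their target names, keeping only mapped columns.

    Assumption: unmapped columns are dropped, and a missing source column
    becomes an empty string.
    """
    return [{tgt: row.get(src, "") for src, tgt in mapping.items()} for row in source_rows]
```

With this approach the target records would also need to be limited to the columns in `mapping.values()` before comparison, so that extra target columns are not reported as differences.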

Output

The tool reports:

  • Added records (in target but not source)
  • Removed records (in source but not target)
  • Modified records (matching IDs but different values)
  • Summary statistics of differences found
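Once both files are indexed by ID, the three categories above follow from set operations on the keys. The following is a minimal sketch of that idea, not the tool's implementation. `diff_records` and `DiffResult` are hypothetical names, and keys are assumed to be tuples of ID-column values.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Record = Dict[str, str]
Key = Tuple[str, ...]


@dataclass
class DiffResult:
    added: List[Key] = field(default_factory=list)      # in target but not source
    removed: List[Key] = field(default_factory=list)    # in source but not target
    # key -> {column: (source value, target value)} for columns that differ
    modified: Dict[Key, Dict[str, Tuple[str, str]]] = field(default_factory=dict)

    def summary(self) -> str:
        return f"added={len(self.added)} removed={len(self.removed)} modified={len(self.modified)}"


def diff_records(source: Dict[Key, Record], target: Dict[Key, Record]) -> DiffResult:
    result = DiffResult()
    # dict key views support set operations, so these are set differences.
    result.added = sorted(target.keys() - source.keys())
    result.removed = sorted(source.keys() - target.keys())
    for key in sorted(source.keys() & target.keys()):
        src, tgt = source[key], target[key]
        # Compare the union of columns; a column missing on one side counts as "".
        changes = {
            col: (src.get(col, ""), tgt.get(col, ""))
            for col in sorted(src.keys() | tgt.keys())
            if src.get(col, "") != tgt.get(col, "")
        }
        if changes:
            result.modified[key] = changes
    return result
```

For example, with source IDs 1 and 2, target IDs 2 and 3, and a changed name on ID 2, `summary()` returns `added=1 removed=1 modified=1`. Sorting the keys keeps the report order stable between runs.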

Development

Run tests:

uv run pytest
