- Create project structure and virtual environment using uv
- Set up basic CLI framework using argparse
- Test: Basic CLI argument parsing
- Create pyproject.toml with dependencies
- Set up testing framework (pytest)
- Create initial README.md
- Implement file format detection (CSV/JSONL)
- Test: Format detection for CSV and JSONL files
- Create CSV reader with configurable delimiter
- Test: Reading CSV with different delimiters (test_scenarios.md #7)
- Create JSONL reader
- Test: Reading JSONL files (test_scenarios.md #2)
- Implement chunked reading for large files
- Test: Processing large files (test_scenarios.md #3)
- Add error handling for file access and parsing
- Test: Invalid file formats and access errors
- Create data source abstraction layer
- Test: Common interface for different file types
- Implement automatic column mapping
- Header name similarity matching
- Test: Basic column name matching (test_scenarios.md #1)
- Data content similarity analysis
- Test: Content-based mapping accuracy
- Create configuration file parser (JSON/YAML)
- Test: Config file parsing and validation
- Implement manual column mapping via config
- Test: Custom mapping configurations
- Add validation for mapping configuration
- Test: Invalid mapping scenarios
- Implement automatic ID column detection
- Column name analysis ("id", "key", etc.)
- Test: ID column detection (test_scenarios.md #3)
- Data uniqueness analysis
- Test: Uniqueness validation
- Add manual ID column specification
- Test: Custom ID column configuration
- Implement ID column validation
- Test: Invalid ID columns
- Add duplicate ID detection
- Test: Duplicate ID handling
- Implement row-level comparison
- Find rows unique to source 1
- Test: Unique row detection (test_scenarios.md #1, #2)
- Find rows unique to source 2
- Test: Unique row detection (test_scenarios.md #1, #2)
- Detect rows with matching IDs but different values
- Test: Value difference detection (test_scenarios.md #2)
- Implement column-level comparison
- Calculate matching value percentages
- Test: Similarity calculations
- Identify columns with highest/lowest similarity
- Test: Column similarity ranking
- Add support for case-insensitive comparison
- Test: Case sensitivity handling (test_scenarios.md #6)
- Add support for string trimming
- Test: String trimming functionality
- Implement column selection/exclusion
- Test: Column filtering
- Create summary report generator
- Row count differences
- Test: Summary statistics accuracy
- Column similarity statistics
- Test: Statistical calculations
- Implement detailed diff generation
- Colorized console output
- Test: Console formatting
- Side-by-side comparison
- Test: Comparison display format
- Add output format handlers
- Console output formatter
- Test: Console output formatting
- CSV output formatter
- Test: CSV output generation
- JSON output formatter
- Test: JSON output generation
- Implement drill-down queries
- Show unique rows by source
- Test: Row filtering
- Show differences for specific IDs
- Test: ID-based filtering
- Implement memory-efficient processing
- Test: Memory usage with large datasets
- Add progress indicators for large files
- Test: Progress reporting accuracy
- Optimize comparison algorithms
- Test: Performance benchmarks
- Add performance benchmarking
- Test: Benchmark suite execution
- Write detailed API documentation
- Create user guide with examples
- Add command-line help text
- Document configuration file format
- Add contributing guidelines
- Code cleanup and refactoring
- Error message improvements
- Final performance tuning
- Release preparation
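
The row-level comparison items in the list above (rows unique to each source, rows with matching IDs but different values, plus the case-insensitive and trimming options) could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the names `normalize` and `compare_rows`, the result keys, and the assumption that both sources are already loaded as lists of dicts with unique, sortable IDs are all hypothetical.

```python
from typing import Any, Dict, Iterable


def normalize(value: Any, ignore_case: bool = False, trim: bool = False) -> Any:
    """Apply the optional string normalizations before two values are compared."""
    if isinstance(value, str):
        if trim:
            value = value.strip()
        if ignore_case:
            value = value.casefold()
    return value


def compare_rows(
    source1: Iterable[Dict[str, Any]],
    source2: Iterable[Dict[str, Any]],
    id_column: str,
    ignore_case: bool = False,
    trim: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: classify rows by ID across two sources."""
    # Index each source by ID. Assumes IDs are unique within a source;
    # duplicate detection is a separate item in the plan.
    by_id1 = {row[id_column]: row for row in source1}
    by_id2 = {row[id_column]: row for row in source2}

    # IDs present in only one of the two sources (sorted for stable output).
    only_in_source1 = sorted(by_id1.keys() - by_id2.keys())
    only_in_source2 = sorted(by_id2.keys() - by_id1.keys())

    # For IDs present in both, record each shared column whose values differ.
    changed: Dict[Any, Dict[str, tuple]] = {}
    for row_id in sorted(by_id1.keys() & by_id2.keys()):
        row1, row2 = by_id1[row_id], by_id2[row_id]
        diffs = {}
        for column in row1.keys() & row2.keys():
            left = normalize(row1[column], ignore_case, trim)
            right = normalize(row2[column], ignore_case, trim)
            if left != right:
                # Report the original, un-normalized values.
                diffs[column] = (row1[column], row2[column])
        if diffs:
            changed[row_id] = diffs

    return {
        "only_in_source1": only_in_source1,
        "only_in_source2": only_in_source2,
        "changed": changed,
    }
```

Calling `compare_rows(rows_a, rows_b, "id", ignore_case=True, trim=True)` would treat `"Bob "` and `"bob"` as equal, while the default call reports them as a difference. Holding both sources in dictionaries keeps the sketch short, but it does not address the chunked or memory-efficient processing items, which would need a different strategy.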