A mini ETL pipeline in Python that performs batch data validation using the Luhn algorithm. An extension of a freeCodeCamp project to demonstrate data quality concepts.

zenleonardo/luhn-data-validator

Data Validation Pipeline: An Extension of the Luhn Algorithm Project

Python Badge Pandas Badge

Read in Portuguese / Ler em Português

📖 About the Project

This project originated from the "Learn How to Work with Numbers and Strings by Implementing the Luhn Algorithm" module, part of the freeCodeCamp "Scientific Computing with Python" course.

Going beyond the original exercise, I extended the core concept into a real-world data engineering scenario. Instead of a single function that validates one number, I built a mini ETL pipeline that reads a batch of "dirty" records from a CSV file, cleans them, applies the Luhn validation logic, and loads the enriched results into a new file.

This demonstrates a practical approach to solving data quality problems at scale.
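At the heart of the pipeline is the Luhn checksum itself. A minimal sketch of the validation step (the function name `luhn_is_valid` is illustrative; the actual implementation in `src/main.py` may differ):

```python
def luhn_is_valid(card_number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    # Keep only digits, so inputs like "7992-7398-713" are handled.
    digits = [int(ch) for ch in card_number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit,
    # subtracting 9 when doubling produces a two-digit number.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```

For example, `luhn_is_valid("79927398713")` returns `True`, while changing the final check digit makes it fail.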

✨ Relevance to Data Engineering

This project showcases fundamental skills required for building robust data pipelines:

  • ETL Process: A complete Extract, Transform, and Load workflow using Python and Pandas.
  • Data Quality & Validation: Applying a specific business rule (the Luhn algorithm) to programmatically check data integrity.
  • Data Cleaning: Preprocessing raw data to standardize formats before validation.
  • Batch Processing: Handling an entire file of records, which simulates a real-world data processing task.

📋 Pipeline Overview

  1. Extract: Reads a CSV file containing card numbers in various formats.
  2. Transform: Cleans each number (removing spaces/hyphens) and applies the Luhn algorithm to check its validity, creating a new is_valid column.
  3. Load: Writes a new CSV file containing the original data, the cleaned data, and the validation result.

🚀 How to Run

  1. Clone the repository.
  2. Create and activate a virtual environment.
  3. Install the required dependencies:
    pip install -r requirements.txt
  4. Run the main script to execute the pipeline:
    python3 src/main.py
