Skip to content

A production-ready data pipeline and Streamlit dashboard that automates the cleaning, validation, and visualization of messy sales data. Built with Python, Pandas, and Plotly.

Notifications You must be signed in to change notification settings

quantumstack-labs/data-cleaning-automation-dashboard

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

8 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸš€ Enterprise Data Automation & Analytics Pipeline

Dashboard Preview

🌟 Business Value

Manual data cleaning costs companies hundreds of hours in lost productivity. This project demonstrates a fully automated "Self-Service" pipeline that transforms fragmented, messy CSV exports into a production-ready analytical dashboard.

This solution eliminates human error and provides instant, actionable insights for decision-makers.


πŸ› οΈ The Problem (The "Dirty" Data)

Raw data from CRMs or Sales exports is often "broken." This system is designed to handle:

  • Date Inconsistency: Mixed formats (e.g., "Jan 23", "2023-01-01", "N/A").
  • Financial Formatting: Currency symbols and strings that prevent mathematical analysis.
  • Categorical Noise: Inconsistent regional/product naming (e.g., "north" vs "North").
  • Data Redundancy: Duplicate transactions that skew revenue totals.

βœ… The Solution (The Automation Engine)

I developed a custom Python-based engine that performs:

  1. Automated Validation: Ensures the uploaded file meets business requirements.
  2. Vectorized Cleaning: Uses Pandas to clean thousands of rows in milliseconds.
  3. Regex Sanitization: Advanced text processing to extract numeric values from dirty strings.
  4. Interactive BI: A Streamlit dashboard that allows stakeholders to explore data in real-time.

🧰 Tech Stack

  • Engine: Python, Pandas, NumPy
  • Visualization: Plotly Express (Interactive Charts)
  • Deployment: Streamlit Cloud (SaaS Interface)

πŸ“Š Key Dashboard Features

  • One-Click Upload: Users drag a raw CSV and get results instantly.
  • Smart Metrics: Real-time calculation of Total Revenue and Transaction Volume.
  • Automated Audit: Shows exactly how many duplicates were removed and rows processed.
  • Export Ready: Download the cleaned data immediately for use in other tools.

πŸš€ Quick Start

  1. Live Demo: [INSERT YOUR STREAMLIT URL HERE]
  2. Local Setup:
    • Clone this repository.
    • Install dependencies: pip install -r requirements.txt
    • Run the app: streamlit run app.py

About

A production-ready data pipeline and Streamlit dashboard that automates the cleaning, validation, and visualization of messy sales data. Built with Python, Pandas, and Plotly.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages