📊 Product Propensity Predictor

A machine learning system that predicts customer purchase propensity from Brazilian e-commerce data.
The project trains Random Forest, LightGBM, and Logistic Regression models, compares them automatically, and selects the best performer.

📖 Overview

This project analyzes the Olist Brazilian E-commerce dataset to build predictive models that identify customers most likely to make purchases.
The system has a modular architecture with separate components for data preprocessing, feature engineering, model training, and evaluation.

🚀 Key Features

Multi-Algorithm Support: Random Forest, LightGBM, Logistic Regression
Automated Feature Engineering: 15+ new features from raw data (customer behavior, purchase history, reviews, etc.)
Modular Architecture: Preprocessing, feature engineering, validation, and training pipelines
Model Export Functionality: Save and reuse trained models for production
Interactive Dashboard: Built with Streamlit for visualization and predictions

📂 Dataset

The project uses the Olist Brazilian E-commerce Public Dataset (2016–2018), containing information about orders, customers, products, and reviews.

🔗 Dataset link: Kaggle - Brazilian E-commerce

Data Sources:

Customer demographics & behavior
Order history & transaction details
Product categories & details
Review scores & feedback patterns
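As a rough illustration of how these sources fit together, the sketch below joins toy stand-ins for the customers, orders, and reviews tables with pandas. The column names (`customer_id`, `order_id`, `review_score`) follow the public Olist schema, but the rows are invented and the helper `build_order_table` is hypothetical; it is not necessarily how `src/preprocessing.py` combines the data.

```python
import pandas as pd

# Toy stand-ins for three Olist tables (rows are made up for illustration).
customers = pd.DataFrame({
    "customer_id": ["c1", "c2"],
    "customer_state": ["SP", "RJ"],
})
orders = pd.DataFrame({
    "order_id": ["o1", "o2", "o3"],
    "customer_id": ["c1", "c1", "c2"],
    "order_purchase_timestamp": pd.to_datetime(
        ["2017-01-05", "2017-03-10", "2018-02-01"]
    ),
})
reviews = pd.DataFrame({
    "order_id": ["o1", "o3"],
    "review_score": [5, 3],
})


def build_order_table(customers, orders, reviews):
    """Hypothetical helper: attach customer and review data to each order."""
    merged = orders.merge(customers, on="customer_id", how="left")
    # A left join keeps orders that never received a review (score is NaN).
    merged = merged.merge(reviews, on="order_id", how="left")
    return merged


order_table = build_order_table(customers, orders, reviews)
print(order_table)
```

Left joins are used so that orders without a review are kept rather than silently dropped; the missing scores can then be handled explicitly during preprocessing.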

🗂️ Project Structure

propensity-predictor/
├── data/
│   ├── raw/              # Original dataset files
│   └── processed/        # Cleaned and engineered features
├── src/
│   ├── preprocessing.py  # Data cleaning and preparation
│   ├── data_processing.py # Feature engineering and transformation
│   ├── validation.py     # Model validation and evaluation
│   └── model_training.py # Training pipeline for all algorithms
├── models/
│   └── trained/          # Exported trained models
├── notebooks/
│   └── exploratory/      # Data exploration and analysis
├── streamlit_app.py      # Interactive dashboard
├── requirements.txt      # Python dependencies
└── README.md

Installation

Prerequisites

Python 3.8+
pip package manager

Setup Instructions

Clone the repository

git clone https://github.com/yourusername/propensity-predictor.git
cd propensity-predictor

Create virtual environment

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies

pip install -r requirements.txt

Dependencies

pandas >= 1.5.0
scikit-learn >= 1.2.0
lightgbm >= 3.3.0
streamlit >= 1.28.0
numpy >= 1.24.0
matplotlib >= 3.6.0
seaborn >= 0.12.0

Usage

Running the Complete Pipeline

Data Preprocessing

python src/preprocessing.py

Feature Engineering

python src/data_processing.py

Model Training

python src/model_training.py
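If you prefer a single entry point, the three steps above could be chained from one small script. This is only a sketch: the script paths come from the project structure listed earlier, while `build_commands` and `run_pipeline` are hypothetical helpers that do not exist in the repository.

```python
import subprocess
import sys

# Script paths as listed in the project structure, in execution order.
PIPELINE_STEPS = [
    "src/preprocessing.py",
    "src/data_processing.py",
    "src/model_training.py",
]


def build_commands(python=sys.executable):
    """Return one command (as an argument list) per pipeline step."""
    return [[python, step] for step in PIPELINE_STEPS]


def run_pipeline(dry_run=False):
    """Run each step in order; stop at the first step that fails."""
    for cmd in build_commands():
        print("Running:", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, check=True)


# Dry run only prints the commands; call run_pipeline() from the
# repository root to actually execute the scripts.
run_pipeline(dry_run=True)
```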

Interactive Dashboard

Launch the Streamlit application for model interaction and visualization:

streamlit run streamlit_app.py

The dashboard provides:

Model performance comparison
Feature importance visualization
Prediction interface for new customers
Model metrics and evaluation results

Model Performance

The system automatically compares three machine learning algorithms and selects the best performer based on the evaluation metrics below:

| Algorithm            | Precision | Recall | F1-Score | AUC-ROC |
|----------------------|-----------|--------|----------|---------|
| Random Forest        | 0.87      | 0.83   | 0.85     | 0.91    |
| LightGBM             | 0.89      | 0.84   | 0.86     | 0.93    |
| Logistic Regression  | 0.78      | 0.75   | 0.76     | 0.82    |
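The selection step can be sketched in a few lines using the figures from the table above. The README does not say which metric drives the choice, so ranking by AUC-ROC here is an assumption, and `select_best` and `f1_from` are hypothetical helper names. The loop also checks that each reported F1-Score is consistent with its precision and recall.

```python
# Metrics copied from the table above.
RESULTS = {
    "Random Forest": {"precision": 0.87, "recall": 0.83, "f1": 0.85, "auc_roc": 0.91},
    "LightGBM": {"precision": 0.89, "recall": 0.84, "f1": 0.86, "auc_roc": 0.93},
    "Logistic Regression": {"precision": 0.78, "recall": 0.75, "f1": 0.76, "auc_roc": 0.82},
}


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def select_best(results, metric="auc_roc"):
    """Return the name of the model with the highest value of `metric`."""
    return max(results, key=lambda name: results[name][metric])


for name, m in RESULTS.items():
    # Reported F1 should match precision/recall up to two-decimal rounding.
    assert abs(f1_from(m["precision"], m["recall"]) - m["f1"]) < 0.005

print(select_best(RESULTS))  # LightGBM
```

With these numbers LightGBM leads on every column, so the choice of ranking metric does not change the outcome.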

Features Engineered

The feature engineering step derives features such as:

Customer lifetime value metrics
Purchase frequency patterns
Product category preferences
Seasonal buying behavior
Review sentiment indicators
Geographic purchase patterns
Time-based features (days since last purchase, purchase velocity)
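To make a few of these concrete, here is a minimal pandas sketch that derives order count, total spend, days since last purchase, and a simple purchase velocity per customer. The column names are modeled on the Olist schema, with `payment_value` assumed to be already joined onto the orders; the definition of velocity (orders per 30 days since the first order) and the function `customer_features` are illustrative assumptions, not the project's actual definitions.

```python
import pandas as pd

# Toy order history; rows are invented for illustration.
orders = pd.DataFrame({
    "customer_unique_id": ["a", "a", "a", "b"],
    "order_purchase_timestamp": pd.to_datetime(
        ["2017-01-01", "2017-01-31", "2017-03-02", "2017-02-15"]
    ),
    "payment_value": [100.0, 50.0, 150.0, 80.0],
})


def customer_features(orders, snapshot_date):
    """Aggregate one row of behavioral features per customer."""
    feats = orders.groupby("customer_unique_id").agg(
        order_count=("order_purchase_timestamp", "count"),
        total_spend=("payment_value", "sum"),
        first_order=("order_purchase_timestamp", "min"),
        last_order=("order_purchase_timestamp", "max"),
    )
    feats["days_since_last_purchase"] = (snapshot_date - feats["last_order"]).dt.days
    # Assumed definition: orders per 30 days since the customer's first order.
    # Tenure is clipped to at least one day to avoid division by zero.
    tenure_days = (snapshot_date - feats["first_order"]).dt.days.clip(lower=1)
    feats["purchase_velocity"] = feats["order_count"] / tenure_days * 30
    return feats.drop(columns=["first_order", "last_order"])


features = customer_features(orders, pd.Timestamp("2017-04-01"))
print(features)
```

Computing everything relative to a fixed snapshot date keeps the features free of information from after the prediction point, which matters when the target is a future purchase.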

Technical Implementation

Data Processing Pipeline

Preprocessing Module: Handles missing values, outlier detection, and data cleaning
Feature Engineering: Creates derived features from raw transactional data
Validation Module: Implements cross-validation and performance evaluation
Model Training: Automated training pipeline with hyperparameter optimization
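The preprocessing, validation, and training stages described above could be wired together roughly as follows. This sketch uses only scikit-learn on synthetic data: median imputation, 5-fold stratified cross-validation, and ROC AUC scoring are assumptions chosen for illustration, and LightGBM and hyperparameter search are left out to keep it short.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the engineered feature matrix.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan  # simulate missing values

# Each candidate bundles its own preprocessing, so imputation and scaling
# are fit on the training folds only.
candidates = {
    "logistic_regression": make_pipeline(
        SimpleImputer(strategy="median"),
        StandardScaler(),
        LogisticRegression(max_iter=1000),
    ),
    "random_forest": make_pipeline(
        SimpleImputer(strategy="median"),
        RandomForestClassifier(n_estimators=100, random_state=0),
    ),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)
print(scores, best_name)
```

Putting the imputer and scaler inside each pipeline, rather than applying them once up front, avoids leaking statistics from the validation folds into training.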

Model Architecture

Ensemble Approach: Combines multiple algorithms for robust predictions
Automated Selection: System selects best-performing model based on validation metrics
Export Capability: Trained models saved for production deployment
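For the export step, one common approach is to serialize the fitted estimator with joblib and reload it at prediction time. The sketch below assumes that approach; the README does not state which serialization format the project uses, and the file name `best_model.joblib` is hypothetical.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Fit a small model on synthetic data as a stand-in for the selected model.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# A temporary directory stands in for models/trained/ here.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "best_model.joblib"  # hypothetical file name
    joblib.dump(model, model_path)
    restored = joblib.load(model_path)

# The reloaded model should reproduce the original predictions exactly.
same_predictions = bool((restored.predict(X) == model.predict(X)).all())
print(same_predictions)
```

Serialized models should be loaded with the same library versions they were saved with, so pinning versions in `requirements.txt` helps keep exported models usable.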

Future Enhancements

Implementation of deep learning models (Neural Networks)
Real-time prediction API development
Advanced feature engineering with time-series analysis
Integration with cloud platforms for scalable deployment
A/B testing framework for model comparison

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit changes (git commit -m 'Add amazing feature')
Push to branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Olist for providing the Brazilian E-commerce dataset
Scikit-learn and LightGBM communities for excellent ML libraries
Streamlit team for the intuitive web app framework

Contact

Project Link: https://github.com/yourusername/propensity-predictor

Built with Python, Scikit-learn, LightGBM, and Streamlit
