
End-to-end ML pipeline for predicting conversion in web sessions: feature engineering, CatBoost+XGBoost+LightGBM ensemble with calibration, FastAPI service, Streamlit UI, Airflow DAG orchestration, Dockerized.

KonNik88/blendcal-conversion-prediction


BlendCAL — Web-Session Conversion Prediction


Overview

BlendCAL is an end‑to‑end ML project for predicting whether a web session will convert.
The pipeline covers feature engineering, an ensemble of CatBoost + XGBoost + LightGBM with isotonic calibration, a FastAPI inference service, a Streamlit UI, an Airflow DAG for orchestration, and a full Docker setup.

Course project (Skillbox, ML specialization).
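The core modelling idea, a weighted blend of three boosted models followed by isotonic calibration, can be sketched as follows. This is a minimal illustration, not the project's code: the weights, the synthetic "model scores" and the helper names `blend`, `fit_calibrator` and `fake_model_scores` are hypothetical.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression


def blend(probas: list[np.ndarray], weights: list[float]) -> np.ndarray:
    """Weighted average of per-model positive-class probabilities."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalise so the blended score stays in [0, 1]
    stacked = np.vstack(probas)  # shape: (n_models, n_samples)
    return w @ stacked


def fit_calibrator(raw_scores: np.ndarray, y_true: np.ndarray) -> IsotonicRegression:
    """Fit a monotone mapping from blended scores to calibrated probabilities."""
    iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
    iso.fit(raw_scores, y_true)
    return iso


def fake_model_scores(y: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical stand-in for one model's predict_proba output: noisy but informative."""
    return np.clip(0.3 + 0.4 * y + rng.normal(0.0, 0.2, size=y.size), 0.0, 1.0)


rng = np.random.default_rng(0)
y_val = rng.integers(0, 2, size=1_000)
# Stand-ins for CatBoost, XGBoost and LightGBM scores on a validation set.
p_cat, p_xgb, p_lgb = (fake_model_scores(y_val, rng) for _ in range(3))

raw = blend([p_cat, p_xgb, p_lgb], weights=[0.4, 0.3, 0.3])  # illustrative weights
calibrator = fit_calibrator(raw, y_val)
calibrated = calibrator.predict(raw)
```

Isotonic regression only assumes that a higher blended score should never mean a lower conversion probability, which makes it a natural fit for rescaling an ensemble whose raw output is not a well-calibrated probability. In practice the calibrator should be fitted on data the base models were not trained on.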


Data volume

  • ga_sessions.csv: 1,860,042 rows
  • ga_hits.csv: 15,726,470 rows
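Each session in ga_sessions.csv can have many rows in ga_hits.csv, so a session-level conversion label is typically derived by aggregating hits per session. A minimal pandas sketch follows; the column names `session_id` and `event_action` and the set of target actions are assumptions for illustration, not taken from this repository.

```python
import pandas as pd

# Hypothetical set of event actions that count as a conversion.
TARGET_ACTIONS = {"form_submit", "callback_request"}


def build_session_target(sessions: pd.DataFrame, hits: pd.DataFrame) -> pd.DataFrame:
    """Attach a binary `target` column: 1 if any hit in the session is a target action."""
    converted = (
        hits.assign(is_target=hits["event_action"].isin(TARGET_ACTIONS))
        .groupby("session_id")["is_target"]
        .any()
        .astype(int)
        .rename("target")
        .reset_index()
    )
    out = sessions.merge(converted, on="session_id", how="left")
    # Sessions that have no hits at all are treated as non-converting.
    out["target"] = out["target"].fillna(0).astype(int)
    return out


sessions = pd.DataFrame({"session_id": ["s1", "s2", "s3"]})
hits = pd.DataFrame({
    "session_id": ["s1", "s1", "s2"],
    "event_action": ["view", "form_submit", "view"],
})
labeled = build_session_target(sessions, hits)
# labeled["target"]: s1 -> 1, s2 -> 0, s3 (no hits) -> 0
```

With roughly 15.7M hit rows, the aggregation is best done once in the ETL step so that only the compact session-level table feeds model training.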

Project structure

final_project/
├─ dags/                          # Airflow DAG (blendcal_inference.py)
├─ project/
│  ├─ modules/                    # ETL & feature pipeline
│  │  ├─ extract_csv.py
│  │  ├─ prepare.py
│  │  └─ ensemble.py
│  ├─ data/                       # raw / landing / staging / predictions
│  └─ artifacts/                  # prep_params.json, freq_maps.json, models, calibration
├─ api/app/                       # FastAPI: main.py, artifacts_loader.py, preprocessor.py
├─ app/                           # Streamlit UI (streamlit_app.py)
├─ docker_airflow/                # Dockerfile + docker-compose for Airflow
│  └─ README_airflow.md
├─ docker-compose.yml             # API + UI
├─ requirements-api.txt
├─ requirements-ui.txt
├─ MODEL_INFO.json
└─ VERSION
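The API has to reuse exactly the preprocessing parameters fitted during training, which is why they are persisted under project/artifacts/. A minimal loader sketch, assuming (hypothetically) that prep_params.json and freq_maps.json are plain JSON objects; the real schemas and the contents of api/app/artifacts_loader.py may differ, and `PrepArtifacts` / `get_artifacts` are illustrative names.

```python
import json
import tempfile
from dataclasses import dataclass
from functools import lru_cache
from pathlib import Path


@dataclass(frozen=True)
class PrepArtifacts:
    """Preprocessing parameters fitted on training data (hypothetical schema)."""
    prep_params: dict  # e.g. {"visit_number": {"median": 1.0}}
    freq_maps: dict    # e.g. {"device_category": {"mobile": 0.8}}

    @classmethod
    def load(cls, artifacts_dir: Path) -> "PrepArtifacts":
        """Read prep_params.json and freq_maps.json from the artifacts directory."""
        with open(artifacts_dir / "prep_params.json", encoding="utf-8") as f:
            prep_params = json.load(f)
        with open(artifacts_dir / "freq_maps.json", encoding="utf-8") as f:
            freq_maps = json.load(f)
        return cls(prep_params=prep_params, freq_maps=freq_maps)


@lru_cache(maxsize=1)
def get_artifacts(artifacts_dir: str) -> PrepArtifacts:
    """Load artifacts once per process instead of once per request."""
    return PrepArtifacts.load(Path(artifacts_dir))


# Usage with throwaway files standing in for project/artifacts/.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    (tmp_dir / "prep_params.json").write_text(
        json.dumps({"visit_number": {"median": 1.0}}), encoding="utf-8")
    (tmp_dir / "freq_maps.json").write_text(
        json.dumps({"device_category": {"mobile": 0.8}}), encoding="utf-8")
    artifacts = get_artifacts(tmp)
    loaded_once = get_artifacts(tmp) is artifacts  # second call hits the cache
```

Caching the loaded artifacts keeps per-request latency low and guarantees that every request in a process sees the same fitted parameters.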

Tech stack

  • ML: CatBoost, XGBoost, LightGBM (weighted ensemble + isotonic calibration)
  • Preprocessing: median imputation, quantile clipping, frequency encoding, cyclic time features (sin/cos)
  • API: FastAPI, Pydantic, Uvicorn
  • UI: Streamlit
  • Orchestration: Airflow (PythonOperator, Docker stack)
  • Containerization: Docker, docker‑compose
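The preprocessing steps listed above can be sketched as a fit/transform pair, so that medians, clipping bounds and frequencies come from training data only and are reapplied unchanged at inference. This is a minimal sketch under stated assumptions: the column names, the 1st/99th clipping quantiles and the 24-hour period are illustrative, and the project's own code in project/modules/ and api/app/preprocessor.py may differ.

```python
import numpy as np
import pandas as pd


def fit_prep(df: pd.DataFrame, num_cols: list[str], cat_cols: list[str],
             q_lo: float = 0.01, q_hi: float = 0.99) -> dict:
    """Learn medians, clipping bounds and category frequencies from training data."""
    params = {"num": {}, "freq": {}}
    for col in num_cols:
        params["num"][col] = {
            "median": float(df[col].median()),
            "lo": float(df[col].quantile(q_lo)),
            "hi": float(df[col].quantile(q_hi)),
        }
    for col in cat_cols:
        # Share of training rows per category (NaN is excluded by value_counts).
        params["freq"][col] = df[col].value_counts(normalize=True).to_dict()
    return params


def transform_prep(df: pd.DataFrame, params: dict, hour_col: str = "hour") -> pd.DataFrame:
    """Apply fitted parameters; replaces the hour column with sin/cos features."""
    out = df.copy()
    for col, p in params["num"].items():
        # Median imputation, then clip outliers to the training quantile range.
        out[col] = out[col].fillna(p["median"]).clip(p["lo"], p["hi"])
    for col, freq in params["freq"].items():
        # Frequency encoding; categories unseen during fit (and NaN) become 0.0.
        out[col] = out[col].map(freq).fillna(0.0)
    # Cyclic encoding so that hour 23 and hour 0 end up close to each other.
    angle = 2 * np.pi * out[hour_col] / 24
    out["hour_sin"] = np.sin(angle)
    out["hour_cos"] = np.cos(angle)
    return out.drop(columns=[hour_col])


train = pd.DataFrame({
    "visit_number": [1, 2, np.nan, 50],
    "device_category": ["mobile", "mobile", "desktop", "tablet"],
    "hour": [0, 6, 12, 23],
})
params = fit_prep(train, num_cols=["visit_number"], cat_cols=["device_category"])
features = transform_prep(train, params)
```

Frequency encoding keeps high-cardinality categoricals as a single numeric column, and mapping unseen categories to 0.0 lets the API handle values that never appeared in training without failing.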

Quickstart

1) API + UI (Docker)

docker compose up --build

2) Airflow (Docker)

See docker_airflow/README_airflow.md for details. TL;DR:

cd docker_airflow
docker compose down -v
docker compose up airflow-init
docker compose up -d webserver scheduler

Results

  • ROC‑AUC: 0.86
  • F1 macro: 0.75
  • Holdout period: 2021‑11 → 2021‑12
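Because the holdout is a time period rather than a random sample, sessions are split by date before metrics are computed. A minimal scikit-learn sketch, assuming a hypothetical `visit_date` column and a 0.5 decision threshold for F1 macro; it only illustrates the procedure and does not reproduce the numbers above.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import f1_score, roc_auc_score


def time_split(df: pd.DataFrame, date_col: str,
               holdout_start: str) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Rows before `holdout_start` go to train; the rest form the holdout."""
    dates = pd.to_datetime(df[date_col])
    cutoff = pd.Timestamp(holdout_start)
    return df[dates < cutoff], df[dates >= cutoff]


def evaluate(y_true: np.ndarray, proba: np.ndarray, threshold: float = 0.5) -> dict:
    """ROC-AUC on probabilities, F1 macro on thresholded labels."""
    y_pred = (proba >= threshold).astype(int)
    return {
        "roc_auc": roc_auc_score(y_true, proba),
        "f1_macro": f1_score(y_true, y_pred, average="macro"),
    }


sessions = pd.DataFrame({
    "visit_date": ["2021-10-30", "2021-11-02", "2021-11-15", "2021-12-01"],
    "target": [0, 1, 0, 1],
})
train, holdout = time_split(sessions, "visit_date", "2021-11-01")
scores = evaluate(holdout["target"].to_numpy(), np.array([0.8, 0.3, 0.6]))
```

ROC-AUC is threshold-free, while F1 macro depends on the chosen cut-off and averages the F1 of the converting and non-converting classes, so it is less dominated by the majority class than plain accuracy.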

Model metadata & artifacts are recorded in MODEL_INFO.json.




Author

  • Konstantin Nikiforov — Skillbox ML specialization (2025)

License

This project is licensed under the MIT License. See LICENSE for details.
