BlendCAL is an end‑to‑end ML project for predicting whether a web session will convert.
The pipeline covers feature engineering, an ensemble of CatBoost + XGBoost + LightGBM with isotonic calibration, a FastAPI inference service, a Streamlit UI, an Airflow DAG for orchestration, and a full Docker setup.
Course project (Skillbox, ML specialization).
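The blending and calibration step mentioned above can be illustrated with a minimal, standard-library sketch. It is not the project's implementation: the weights, the toy probabilities, and the function names `blend`, `fit_isotonic`, and `calibrate` are all assumptions made for illustration, and the calibrator here is a plain step function fitted with pool-adjacent-violators.

```python
from bisect import bisect_left
from itertools import groupby


def blend(prob_lists, weights):
    """Weighted average of per-model probabilities (weights are normalised)."""
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, row)) / total
        for row in zip(*prob_lists)
    ]


def fit_isotonic(scores, labels):
    """Pool-adjacent-violators fit of a non-decreasing step function.

    Returns (thresholds, values): values[i] applies to scores up to thresholds[i].
    """
    blocks = []  # each block is [sum_of_labels, count, max_score_in_block]
    pairs = sorted(zip(scores, labels))
    for score, group in groupby(pairs, key=lambda p: p[0]):
        ys = [y for _, y in group]  # tied scores share one block
        blocks.append([float(sum(ys)), len(ys), score])
        # Merge backwards while the previous block's mean exceeds the current one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count, hi = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = hi
    return [b[2] for b in blocks], [b[0] / b[1] for b in blocks]


def calibrate(score, thresholds, values):
    """Look up the calibrated probability for one blended score."""
    i = min(bisect_left(thresholds, score), len(values) - 1)
    return values[i]


# Hypothetical per-model probabilities for three sessions.
p_cat = [0.1, 0.4, 0.8]
p_xgb = [0.2, 0.5, 0.6]
p_lgb = [0.3, 0.3, 0.7]
blended = blend([p_cat, p_xgb, p_lgb], weights=[0.5, 0.3, 0.2])  # weights are made up

# Fit the calibrator on held-out scores and labels (toy data).
thresholds, values = fit_isotonic([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1])
calibrated = [calibrate(s, thresholds, values) for s in blended]
```

In practice a library calibrator such as scikit-learn's `IsotonicRegression` would likely be used instead; the hand-written version only makes the monotone-merging idea visible.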
- ga_sessions.csv: 1,860,042 rows
- ga_hits.csv: 15,726,470 rows
final_project/
├─ dags/ # Airflow DAG (blendcal_inference.py)
├─ project/
│ ├─ modules/ # ETL & feature pipeline
│ │ ├─ extract_csv.py
│ │ ├─ prepare.py
│ │ └─ ensemble.py
│ ├─ data/ # raw / landing / staging / predictions
│ └─ artifacts/ # prep_params.json, freq_maps.json, models, calibration
├─ api/app/ # FastAPI: main.py, artifacts_loader.py, preprocessor.py
├─ app/ # Streamlit UI (streamlit_app.py)
├─ docker_airflow/ # Dockerfile + docker-compose for Airflow
├─ docker-compose.yml # API + UI
├─ requirements-api.txt
├─ requirements-ui.txt
├─ MODEL_INFO.json
├─ VERSION
└─ docker_airflow/README_airflow.md
- ML: CatBoost, XGBoost, LightGBM (weighted ensemble + isotonic calibration)
- Preprocessing: median imputation, quantile clipping, frequency encoding, cyclic time features (sin/cos)
- API: FastAPI, Pydantic, Uvicorn
- UI: Streamlit
- Orchestration: Airflow (PythonOperator, Docker stack)
- Containerization: Docker, docker‑compose
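The four preprocessing steps named in the list above can be sketched with the standard library alone. This is an illustrative sketch, not the code in `prepare.py`: the function names, the 1%/99% clipping quantiles, and the toy inputs are assumptions. Each step is split into a "fit" part (parameters learned on training data, the kind of thing a file like prep_params.json or freq_maps.json could hold) and an "apply" part.

```python
import math
from collections import Counter
from statistics import median, quantiles


def fit_median(values):
    """Median of the observed (non-None) values, used to fill gaps."""
    return median(v for v in values if v is not None)


def impute(values, fill):
    """Replace missing entries with the fitted fill value."""
    return [fill if v is None else v for v in values]


def fit_clip_bounds(values, lower_pct=1, upper_pct=99):
    """Lower/upper percentile bounds (percentiles are assumed, not the project's)."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


def clip(values, lo, hi):
    """Clamp every value into [lo, hi]."""
    return [min(max(v, lo), hi) for v in values]


def fit_freq_map(categories):
    """Map each category to its relative frequency in the training data."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


def freq_encode(categories, freq_map):
    """Encode categories by frequency; unseen categories get 0.0."""
    return [freq_map.get(cat, 0.0) for cat in categories]


def cyclic(value, period):
    """Project a periodic value (e.g. hour of day) onto the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


# Toy usage with hypothetical columns.
durations = impute([12, None, 30], fill=fit_median([12, None, 30]))
lo, hi = fit_clip_bounds(list(range(1, 101)) + [10_000])
clipped = clip([1, 50, 10_000], lo, hi)
freq_map = fit_freq_map(["chrome", "chrome", "safari", "firefox"])
encoded = freq_encode(["chrome", "opera"], freq_map)
hour_sin, hour_cos = cyclic(6, period=24)
```

The sin/cos pair keeps hour 23 and hour 0 close together, which a raw integer hour would not.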
docker compose up --build
- FastAPI Swagger: http://localhost:8000/docs
- Streamlit: http://localhost:8501
See docker_airflow/README_airflow.md for details. TL;DR:
cd docker_airflow
docker compose down -v
docker compose up airflow-init
docker compose up -d webserver scheduler
- Airflow UI: http://localhost:8080 (admin / admin)
- ROC‑AUC: 0.86
- F1 macro: 0.75
- Holdout period: 2021‑11 → 2021‑12
Model metadata & artifacts are recorded in MODEL_INFO.json.
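For readers unfamiliar with the two reported metrics, the sketch below shows how ROC-AUC and macro-averaged F1 are defined, using toy labels and scores. It is not the project's evaluation code: the data, the 0.5 decision threshold, and the function names are assumptions, and the pairwise AUC is quadratic in the number of rows, so on the real holdout a library routine (for example scikit-learn's `roc_auc_score` and `f1_score` with `average="macro"`) would be the practical choice.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_macro(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores for classes 0 and 1."""
    f1_scores = []
    for cls in (0, 1):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy holdout: true labels and calibrated probabilities (made up).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [int(s >= 0.5) for s in scores]  # 0.5 threshold is an assumption

auc = roc_auc(y_true, scores)
f1 = f1_macro(y_true, y_pred)
```

Macro averaging gives the rare "converted" class the same weight as the majority class, which is why it is often reported alongside ROC-AUC for imbalanced conversion data.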
- Airflow how‑to: docker_airflow/README_airflow.md
- Model passport: MODEL_INFO.json
- Konstantin Nikiforov — Skillbox ML specialization (2025)
This project is licensed under the MIT License. See LICENSE for details.