Predict whether a hotel reservation will be cancelled using a full MLOps pipeline:
- Data from Google Cloud Storage (GCS)
- Processing & training (LightGBM)
- Packaging with Docker
- CI/CD with Jenkins
- Deployment to Cloud Run
End-to-End ML Pipeline
- Data ingestion from GCS → `artifacts/raw/`
- Preprocessing (encoding, skew handling, SMOTE, feature selection) → `artifacts/processed/`
- Model training (LightGBM + RandomizedSearchCV) → `artifacts/models/lgbm_model.pkl`
- MLflow logging (datasets, params, metrics, model)
Web App
- Flask server with a clean HTML/CSS form
- Real-time prediction using the saved model
Dockerized & Cloud Native
- Production image
- Deployed to Cloud Run (serverless)
CI/CD with Jenkins
- Clone → setup venv → train (optional) → build & push to GCR → deploy to Cloud Run
```
sami-codeai-hotel_reservation_prediction/
├── application.py              # Flask app (serves predictions)
├── Dockerfile                  # Container image for Cloud Run
├── Jenkinsfile                 # Pipeline: build → push → deploy
├── requirements.txt            # Python deps
├── setup.py                    # Editable install
├── README.md                   # (this file)
│
├── config/
│   ├── config.yaml             # Ingestion & processing config (bucket, columns, etc.)
│   ├── model_params.py         # LightGBM + RandomSearch params
│   └── paths_config.py         # Canonical paths for artifacts
│
├── pipeline/
│   └── training_pipeline.py    # Orchestrates ingestion → processing → training
│
├── src/
│   ├── data_ingestion.py       # Download from GCS + train/test split
│   ├── data_preprocessing.py   # Encoding, skew handling, SMOTE, feature selection
│   ├── model_training.py       # Train, evaluate, save model, MLflow logs
│   ├── logger.py               # Daily rotating file logs
│   └── custom_exception.py     # Exception wrapper with context
│
├── templates/index.html        # UI
├── static/style.css            # Styles
└── utils/common_functions.py   # read_yaml(), load_data()
```
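The two helpers in `utils/common_functions.py` might look roughly like this (the signatures and error handling are assumptions, not the repo's actual code):

```python
# Hypothetical sketch of utils/common_functions.py. read_yaml() loads a config
# file, load_data() reads a CSV artifact into a DataFrame.
import os

import pandas as pd
import yaml


def read_yaml(path: str) -> dict:
    """Load a YAML config, failing loudly if the file is missing."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"Config not found: {path}")
    with open(path) as f:
        return yaml.safe_load(f)


def load_data(path: str) -> pd.DataFrame:
    """Read a CSV artifact into a DataFrame."""
    return pd.read_csv(path)


# Quick demo with a throwaway config file
with open("demo_config.yaml", "w") as f:
    f.write("data_ingestion:\n  train_ratio: 0.8\n")
cfg = read_yaml("demo_config.yaml")
print(cfg["data_ingestion"]["train_ratio"])
```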
```mermaid
flowchart TD
    A[Upload CSV to GCS] --> B[Run training_pipeline.py]
    B --> C[artifacts/raw: raw/train/test]
    C --> D[Preprocess: encode, skew, SMOTE, select features]
    D --> E[artifacts/processed: processed_train/test]
    E --> F[Train LightGBM + RandomizedSearchCV]
    F --> G[Evaluate + MLflow log]
    G --> H[Save model: artifacts/models/lgbm_model.pkl]
    H --> I[Docker build image]
    I --> J[Push to gcr.io/PROJECT/ml-project:latest]
    J --> K[Deploy to Cloud Run]
    K --> L[Flask serves predictions]
```
```shell
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install --upgrade pip
pip install -e .
```
- A Google Cloud project ID (e.g., `hidden-phalanx-464505-h0`)
- Enable APIs:
  - Cloud Run Admin API
  - Cloud Build API (optional, for Cloud Build)
  - Container Registry API (or Artifact Registry API if you migrate)
  - Cloud Storage API
- Create a Cloud Storage bucket (example: `my_buckethotel`) and upload the dataset `Hotel_Reservations.csv`:

```shell
# Example using gcloud
gsutil mb -l us-central1 gs://my_buckethotel/
gsutil cp Hotel_Reservations.csv gs://my_buckethotel/Hotel_Reservations.csv
```

- Create a service account (e.g., `mlops-ci@PROJECT_ID.iam.gserviceaccount.com`) and grant roles (minimum set for CI/CD and data access):
  - `roles/storage.objectViewer` (read dataset)
  - `roles/storage.admin` (if you need to manage buckets/objects)
  - `roles/run.admin` (deploy to Cloud Run)
  - `roles/iam.serviceAccountUser` (act-as for Cloud Run deploy)
  - `roles/storage.objectAdmin` (for Container Registry images, if needed)
  - `roles/artifactregistry.writer` (if using Artifact Registry)
- Download the JSON key (`mlops-ci-key.json`) for local dev and Jenkins, then set it locally:

```shell
export GOOGLE_APPLICATION_CREDENTIALS="/absolute/path/to/mlops-ci-key.json"
# Windows PowerShell: $env:GOOGLE_APPLICATION_CREDENTIALS="C:\path\mlops-ci-key.json"
```

Ensure `config/config.yaml` matches your bucket:
```yaml
data_ingestion:
  bucket_name: "my_buckethotel"
  bucket_file_name: "Hotel_Reservations.csv"
  train_ratio: 0.8
```

Run the pipeline:

```shell
python pipeline/training_pipeline.py
```

Outputs:
- `artifacts/raw/` → `raw.csv`, `train.csv`, `test.csv`
- `artifacts/processed/` → `processed_train.csv`, `processed_test.csv`
- `artifacts/models/` → `lgbm_model.pkl`
- `mlruns/` (MLflow local artifacts folder)
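For reference, the ingestion stage can be sketched like this (the structure is an assumption based on the repo layout; the GCS download is shown commented out because it needs credentials, and the split runs on a stand-in DataFrame):

```python
# Sketch of src/data_ingestion.py: read bucket settings from config.yaml,
# download the CSV, and split by train_ratio.
import pandas as pd
import yaml
from sklearn.model_selection import train_test_split

config = yaml.safe_load("""
data_ingestion:
  bucket_name: "my_buckethotel"
  bucket_file_name: "Hotel_Reservations.csv"
  train_ratio: 0.8
""")["data_ingestion"]

# Real project (needs google-cloud-storage and credentials):
# from google.cloud import storage
# storage.Client().bucket(config["bucket_name"]) \
#     .blob(config["bucket_file_name"]).download_to_filename("artifacts/raw/raw.csv")

# Stand-in for the downloaded raw.csv
df = pd.DataFrame({"lead_time": range(100), "booking_status": [0, 1] * 50})
train_df, test_df = train_test_split(
    df, train_size=config["train_ratio"], random_state=42
)
print(len(train_df), len(test_df))
```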
```shell
python application.py
# App listens on port 8080 (http://127.0.0.1:8080)
```

Important fix: in your Dockerfile, change `EXPOSE 5000` → `EXPOSE 8080` (your Flask app runs on 8080).
Build:

```shell
docker build -t hotel-ml:latest .
```

Run:

```shell
docker run -p 8080:8080 hotel-ml:latest
```

- Docker engine available (Jenkins user in the `docker` group)
- Google Cloud SDK (`gcloud`) available on PATH, or use a Jenkins agent image with Cloud SDK preinstalled
- Credentials configured in Jenkins:
| ID | Type | What it is |
|---|---|---|
| `Hotel_Reservation` | Username/Password or Token | GitHub credentials for the repo |
| `gcp-key` | Secret file | The service account JSON key (`mlops-ci-key.json`) |
Your repo also contains `custom_jenkins/Dockerfile`, which installs Docker inside the Jenkins image. You still need to install the Google Cloud SDK in the Jenkins container or use an image that already has it.
Pipeline stages provided:
- Clone the GitHub repository
- Set up venv & install deps
- Build & push Docker image to `gcr.io/${GCP_PROJECT}/ml-project:latest`
- Deploy to Cloud Run (region `us-central1`)

Tip (training): training currently runs during the Docker build (see Dockerfile). This requires credentials inside the Docker build to download from GCS, which is not ideal. Prefer training before the build (as a Jenkins stage), so the Dockerfile just copies the `artifacts/` folder.
Option A (simpler & secure): train in Jenkins, then build the image

- Remove this line from the Dockerfile:

  ```dockerfile
  RUN python pipeline/training_pipeline.py
  ```

- Add a training stage in Jenkins before the docker build:

  ```groovy
  stage('Train model') {
      steps {
          withCredentials([file(credentialsId: 'gcp-key', variable: 'GOOGLE_APPLICATION_CREDENTIALS')]) {
              sh '''
                  . ${VENV_DIR}/bin/activate
                  python pipeline/training_pipeline.py
              '''
          }
      }
  }
  ```

- Ensure `artifacts/models/lgbm_model.pkl` exists in the workspace so Docker can copy it.
Option B: keep training in the Docker build (advanced)

- Use Docker BuildKit secrets to mount the key during the build:

  ```shell
  DOCKER_BUILDKIT=1 docker build \
    --secret id=gcpkey,src=/path/to/mlops-ci-key.json \
    -t gcr.io/$GCP_PROJECT/ml-project:latest .
  ```

- Update the Dockerfile:

  ```dockerfile
  # syntax=docker/dockerfile:1.4
  RUN --mount=type=secret,id=gcpkey,target=/tmp/key.json \
      export GOOGLE_APPLICATION_CREDENTIALS=/tmp/key.json && \
      python pipeline/training_pipeline.py
  ```

- Do not bake the key into the image.
```dockerfile
# Expose the port that Flask will run on
EXPOSE 8080
```

Your Jenkinsfile sets:

```groovy
GCLOUD_PATH = "D:/Softwares/google-cloud-sdk/bin"  // Windows path
```

If your Jenkins agent runs on Linux, set it to something like:

```groovy
GCLOUD_PATH = "/usr/bin"  // or leave PATH as-is if gcloud is already available
```

The Jenkinsfile already runs:

```shell
gcloud auth activate-service-account --key-file=${GOOGLE_APPLICATION_CREDENTIALS}
gcloud config set project ${GCP_PROJECT}
gcloud auth configure-docker --quiet
docker build -t gcr.io/${GCP_PROJECT}/ml-project:latest .
docker push gcr.io/${GCP_PROJECT}/ml-project:latest
```

The Jenkinsfile then runs:
```shell
gcloud run deploy ml-project \
  --image=gcr.io/${GCP_PROJECT}/ml-project:latest \
  --platform=managed \
  --region=us-central1 \
  --allow-unauthenticated
```

Ensure Cloud Run picks up the correct port (8080). Cloud Run uses the PORT env variable, and Flask is already set to port 8080, so you're good.
- Categoricals: label encoding (per `config.yaml`)
- Numericals: skewness correction (`np.log1p`) for skew > `skewness_threshold`
- Imbalance: SMOTE on training data
- Feature selection: top N features by RandomForest `feature_importances_` (`no_of_features`)
- LightGBM classifier with RandomizedSearchCV
- Metrics: accuracy, precision, recall, f1
- Artifacts & metrics logged to MLflow (local `mlruns/`)
- Loads `artifacts/models/lgbm_model.pkl` at startup
- Accepts numeric inputs from the form and makes predictions
- Runs on port 8080
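A self-contained sketch of this serving logic, with a dummy model standing in for the pickled LightGBM model (the field name and route shape are assumptions; the real app loads the model with joblib at startup):

```python
# Sketch of application.py: load the model once, serve the form on GET,
# predict on POST. Cloud Run injects PORT; default to 8080 locally.
import os

from flask import Flask, request


class DummyModel:  # stand-in for the pickled LightGBM model
    def predict(self, rows):
        return [1 if row[0] > 100 else 0 for row in rows]


model = DummyModel()  # real app: joblib.load("artifacts/models/lgbm_model.pkl")
app = Flask(__name__)


@app.route("/", methods=["GET", "POST"])
def index():
    if request.method == "POST":
        features = [[float(request.form["lead_time"])]]
        return {"cancelled": int(model.predict(features)[0])}
    return "<form method='post'>...</form>"


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```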
```shell
# from repo root (will detect ./mlruns)
pip install mlflow
mlflow ui --port 5001
# open http://127.0.0.1:5001
```
- Local training creates:
  - `artifacts/models/lgbm_model.pkl`
  - `artifacts/processed/processed_test.csv`
- Docker build succeeds and runs: `curl http://localhost:8080/` returns the form HTML
- Jenkins:
  - `gcloud` found in PATH
  - Credentials `gcp-key` recognized
  - Image pushed to `gcr.io/PROJECT/ml-project:latest`
- Cloud Run:
  - Service reachable without auth (if `--allow-unauthenticated`)
  - App responds on 8080
- Port mismatch
  - Dockerfile originally had `EXPOSE 5000` but Flask uses 8080 → set it to `EXPOSE 8080`.
- Training inside Docker build
  - Requires GCS credentials inside the build. Prefer training before the build (Jenkins stage), then just copy artifacts.
- GCP SDK on Jenkins
  - The pipeline uses `gcloud`. Ensure the Cloud SDK is installed on the Jenkins agent or use an image that includes it.
- Dataset file path
  - `config.yaml` must match your bucket + CSV name: `my_buckethotel`, `Hotel_Reservations.csv`
- Categorical mappings at inference
  - Training uses `LabelEncoder` without persisting the encoders.
  - The web form hard-codes integers for categories. Ensure these integer codes match the label encoding learned during training.
  - Recommended improvement: persist the encoding mappings during training and apply them in `application.py` before prediction.
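The recommended improvement can be sketched as follows (the column name and artifact filename are illustrative, not the project's actual values):

```python
# Persist the per-column LabelEncoders at training time and reload them in
# application.py, so inference uses exactly the codes learned during training.
import joblib
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# -- training side (e.g., src/data_preprocessing.py) --
train_df = pd.DataFrame(
    {"market_segment": ["Online", "Offline", "Online", "Corporate"]}
)
encoders = {}
for col in ["market_segment"]:
    enc = LabelEncoder().fit(train_df[col])
    train_df[col] = enc.transform(train_df[col])
    encoders[col] = enc
joblib.dump(encoders, "encoders.pkl")  # real project: artifacts/models/encoders.pkl

# -- inference side (e.g., application.py) --
loaded = joblib.load("encoders.pkl")
# LabelEncoder sorts classes alphabetically: Corporate=0, Offline=1, Online=2
code = int(loaded["market_segment"].transform(["Online"])[0])
print(code)
```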
- Persist and load encoders/pipelines (sklearn `ColumnTransformer` + `Pipeline`) to guarantee consistent inference
- Use Artifact Registry (instead of the deprecated Container Registry) for images
- Add unit tests and a Makefile
- Add structured logging + Cloud Logging integration
- Remote MLflow Tracking Server (+ GCS backend store)
- Canary deploys / rollbacks via Cloud Deploy or GitHub Actions
- Fork and clone
- Create a branch: `feat/your-feature`
- Commit changes, open a PR
- Ensure:
  - `black` / `flake8` (style)
  - `pytest` (if tests added)