Lightweight Flask microservice serving a pre-trained CatBoost regression model for traffic volume & congestion prediction.
🧐 About the "Smart Context" Model
- Purpose: Predicts traffic volume and congestion at a fine-grained road/intersection level using spatial coordinates + time context.
- Strength: The model was trained with CatBoost, which natively handles categorical features (notably road/intersection names). Road and intersection names can therefore be fed to the model directly, keeping manual feature engineering for them to a minimum.
- Input signals: Current time-of-day, day-of-week and the nearest known road/intersection (mapped from Lat/Lng).
Pure AI principle (Important): The service relies strictly on the CatBoost model's learned mapping from data — there are no hardcoded event multipliers or handcrafted scaling factors to simulate events. All event-like impacts must emerge from the training data and the model. The only deterministic post-processing is conversion from a daily raw model output into an hourly estimate and a small calibrated mapping to human-friendly congestion labels (see pipeline below).
⚙️ How It Works (The Logic Pipeline)
This service implements a concise 3-step prediction pipeline inside app.py (function: predict()):
- Spatial Lookup: mapping coordinates to a road name (nearest neighbor) 📍
  - The service keeps a small knowledge base, `KNOWN_LOCATIONS` (in `app.py`), that maps human-friendly road/intersection names to representative coordinates.
  - When a request provides latitude and longitude, `get_real_road_name(lat, lng)` computes the squared Euclidean distance to each known coordinate and returns the closest name.
  - The returned name is used as the categorical feature `Road_Intersection_Name`, which CatBoost consumes natively without manual encoding.
- Baseline Prediction: model inference with CatBoost 🧠
  - The server assembles a single-row DataFrame with the features `hour`, `day_of_week`, `is_weekend`, and `Road_Intersection_Name`.
  - It calls `model.predict(...)` on the CatBoost model loaded at startup from `data/catboost_traffic_model.cbm`, producing a raw daily volume estimate (the model was trained to predict total daily vehicle volume for the given road/time context).
  - The raw prediction is clipped to be >= 0.
- Smart Calibration: hourly mapping and human-friendly congestion levels ⚖️
  - The code converts the model's daily estimate into an hourly estimate by dividing by 12.0: `hourly_volume = max(0, raw_pred) / 12.0`
  - It then maps `hourly_volume` into congestion buckets using calibrated thresholds (derived from logged data); the highest threshold exceeded determines the bucket:
    - Default (Low): `level = 0.2`, `label = "Low"`, `avgSpeed = 55`
    - Moderate: `hourly_volume > 1600` -> `level = 0.5`, `label = "Moderate"`, `avgSpeed = 40`
    - High: `hourly_volume > 2100` -> `level = 0.8`, `label = "High"`, `avgSpeed = 25`
    - Severe: `hourly_volume > 2600` -> `level = 0.95`, `label = "Severe"`, `avgSpeed = 10`
  - The service also returns a simple `featureImportance` object in the response for transparency (labels: `Road`, `Time of Day`, `Event Impact`), populated with the example weights `[0.6, 0.4, 0]`.
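The three steps can be condensed into a small, self-contained Python sketch. It approximates the pipeline rather than reproducing `app.py`: apart from the name `Marathahalli Bridge` (taken from the sample response), the `KNOWN_LOCATIONS` entries and all coordinates are hypothetical, the `day_of_week` encoding (Monday = 0) and the 0/1 `is_weekend` flag are assumptions about the training data, and the CatBoost call itself is left out so the sketch runs without the model file.

```python
from datetime import datetime

# Hypothetical knowledge base: only "Marathahalli Bridge" appears in this README;
# the coordinates and the two "Example Junction" entries are made up.
KNOWN_LOCATIONS = {
    "Marathahalli Bridge": (12.9560, 77.7010),
    "Example Junction A": (12.9177, 77.6233),
    "Example Junction B": (13.0358, 77.5970),
}

# (threshold, level, label, avgSpeed), checked from the highest threshold down.
CONGESTION_BUCKETS = [
    (2600, 0.95, "Severe", 10),
    (2100, 0.8, "High", 25),
    (1600, 0.5, "Moderate", 40),
]
DEFAULT_BUCKET = (0.2, "Low", 55)


def get_real_road_name(lat: float, lng: float) -> str:
    """Step 1: nearest known location by squared Euclidean distance on raw lat/lng."""
    return min(
        KNOWN_LOCATIONS,
        key=lambda name: (lat - KNOWN_LOCATIONS[name][0]) ** 2
        + (lng - KNOWN_LOCATIONS[name][1]) ** 2,
    )


def build_feature_row(road_name: str, now: datetime) -> dict:
    """Step 2 input: the four model features, in the order listed above.

    Assumption: day_of_week follows datetime.weekday() (Monday=0) and is_weekend
    is 0/1; the real encoding must match whatever the model was trained on.
    """
    day_of_week = now.weekday()
    return {
        "hour": now.hour,
        "day_of_week": day_of_week,
        "is_weekend": int(day_of_week >= 5),
        "Road_Intersection_Name": road_name,
    }


def calibrate(raw_pred: float) -> dict:
    """Step 3: daily model output -> hourly volume -> congestion bucket."""
    hourly_volume = max(0.0, raw_pred) / 12.0
    level, label, avg_speed = DEFAULT_BUCKET
    for threshold, b_level, b_label, b_speed in CONGESTION_BUCKETS:
        if hourly_volume > threshold:
            level, label, avg_speed = b_level, b_label, b_speed
            break
    return {
        "congestion": {"level": level, "label": label},
        "avgSpeed": avg_speed,
        "predictedVolume": round(hourly_volume),
    }
```

In the service, the row from `build_feature_row` would presumably be wrapped as `pd.DataFrame([row])` and passed to `model.predict(...)`, with the first element of the result fed to `calibrate`. For example, a daily estimate of 20400 gives an hourly volume of 1700, which reproduces the Moderate values in the sample response under API Documentation.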
Notes:
- The mapping thresholds and averages are deterministic post-processing chosen to convert statistical model outputs into operationally meaningful categories — they do not inject event multipliers.
- The `Event Impact` importance slot is present for product-level UX but is unused in the core model (0 weight in the shipped model metadata).
🔌 API Documentation
Base URL: http://<host>:8003/ (app listens on 0.0.0.0:8003 by default)
- `POST /predict/`: predict congestion and hourly volume
  - Content-Type: `application/json`
  - Body (required): JSON with a `coordinates` object containing `lat` and `lng`.
Sample Request (JSON):
{
"coordinates": {
"lat": 12.9552,
"lng": 77.6984
},
"event": "concert_nearby"
}

`event` is accepted in the request for compatibility with upstream gateways. Note, however, that the current microservice does not apply ad-hoc event multipliers: event signals must be reflected in the model's training data to affect predictions (Pure AI principle).
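The gateway normally sends this request, but for local testing a small standard-library client is enough. This is a minimal sketch: `build_predict_payload` and `post_predict` are hypothetical helper names, and the base URL in the usage line assumes the default port 8003.

```python
import json
import urllib.request
from typing import Optional


def build_predict_payload(lat: float, lng: float, event: Optional[str] = None) -> dict:
    """Build the /predict/ body; `event` is optional and does not change the prediction."""
    payload = {"coordinates": {"lat": lat, "lng": lng}}
    if event is not None:
        payload["event"] = event
    return payload


def post_predict(base_url: str, payload: dict, timeout: float = 10.0) -> dict:
    """POST the payload as JSON to <base_url>/predict/ and decode the JSON response."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/predict/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running locally, `post_predict("http://localhost:8003", build_predict_payload(12.9552, 77.6984, event="concert_nearby"))` should return a dict shaped like the sample response.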
Sample Response (JSON):
{
"predictions": {
"congestion": { "level": 0.5, "label": "Moderate" },
"avgSpeed": 40,
"predictedVolume": 1700
},
"featureImportance": {
"labels": ["Road", "Time of Day", "Event Impact"],
"data": [0.6, 0.4, 0]
},
"mappedLocation": "Marathahalli Bridge"
}

Fields explained:
- `predictions.congestion.level`: numeric severity in [0, 1]
- `predictions.congestion.label`: human-friendly label
- `predictions.avgSpeed`: estimated average speed (km/h) for the congestion label
- `predictions.predictedVolume`: rounded hourly vehicle count estimate
- `featureImportance`: a lightweight explanation object
- `mappedLocation`: the canonical road/intersection name selected from `KNOWN_LOCATIONS`
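Consumers of the API (typically the gateway) can turn these fields into a typed object and fail loudly on malformed responses. A hedged Python sketch follows; `Prediction` and `parse_prediction` are hypothetical names, and `featureImportance` is deliberately skipped because it carries example weights rather than model-derived values.

```python
from dataclasses import dataclass

# Copy of the sample response above, used as a usage example.
SAMPLE_RESPONSE = {
    "predictions": {
        "congestion": {"level": 0.5, "label": "Moderate"},
        "avgSpeed": 40,
        "predictedVolume": 1700,
    },
    "featureImportance": {
        "labels": ["Road", "Time of Day", "Event Impact"],
        "data": [0.6, 0.4, 0],
    },
    "mappedLocation": "Marathahalli Bridge",
}


@dataclass(frozen=True)
class Prediction:
    level: float           # predictions.congestion.level, in [0, 1]
    label: str             # predictions.congestion.label
    avg_speed_kmh: float   # predictions.avgSpeed
    predicted_volume: int  # predictions.predictedVolume (vehicles per hour)
    mapped_location: str   # mappedLocation


def parse_prediction(body: dict) -> Prediction:
    """Extract the documented fields from a /predict/ response.

    Raises KeyError if a field is missing and ValueError if level is outside [0, 1].
    """
    predictions = body["predictions"]
    congestion = predictions["congestion"]
    level = float(congestion["level"])
    if not 0.0 <= level <= 1.0:
        raise ValueError(f"congestion level out of range: {level}")
    return Prediction(
        level=level,
        label=str(congestion["label"]),
        avg_speed_kmh=float(predictions["avgSpeed"]),
        predicted_volume=int(predictions["predictedVolume"]),
        mapped_location=str(body["mappedLocation"]),
    )
```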
📐 Architecture Notes
- This repository is a focused microservice providing a single purpose: mapping (coordinates + time) → congestion prediction.
- Intended Usage: called by an upstream Node.js Gateway (or API Gateway) which is responsible for authentication, routing, higher-level orchestration, and event enrichment. The microservice is NOT intended to be called directly by end users or browsers.
- Runtime: lightweight Flask application that loads a CatBoost model from `data/catboost_traffic_model.cbm` on startup.
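Because the model is loaded at startup, the number of server processes directly drives memory use. The setup steps show Gunicorn as the production option; a hypothetical `gunicorn.conf.py` for it might look like the sketch below. `bind`, `workers`, and `timeout` are standard Gunicorn settings, but the values here are assumptions, not tuned for this service.

```python
# gunicorn.conf.py (hypothetical): start with `gunicorn -c gunicorn.conf.py app:app`.
# Gunicorn reads these module-level variables as its settings.

# Listen on the same address and port as the development server.
bind = "0.0.0.0:8003"

# Without preload_app, every worker process imports app.py separately and so
# loads its own copy of the CatBoost model: memory grows with this number.
workers = 2

# Seconds a silent worker may take before Gunicorn kills and restarts it;
# raise this if model loading or inference is slow on small instances.
timeout = 60
```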
🛠️ Setup & Installation
- Clone repository
git clone https://github.com/PrajwalShetty-114/CatBoost-Model.git
cd CatBoost-Model
- Python virtual environment (recommended `.venv`): create and activate
# Create virtualenv (Unix/Windows powershell/cmd all supported)
python -m venv .venv
# Bash / Git Bash (Windows):
source .venv/Scripts/activate
# Or (Unix/WSL):
# source .venv/bin/activate
# PowerShell on Windows:
# .\.venv\Scripts\Activate.ps1
- Git LFS (important): the model binary (`.cbm`) is large and should be stored with Git LFS
# Install Git LFS (system step, once per machine):
# https://git-lfs.github.com/ -> follow your OS installer
git lfs install
git lfs track "data/*.cbm"
git add .gitattributes
# After this, any .cbm you `git add` will be committed to LFS
- Install Python dependencies
pip install -r requirements.txt
- Verify the model file is present
ls -l data/catboost_traffic_model.cbm
# A tiny text file here is a Git LFS pointer, not the model: run `git lfs pull`
- Run the development server
python app.py
# or (production with gunicorn):
# gunicorn --bind 0.0.0.0:8003 app:app

🐳 Docker & Deployment (Render)
This project includes a Dockerfile (simple Docker instructions below). The service is suitable for container deployment (for example, Render or other container hosts).
Build & Run locally with Docker:
# Build image (from repo root)
docker build -t smart-context-predictor:latest .
# Run container forwarding port 8003
docker run -p 8003:8003 --env FLASK_ENV=production smart-context-predictor:latest

Render deployment notes:
- Use a private or public repo containing the `Dockerfile` and `data/catboost_traffic_model.cbm` (tracked with Git LFS).
- Configure Render to build the Docker image and expose port `8003`.
Example Dockerfile (reference):
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY . /app
EXPOSE 8003
CMD ["python", "app.py"]

🔒 Security & Operational Notes
- The model file is binary and must be handled via Git LFS to avoid repository bloat.
- The microservice intentionally performs minimal input validation. In production, place a gateway (such as the Node.js Gateway) in front of it to sanitize input, rate-limit, and authenticate requests.
- Monitor memory usage on startup: loading CatBoost models can be memory-heavy; provision accordingly.
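Input checks matter here because the nearest-neighbor lookup always returns some known location, so malformed or out-of-area coordinates would otherwise be silently mapped to the closest road. The real gateway is Node.js and its rules are not documented in this README; the Python sketch below only illustrates the kind of checks it might perform, and `validate_predict_body` and its specific rules are assumptions.

```python
import math
from typing import Any


def validate_predict_body(body: Any) -> list[str]:
    """Return a list of problems with a /predict/ request body (empty if it looks valid)."""
    if not isinstance(body, dict):
        return ["body must be a JSON object"]
    coords = body.get("coordinates")
    if not isinstance(coords, dict):
        return ["'coordinates' object is required"]

    errors = []
    for key, limit in (("lat", 90), ("lng", 180)):
        value = coords.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"coordinates.{key} must be a number")
        elif not math.isfinite(value) or abs(value) > limit:
            errors.append(f"coordinates.{key} must be within [-{limit}, {limit}]")

    # event is optional and does not change the prediction, but should at least be text.
    if "event" in body and not isinstance(body["event"], str):
        errors.append("'event', if present, must be a string")
    return errors
```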
🧪 Testing & Validation
- Use the sample POST request above to validate and confirm responses.
- If the model file is missing, the service prints an error on startup and some endpoints may return 500.
🧾 Development Tips
- To change or extend `KNOWN_LOCATIONS`, edit `app.py` and keep the location names consistent with the model's training labels.
- If you need to account for events or temporary incidents not present in the training data, consider:
  1. implementing an upstream event-enrichment service that supplies historical examples for model retraining, or
  2. retraining the CatBoost model with event flags so that it learns event impacts directly (preferred, since it keeps the service Pure AI).
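Keeping `KNOWN_LOCATIONS` aligned with the training labels can be checked mechanically; a name the model never saw during training presumably carries no road-specific signal. The sketch below assumes the training data is available as a CSV with a `Road_Intersection_Name` column, which this README does not describe, and `read_training_labels` / `find_unknown_locations` are hypothetical helpers.

```python
import csv
from typing import Iterable


def read_training_labels(lines: Iterable[str], column: str = "Road_Intersection_Name") -> set[str]:
    """Collect distinct road/intersection labels from CSV text (an open file or any iterable of lines)."""
    return {row[column] for row in csv.DictReader(lines) if row.get(column)}


def find_unknown_locations(known_names: Iterable[str], training_labels: set[str]) -> list[str]:
    """Return the names (e.g. KNOWN_LOCATIONS keys) that never appear among the training labels."""
    return sorted(set(known_names) - training_labels)
```

For example, `with open(path, newline="") as f: find_unknown_locations(KNOWN_LOCATIONS, read_training_labels(f))`, where `path` points at the (assumed) training CSV, lists any location names that should be fixed or added to the training data before deployment.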
Credits & Contact
- Built as part of the Traffic Flow Prediction tooling.
- Maintainer: Prajwal Shetty — see repository for contact & issues.
Made with ❤️ · Smart Context Predictor · CatBoost powered
