A lightweight FastAPI service that serves a trained Random Forest model to predict road segment speed. This repository contains the API entrypoint, model artifact path, and preprocessing utilities used to prepare inputs for the model.
Project snapshot
- Purpose: Provide predicted average speed values for map junctions / intersections using a Random Forest model.
- Main app: `main.py` (FastAPI)
- Model: `data/rf_speed_model.pkl` (loaded with `joblib`)
- Preprocessing: `data/preprocessing_speed.py` (contains `preprocess_data_speed` and `ALL_INTERSECTION_NAMES`)
Repository structure
- `main.py` - FastAPI application exposing the `/` and `/predict/` endpoints.
- `requirements.txt` - pinned dependencies used by the project.
- `build.sh` - simple script that upgrades pip and installs the requirements.
- `data/` - data utilities and the model artifact:
  - `preprocessing_speed.py` - preprocessing helper(s) and constants.
  - `rf_speed_model.pkl` - serialized Random Forest model (expected path).
Quickstart (development)
Prerequisites: Python 3.10+ (use the version that matches your environment and the wheels pinned in `requirements.txt`). A virtual environment is recommended.
- Create and activate a virtual environment (bash):

```bash
python -m venv .venv
# Linux/macOS
source .venv/bin/activate
# Windows (Git Bash)
source .venv/Scripts/activate
```

- Install dependencies (you can use the included `build.sh` on bash):

```bash
# Option A: run the build script (recommended for consistent binary wheels)
./build.sh
# Option B: direct install
python -m pip install --upgrade pip setuptools wheel
python -m pip install -r requirements.txt
```

- Run the API server locally with `uvicorn`:

```bash
uvicorn main:app --host 0.0.0.0 --port 8000
```

- Health check:

```bash
curl http://127.0.0.1:8000/
```

API: /predict/
Endpoint: POST /predict/
Request payload (JSON) follows the Pydantic model in `main.py`:

```json
{
  "model": "randomforest",
  "coordinates": { "lat": 12.9716, "lng": 77.5946 },
  "predictionTime": "Next Hour",
  "event": null
}
```

Example curl request (replace coordinates as needed):

```bash
curl -s -X POST "http://127.0.0.1:8000/predict/" \
  -H "Content-Type: application/json" \
  -d '{"model":"randomforest","coordinates":{"lat":12.9716,"lng":77.5946},"predictionTime":"Next Hour"}'
```

Response format (example):
```json
{
  "predictions": {
    "congestion": { "level": 0.0, "label": "Unknown" },
    "avgSpeed": 45.5
  },
  "alternativeRoute": null
}
```

Notes:
- `avgSpeed` is the value the consuming frontend uses to display the predicted speed.
- The current `main.py` implementation maps coordinates to a `JunctionName` using a hardcoded placeholder; see Developer notes below.
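For callers that prefer Python over curl, the request and response shapes above can be wrapped in a small standard-library client. This is a sketch: `build_predict_request` and `parse_avg_speed` are hypothetical helper names, and only the URL path and JSON fields come from this README.

```python
import json
import urllib.request

# Hypothetical default; matches the local uvicorn command in Quickstart.
BASE_URL = "http://127.0.0.1:8000"


def build_predict_request(lat: float, lng: float,
                          prediction_time: str = "Next Hour",
                          base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /predict/ request with a JSON body."""
    payload = {
        "model": "randomforest",
        "coordinates": {"lat": lat, "lng": lng},
        "predictionTime": prediction_time,
        "event": None,  # serialized as JSON null
    }
    return urllib.request.Request(
        f"{base_url}/predict/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_avg_speed(body: bytes) -> float:
    """Extract predictions.avgSpeed from a /predict/ response body."""
    return float(json.loads(body)["predictions"]["avgSpeed"])
```

With the server running, send it with `urllib.request.urlopen(build_predict_request(12.9716, 77.5946), timeout=10)` and pass `resp.read()` to `parse_avg_speed`. Keeping request construction separate from sending makes the payload easy to unit-test without a live server.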
Developer notes & TODOs
- Coordinate mapping: `main.py` currently hardcodes `'Intersection_Trinity Circle'` for `JunctionName`. Replace this placeholder with a geospatial nearest-neighbour lookup that maps `(lat, lng)` to one of the known junction names exported by `data/preprocessing_speed.py` (e.g., `ALL_INTERSECTION_NAMES`). Consider using `scipy.spatial.cKDTree` or `geopy.distance` for this.
- Model artifact: Ensure `data/rf_speed_model.pkl` exists and matches the preprocessing pipeline of `preprocess_data_speed`. If the model was trained on a specific set of feature columns, runtime preprocessing must produce the same feature set (same names, same order); otherwise scikit-learn reports a feature-name mismatch (a `ValueError` in recent versions).
- Logging: `main.py` uses `logging` at INFO level. Check the logs for detailed error messages when predictions fail.
- Exception handling: `main.py` distinguishes preprocessing `ValueError`s from other exceptions; extend this as needed to return more specific API error codes.
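The coordinate-mapping TODO can be sketched with a brute-force haversine scan, a swapped-in alternative to `cKDTree` that needs only the standard library and is adequate for a few hundred junctions. The junction names and coordinates below are hypothetical placeholders; in the service the names would come from `ALL_INTERSECTION_NAMES` and the coordinates from whatever survey data you have.

```python
import math
from typing import Dict, Optional, Tuple

# HYPOTHETICAL placeholder table: replace with real names from ALL_INTERSECTION_NAMES
# and their measured coordinates.
JUNCTION_COORDS: Dict[str, Tuple[float, float]] = {
    "Intersection_Example_A": (12.9700, 77.5900),
    "Intersection_Example_B": (12.9800, 77.6100),
    "Intersection_Example_C": (12.9500, 77.5800),
}

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lng) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_junction(lat: float, lng: float,
                     junctions: Dict[str, Tuple[float, float]] = JUNCTION_COORDS,
                     max_km: Optional[float] = None) -> Optional[Tuple[str, float]]:
    """Return (name, distance_km) of the closest junction, or None if it is beyond max_km."""
    # Linear scan over all junctions: O(n) per request.
    name, (jlat, jlng) = min(junctions.items(),
                             key=lambda kv: haversine_km(lat, lng, *kv[1]))
    dist = haversine_km(lat, lng, jlat, jlng)
    if max_km is not None and dist > max_km:
        return None
    return name, dist
```

The optional `max_km` cut-off lets the API reject coordinates far from every known junction instead of silently picking one. For a large junction set, `sklearn.neighbors.BallTree` with `metric="haversine"` (inputs in radians) avoids the linear scan.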
Troubleshooting
- Model fails to load:
  - Confirm `data/rf_speed_model.pkl` exists and is a valid joblib pickle.
  - Make sure the Python environment uses scikit-learn and joblib versions compatible with those used to save the model (see `requirements.txt`).
- Feature mismatch or shape errors during `predict`:
  - Validate that `preprocess_data_speed` returns the same columns used when training the model.
  - Print `processed_df.columns.tolist()` to inspect the column names; `main.py` already logs this.
- Dependency issues on Windows:
  - If installing `numpy`, `scipy`, or other binary packages fails, try prebuilt wheels or pip's `--prefer-binary` option (the included `build.sh` does this).
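To go beyond eyeballing the logged column list, a small diff helper can report exactly which columns are missing, extra, or out of order. `diff_feature_columns` is a hypothetical helper, and the column names in the usage note are illustrative only.

```python
from typing import Dict, List, Sequence, Union


def diff_feature_columns(runtime_cols: Sequence[str],
                         trained_cols: Sequence[str]) -> Dict[str, Union[List[str], bool]]:
    """Compare runtime preprocessing columns against the columns seen at training time."""
    runtime_set, trained_set = set(runtime_cols), set(trained_cols)
    return {
        # Columns the model expects but preprocessing did not produce.
        "missing": [c for c in trained_cols if c not in runtime_set],
        # Columns preprocessing produced that the model never saw.
        "extra": [c for c in runtime_cols if c not in trained_set],
        # scikit-learn also checks ordering, so an identical set can still fail.
        "same_order": list(runtime_cols) == list(trained_cols),
    }
```

If the model was fitted on a pandas DataFrame, the training columns are available as `model.feature_names_in_`, so a typical call is `diff_feature_columns(processed_df.columns.tolist(), list(model.feature_names_in_))`. When the only missing columns are one-hot junction indicators, `processed_df.reindex(columns=model.feature_names_in_, fill_value=0)` realigns the frame, but only use `fill_value=0` where zero is genuinely the correct value.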
Testing tips
- Add unit tests for `preprocess_data_speed` that check the expected columns for a variety of sample `JunctionName` inputs and datetimes.
- Add integration tests that create a FastAPI test client (`fastapi.testclient`) and POST to `/predict/`.
Deployment
- For production, consider running the app under a process manager (gunicorn with uvicorn workers) behind a reverse proxy. Example:

```bash
gunicorn -k uvicorn.workers.UvicornWorker main:app -b 0.0.0.0:8000 -w 4
```

- Ensure the model file is available in the deployment image or volume.
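The same gunicorn flags can live in a Python config file, which keeps deployment settings under version control. This is a sketch of a hypothetical `gunicorn.conf.py`; `bind`, `worker_class`, `workers`, and `accesslog` are standard gunicorn settings, while reading `WEB_CONCURRENCY` is an assumption about how you want to override the worker count.

```python
# gunicorn.conf.py: mirrors "-k uvicorn.workers.UvicornWorker -b 0.0.0.0:8000 -w 4".
import os

bind = "0.0.0.0:8000"
worker_class = "uvicorn.workers.UvicornWorker"

# Each worker is a separate process; if main.py loads the model at import time,
# every worker holds its own copy, so memory grows with the worker count.
workers = int(os.environ.get("WEB_CONCURRENCY", "4"))

# Write access logs to stdout so container log collectors pick them up.
accesslog = "-"
```

Start the server with `gunicorn -c gunicorn.conf.py main:app`.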
Next steps / improvements
- Implement geospatial nearest-neighbour mapping from coordinates to `JunctionName`.
- Add model versioning and an API field to request/inspect model metadata.
- Add CI checks and tests, and optionally a tiny OpenAPI-based frontend or Swagger examples.
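As a starting point for model versioning, the service could fingerprint the artifact it loaded and return that from a metadata route (for example a hypothetical `GET /model-info/`). The sketch below computes a SHA-256 digest, size, and modification time for the model file; `model_metadata` and its output keys are hypothetical.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path
from typing import Union

MODEL_PATH = Path("data/rf_speed_model.pkl")


def model_metadata(path: Union[str, Path] = MODEL_PATH, chunk_size: int = 1 << 20) -> dict:
    """Describe the model artifact: SHA-256 digest, size in bytes, last-modified time (UTC)."""
    path = Path(path)
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Hash in chunks so large pickles are not read into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    stat = path.stat()
    return {
        "path": str(path),
        "sha256": digest.hexdigest(),
        "sizeBytes": stat.st_size,
        "modifiedUtc": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
    }
```

Computing this once at startup, next to the `joblib` load, ties every prediction to the exact artifact that produced it and makes stale deployments easy to spot.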