Using SARIMAX for Time Series Forecasting on Seasonal Data Influenced by Exogenous Variables
Data Provided: Traffic Data (refer to `train.csv` for more)
Data description
| Column | Description |
| --- | --- |
| date_time | Date and hour at which the data was collected, in local time (IST) |
| is_holiday | Categorical; Indian national holidays combined with regional holidays |
| air_pollution_index | Air Quality Index (10-300) |
| humidity | Numeric humidity in Celsius |
| wind_speed | Numeric wind speed in miles per hour |
| wind_direction | Cardinal wind direction (0-360 degrees) |
| visibility_in_miles | Visibility distance in miles |
| dew_point | Numeric dew point in Celsius |
| temperature | Numeric average temperature in Kelvin |
| rain_p_h | Numeric amount of rain, in mm, that fell in the hour |
| snow_p_h | Numeric amount of snow, in mm, that fell in the hour |
| clouds_all | Numeric percentage of cloud cover |
| weather_type | Categorical; short textual description of the current weather |
| weather_description | Categorical; longer textual description of the current weather |
| traffic_volume | Numeric hourly traffic volume bound in a specific direction |
The `traffic_volume` attribute has to be forecast from the time series data provided, taking the exogenous variables into account.
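As a rough illustration of how the columns above could be split into a target series and exogenous regressors, here is a minimal sketch. The helper name `prepare_frame`, the choice of regressor columns, and the assumption that non-holiday rows are marked `"None"` (or left missing) are all illustrative and not taken from the project; the tiny frame stands in for `train.csv`, and its values are made up.

```python
import pandas as pd


def prepare_frame(raw):
    """Return (endog, exog) indexed by timestamp. Hypothetical helper."""
    df = raw.copy()
    df["date_time"] = pd.to_datetime(df["date_time"])
    # One row per timestamp, in chronological order.
    df = df.drop_duplicates(subset="date_time").set_index("date_time").sort_index()

    # Assumption: non-holiday rows hold the string "None" or a missing value.
    holiday = df["is_holiday"]
    df["is_holiday"] = (holiday.notna() & (holiday != "None")).astype(int)

    # Assumption: an illustrative subset of the numeric columns as regressors.
    exog_cols = ["is_holiday", "temperature", "rain_p_h", "clouds_all"]
    endog = df["traffic_volume"].astype(float)
    exog = df[exog_cols].astype(float)
    return endog, exog


# Toy stand-in for train.csv (made-up values, for illustration only).
raw = pd.DataFrame(
    {
        "date_time": ["2020-01-01 00:00:00", "2020-01-01 01:00:00", "2020-01-01 02:00:00"],
        "is_holiday": ["None", "Some Holiday", "None"],
        "temperature": [288.3, 289.4, 289.6],
        "rain_p_h": [0.0, 0.0, 0.2],
        "clouds_all": [40, 75, 90],
        "traffic_volume": [5545, 4516, 4767],
    }
)

endog, exog = prepare_frame(raw)
```

With the real data, `raw` would instead come from `pd.read_csv("train.csv")`; categorical columns such as `weather_type` would need their own encoding before they could be used as regressors.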
Approach used: SARIMAX (Seasonal Autoregressive Integrated Moving Average with eXogenous variables)
Reason: The data provided is a seasonal time series with multiple exogenous variables influencing the result. SARIMAX models the seasonal structure and the external regressors together, which makes it a natural statistical model for this task.
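For reference, one common way to write a SARIMAX model of order $(p, d, q) \times (P, D, Q)_s$ is as a regression on the exogenous variables with seasonal ARIMA errors. The symbols below are introduced here for illustration: $y_t$ is the target (`traffic_volume`), $x_t$ the vector of exogenous variables, $L$ the lag operator, $s$ the seasonal period, and $\varepsilon_t$ white noise.

$$
\begin{aligned}
y_t &= \beta^\top x_t + u_t, \\
\phi_p(L)\,\Phi_P(L^s)\,(1 - L)^d\,(1 - L^s)^D\,u_t &= \theta_q(L)\,\Theta_Q(L^s)\,\varepsilon_t
\end{aligned}
$$

Here $\phi_p$ and $\theta_q$ are the non-seasonal autoregressive and moving-average polynomials, and $\Phi_P$ and $\Theta_Q$ are their seasonal counterparts.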
Main Modules Used: the `statsmodels` package in Python