The goal of this project is to forecast the "Sales" column for the test set.
The dataset provides historical sales data for 1,115 Rossmann stores. Some stores in the dataset were temporarily closed for refurbishment.
-
The Rossmann Store Sales dataset is from Kaggle:
-
Dataset Features
Id, StoreID, Sales, Customer, Open, StateHoliday, SchoolHoliday, StoreType, Assortment, CompetitionDistance, CompetitionOpenSince[Month/Year], Promo, Promo2, Promo2Since[Year/Week], PromoInterval
-
Sales is the label we want to predict.
-
String to numerical
-
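A minimal sketch of the string-to-numeric step, assuming it targets the StateHoliday column and that the column mixes the strings "0", "a", "b", "c" with a stray integer 0. The toy rows and the `holiday_codes` mapping are illustrative assumptions, not the project's actual code.

```python
import pandas as pd

# Toy rows standing in for the real data; the values are illustrative only.
df = pd.DataFrame({
    "StateHoliday": ["0", "a", "b", "c", 0],
    "Sales": [5263, 0, 0, 0, 6064],
})

# Assumed mapping: "0" means no holiday, "a"/"b"/"c" are holiday types.
holiday_codes = {"0": 0, "a": 1, "b": 2, "c": 3}

# Cast to str first so the integer 0 and the string "0" map to the same code.
df["StateHoliday"] = df["StateHoliday"].astype(str).map(holiday_codes).astype(int)

print(df["StateHoliday"].tolist())
```

Casting before mapping avoids silent NaN values when the raw column holds both text and numbers.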
Label Encode
-
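The label-encoding step could look roughly like the following, assuming scikit-learn's `LabelEncoder` is applied to the categorical columns StoreType and Assortment. The choice of columns and the toy values are assumptions made for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy store table; the letters mimic the categorical codes in the dataset.
stores = pd.DataFrame({
    "StoreType": ["c", "a", "a", "d", "b"],
    "Assortment": ["a", "a", "c", "b", "c"],
})

# Keep one fitted encoder per column so the same mapping can be reused
# on the test set or inverted later.
encoders = {}
for col in ["StoreType", "Assortment"]:
    enc = LabelEncoder()
    stores[col] = enc.fit_transform(stores[col])
    encoders[col] = enc

print(stores)
```

`LabelEncoder` assigns integers in sorted order of the class labels, which is acceptable for tree-based models but imposes an arbitrary ordering for distance-based models such as KNN.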
Impute missing values
-
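The imputation strategy is not stated here, so the sketch below is only one plausible choice: fill CompetitionDistance with its median and fill a missing Promo2SinceYear with 0 to mean "no Promo2". Both strategies and the toy values are assumptions.

```python
import numpy as np
import pandas as pd

# Toy store table with gaps; the values are illustrative only.
stores = pd.DataFrame({
    "CompetitionDistance": [1270.0, np.nan, 620.0, 29910.0],
    "Promo2SinceYear": [np.nan, 2010.0, np.nan, 2011.0],
})

# Assumed strategy 1: the median is robust to the long right tail of distances.
median_distance = stores["CompetitionDistance"].median()
stores["CompetitionDistance"] = stores["CompetitionDistance"].fillna(median_distance)

# Assumed strategy 2: a missing start year is treated as "never in Promo2".
stores["Promo2SinceYear"] = stores["Promo2SinceYear"].fillna(0)

print(stores)
```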
feature mode
-
KNN model
-
-
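As a rough illustration of the KNN step, the sketch below fits scikit-learn's `KNeighborsRegressor` on synthetic data. The features, the value of `n_neighbors`, and the scaling step are assumptions; the project's real feature matrix and hyperparameters are not given here.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the prepared features and the Sales label.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 5000 + 800 * X[:, 0] - 300 * X[:, 1] + rng.normal(scale=50, size=500)

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# KNN is distance-based, so features are standardized before fitting.
knn = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))
knn.fit(X_train, y_train)
knn_pred = knn.predict(X_valid)

print("validation R^2:", knn.score(X_valid, y_valid))
```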
Random Forest
-
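A comparable sketch for the Random Forest step, again on synthetic data. `n_estimators=100` and the generated features are assumptions made for illustration, not the settings used in this project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared features and the Sales label.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 5000 + 800 * X[:, 0] - 300 * X[:, 1] + rng.normal(scale=50, size=500)

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Trees are insensitive to feature scale, so no standardization is needed.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
forest_pred = forest.predict(X_valid)

# Impurity-based importances give a quick view of which features drive Sales.
importances = forest.feature_importances_
print("validation R^2:", forest.score(X_valid, y_valid))
print("feature importances:", importances)
```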
XGBoost
-
LightGBM
- Final test result is 0.122
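The metric behind this score is not named here. The Kaggle competition ranks submissions by root mean square percentage error (RMSPE) and ignores days with zero sales, so the sketch below assumes that metric. The `rmspe` helper and the sample numbers are illustrative.

```python
import numpy as np

def rmspe(y_true, y_pred):
    """Root mean square percentage error, skipping rows where y_true is 0."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = y_true != 0  # zero-sales days are excluded from the score
    pct_err = (y_true[mask] - y_pred[mask]) / y_true[mask]
    return float(np.sqrt(np.mean(pct_err ** 2)))

# Two scored rows with errors of +10% and -10%; the third row is ignored.
score = rmspe([100, 200, 0], [90, 220, 50])
print(score)
```

Under this assumption, a score of 0.122 would correspond to a typical relative error of roughly 12% on open-store days.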