The goal of this project is to forecast the "Sales" column for the test set.
The dataset provides historical sales data for 1,115 Rossmann stores. Some stores in the dataset were temporarily closed for refurbishment.
Rossmann Store Sales is from Kaggle:
Dataset Features
Id, StoreID, Sales, Customer, Open, StateHoliday, SchoolHoliday, StoreType, Assortment, CompetitionDistance, CompetitionOpenSince[Month/Year], Promo, Promo2, Promo2Since[Year/Week], PromoInterval
Sales is the label we want to predict
String to numerical
Label Encode
Impute missing values
feature mode
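
The preprocessing steps above could be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: it assumes pandas, uses a small made-up frame whose column names are taken from the feature list, and the helper names `impute_with_mode` and `label_encode` are hypothetical. Imputing before encoding is a choice made for this sketch, since category codes would otherwise mark missing entries as -1.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values; column names follow the feature list above.
df = pd.DataFrame({
    "StateHoliday": ["0", "a", "0", "b", "0"],
    "StoreType": ["a", "c", "a", "d", "c"],
    "Assortment": ["a", "c", None, "a", "a"],
    "CompetitionDistance": [1270.0, np.nan, 570.0, 1270.0, 620.0],
})


def impute_with_mode(frame):
    """Fill missing values in each column with that column's most frequent value."""
    out = frame.copy()
    for col in out.columns:
        if out[col].isna().any():
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


def label_encode(frame, columns):
    """Replace each distinct string with an integer code (categories in sorted order)."""
    out = frame.copy()
    for col in columns:
        out[col] = out[col].astype("category").cat.codes
    return out


clean = impute_with_mode(df)
encoded = label_encode(clean, ["StateHoliday", "StoreType", "Assortment"])
print(encoded)
```

Integer codes impose an arbitrary order on the categories, which tree-based models usually tolerate better than distance-based ones such as KNN.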
KNN model
Random Forest
- Final test result is 0.122
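
A hedged sketch of how the two models might be fitted and compared with scikit-learn is shown below. Several things here are assumptions: the notes do not say whether the models were regressors or which hyperparameters were used, the data is synthetic, and the metric behind the reported result is not stated. The sketch uses RMSPE, the metric of the Kaggle competition, purely for illustration; `rmspe` is a helper written for this example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor


def rmspe(y_true, y_pred):
    """Root mean squared percentage error, skipping rows where the true value is 0."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = y_true != 0
    return float(np.sqrt(np.mean(((y_true[mask] - y_pred[mask]) / y_true[mask]) ** 2)))


# Synthetic stand-in for the encoded feature matrix and the Sales label.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 4))
y = 5000 + 3000 * X[:, 0] + 1000 * X[:, 1] + rng.normal(0, 100, size=500)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# Hyperparameters are illustrative defaults, not the values used in the project.
models = {
    "knn": KNeighborsRegressor(n_neighbors=5),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = rmspe(y_val, model.predict(X_val))

print(scores)
```

Rows with zero sales are excluded from the metric because the percentage error is undefined there, which matters for days when a store is closed.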