This repo is the work summary of Corporación Favorita Grocery Sales Forecasting.
The `report` folder contains the source code for reporting.
- `report/README.Rmd`: R Markdown document that provides insight into the Corporación Favorita Grocery Sales data.

Further reporting details are under the `report` folder.
The `process` folder contains the source code for data processing.
- `model/data_integration.R`: R script that integrates `train.csv` and `test.csv` with `stores.csv`, `items.csv` and `transactions.csv`.

Further processing details are under the `process` folder.
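
The integration step described above can be sketched with base R `merge()`. This is a minimal illustration, not the contents of `data_integration.R`: the join keys (`store_nbr`, `item_nbr`, and `date` plus `store_nbr`) are inferred from the file descriptions, the `integrate()` helper and the `transactions` count column name are assumptions, and the toy values are invented.

```r
# Toy stand-ins for the competition files (all values invented for illustration).
train <- data.frame(
  id          = 1:3,
  date        = as.Date(c("2017-08-14", "2017-08-14", "2017-08-15")),
  store_nbr   = c(1L, 2L, 1L),
  item_nbr    = c(101L, 102L, 101L),
  unit_sales  = c(3, 1, 5),
  onpromotion = c(FALSE, TRUE, FALSE)
)
stores <- data.frame(
  store_nbr = c(1L, 2L),
  city      = c("Quito", "Guayaquil"),
  state     = c("Pichincha", "Guayas"),
  type      = c("D", "C"),
  cluster   = c(13L, 6L)
)
items <- data.frame(
  item_nbr   = c(101L, 102L),
  family     = c("GROCERY I", "DAIRY"),
  class      = c(1001L, 2002L),
  perishable = c(0L, 1L)
)
# Assumed column name for the transaction count: `transactions`.
transactions <- data.frame(
  date         = as.Date(c("2017-08-14", "2017-08-14")),
  store_nbr    = c(1L, 2L),
  transactions = c(1500L, 900L)
)

# Hypothetical helper: left-join the metadata onto the sales rows so that
# every row of `sales` is kept even when a lookup table has no match.
integrate <- function(sales, stores, items, transactions) {
  out <- merge(sales, stores, by = "store_nbr", all.x = TRUE)
  out <- merge(out, items, by = "item_nbr", all.x = TRUE)
  out <- merge(out, transactions, by = c("date", "store_nbr"), all.x = TRUE)
  out[order(out$id), ]  # merge() reorders rows; restore the original order
}

train_full <- integrate(train, stores, items, transactions)
```

Left joins (`all.x = TRUE`) keep rows that have no matching transaction count, which then show up as `NA`. The same helper could be applied to the test rows, where the transaction counts would be missing throughout.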
The `model` folder contains the source code for data modeling.
Further modeling details are under the `model` folder.
The data come from Corporación Favorita and are available through the Kaggle competition Corporación Favorita Grocery Sales Forecasting.
There are 7 data files:
- `train.csv`: Training data. Includes the target `unit_sales` by `date`, `store_nbr`, and `item_nbr`, plus a unique `id` to label rows. The `onpromotion` column tells whether that `item_nbr` was on promotion for a specified `date` and `store_nbr`.
- `test.csv`: Test data, with the `date`, `store_nbr`, `item_nbr` combinations that are to be predicted, along with the `onpromotion` information.
- `stores.csv`: Store metadata, including `city`, `state`, `type`, and `cluster` (a grouping of similar stores).
- `items.csv`: Item metadata, including `family`, `class`, and `perishable`. Perishable items have a score weight of `1.25`; all other items have a weight of `1.0`.
- `transactions.csv`: The count of sales transactions for each `date`, `store_nbr` combination. Only included for the training data timeframe.
- `oil.csv`: Daily oil price. Includes values during both the train and test data timeframes.
- `holidays_events.csv`: Holidays and events, with metadata.
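
The `perishable` weights above feed into the competition score. As a rough sketch, assuming the score is the competition's Normalized Weighted Root Mean Squared Logarithmic Error (NWRMSLE), which this README does not spell out, the weighting could be applied as follows. The function name `nwrmsle` and the example values are illustrative only.

```r
# Sketch of a weighted log-error score, assuming the NWRMSLE form:
#   sqrt( sum(w * (log(pred + 1) - log(actual + 1))^2) / sum(w) )
# where w is 1.25 for perishable items and 1.0 otherwise.
nwrmsle <- function(actual, predicted, perishable) {
  w <- ifelse(perishable == 1, 1.25, 1.0)
  sq_log_err <- (log1p(predicted) - log1p(actual))^2
  sqrt(sum(w * sq_log_err) / sum(w))
}

# Invented example: three rows, the second one is a perishable item.
actual     <- c(3, 0, 10)
predicted  <- c(2, 1, 12)
perishable <- c(0, 1, 0)
score <- nwrmsle(actual, predicted, perishable)
```

Because of the weights, an error on a perishable item contributes 25% more to the sum than the same error on a non-perishable item.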