This data set contains about 120k rows of hotel booking information.
- Predict whether a hotel booking will be canceled using existing variables.
This is a data dictionary, containing notes for each variable.
This is the Jupyter notebook that contains all the code and analysis.
This is the main data set used in this project.
Random forest predicts cancellations better than logistic regression and decision tree, reaching 0.86 precision, 0.77 recall, and 0.87 accuracy.
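
For readers less familiar with the three reported metrics, here is a minimal sketch of how precision, recall, and accuracy are computed for a binary classifier. It assumes a canceled booking is the positive class (`1`). The function name `classification_metrics` and the labels below are made up for illustration; they are not taken from the notebook or the data set.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return precision, recall and accuracy for binary labels.

    Hypothetical helper for illustration; `positive` marks the
    "canceled" class (an assumption, not confirmed by the notebook).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # Count the cells of the confusion matrix.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy}


# Made-up labels: 1 = canceled, 0 = not canceled.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Read this way, a precision of 0.86 means that 86% of the bookings flagged as canceled were in fact canceled, while a recall of 0.77 means that 77% of the actual cancellations were caught.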