In this Kaggle challenge by Airbnb, we are given a list of users along with their demographics, web session records, and some summary statistics. The task is to predict the country of a new user's first booking destination.
There are 12 possible outcomes for the destination country: 'US', 'FR', 'CA', 'GB', 'ES', 'IT', 'PT', 'NL', 'DE', 'AU', 'NDF' (no destination found), and 'other'. Note that 'NDF' differs from 'other': 'other' means a booking was made to a country not included in the list, while 'NDF' means no booking was made.
- Data visualisation and analysis of the entire dataset
- Data preprocessing, including one-hot encoding of the country_destination column to create a binary label for each country. Of these new variables, we use the US label as the target for binary classification.
- Implementation of different models. These include:
- Naive Bayes
- K-Nearest Neighbours (KNN)
- Artificial Neural Network (ANN)
- C5.0 decision trees (C50)
- Random Forest
- XGBoost (Extreme Gradient Boosting) for multi-class classification.
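
The one-hot encoding step in the preprocessing bullet can be sketched roughly as follows. This is a minimal illustration in Python with pandas, not the project's actual code: the toy rows, the `dest` prefix, and the `is_US` column name are assumptions made for the example. Only the `country_destination` column name comes from the description above.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real competition data.
users = pd.DataFrame({
    "id": ["u1", "u2", "u3", "u4"],
    "country_destination": ["US", "NDF", "FR", "other"],
})

# One binary indicator column per destination label
# (the "dest" prefix is an assumption for this sketch).
dummies = pd.get_dummies(users["country_destination"], prefix="dest", dtype=int)

# Keep only the US indicator as the target for binary classification
# (the "is_US" name is also an assumption).
users["is_US"] = dummies["dest_US"]

print(users)
```

Each row ends up with `is_US` equal to 1 when the first booking was in the US and 0 otherwise, so 'NDF' and 'other' both fall into the negative class under this scheme.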