About the Dataset:
- Hotel booking demand datasets
- The dataset contains booking information for two hotels, a city hotel and a resort hotel. It includes when the booking was made, how long the stay was, how many people were staying, and whether parking was available.
- Source: Kaggle
The dataset consists of a single file covering bookings for both the city hotel and the resort hotel.
Attributes:
Data columns (32 in total):
- hotel
- is_canceled
- lead_time
- arrival_date_year
- arrival_date_month
- arrival_date_week_number
- arrival_date_day_of_month
- stays_in_weekend_nights
- stays_in_week_nights
- adults
- children
- babies
- meal
- country
- market_segment
- distribution_channel
- is_repeated_guest
- previous_cancellations
- previous_bookings_not_canceled
- reserved_room_type
- assigned_room_type
- booking_changes
- deposit_type
- agent
- company
- days_in_waiting_list
- customer_type
- adr
- required_car_parking_spaces
- total_of_special_requests
- reservation_status
- reservation_status_date
Questions that guide the project goals:
- How does the market segment of a booking affect cancellation?
- How long do people stay at the hotels?
- Which months are the busiest?
- What other factors affect booking cancellation?
- Which countries do customers come from?
- What types of customers are most common in each hotel?
- Which machine learning algorithm has the highest accuracy when predicting hotel booking cancellations?
- Data Cleaning:
* Imputing missing values with the mean
* Dropping rows with abnormal values: 0 total guests / adults in the booking
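
The cleaning step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the sample rows are made up, the helper name clean_bookings is hypothetical, and it assumes "abnormal" means a booking with zero adults or zero total guests.

```python
import pandas as pd


def clean_bookings(df: pd.DataFrame) -> pd.DataFrame:
    """Impute numeric gaps with the column mean, then drop guest-less bookings."""
    df = df.copy()
    # Fill missing numeric values with the mean of each column.
    numeric_cols = df.select_dtypes(include="number").columns
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())
    # Assumption: a booking is abnormal if it has no adults or no guests at all.
    total_guests = df["adults"] + df["children"] + df["babies"]
    keep = (total_guests > 0) & (df["adults"] > 0)
    return df[keep].reset_index(drop=True)


# Made-up rows that only mimic the dataset's column names.
sample = pd.DataFrame({
    "adults": [2, 0, 1, 0],
    "children": [0.0, 0.0, None, 2.0],
    "babies": [0, 0, 0, 0],
    "adr": [100.0, 50.0, None, 80.0],
})
cleaned = clean_bookings(sample)
print(cleaned)
```

Mean imputation only makes sense for numeric columns. Categorical columns with gaps (country, for example) would need a different strategy, such as filling with the most frequent value.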
- Exploratory Data Analysis:
* Feature engineering
* Aggregating columns: the agg function specifies the aggregation operation performed on the data
* Visualization
* Insight & conclusion
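
As an illustration of feature engineering followed by aggregation, the sketch below derives a total_nights column and summarises it per hotel with pandas groupby and agg. The rows are made up, and total_nights and the summary column names are hypothetical choices, not names taken from the project.

```python
import pandas as pd

# Made-up rows that only mimic the dataset's column names.
bookings = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"],
    "is_canceled": [1, 1, 0, 1],
    "stays_in_weekend_nights": [2, 0, 2, 1],
    "stays_in_week_nights": [3, 2, 5, 1],
})

# Feature engineering: combine the two stay columns into one (hypothetical name).
bookings["total_nights"] = (
    bookings["stays_in_weekend_nights"] + bookings["stays_in_week_nights"]
)

# Aggregation: each keyword names an output column as (input column, operation).
summary = bookings.groupby("hotel").agg(
    n_bookings=("is_canceled", "count"),
    cancellation_rate=("is_canceled", "mean"),
    avg_nights=("total_nights", "mean"),
)
print(summary)
```

The same pattern, grouped by market_segment or arrival_date_month instead of hotel, would address the questions about cancellation by segment and the busiest months.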
- Feature Selection for the machine learning process:
* Label encoding for the columns that need to be encoded
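
A minimal sketch of label encoding with scikit-learn's LabelEncoder is shown below. The sample rows and the choice of columns to encode are assumptions made for illustration; the project may encode a different set of columns.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up rows that only mimic the dataset's column names.
sample = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel", "Resort Hotel"],
    "meal": ["BB", "HB", "BB", "SC"],
    "lead_time": [10, 45, 3, 120],
})

# Assumption: these are the text columns that need encoding.
categorical_cols = ["hotel", "meal"]

encoders = {}
for col in categorical_cols:
    encoder = LabelEncoder()
    # Each distinct string becomes an integer, assigned in sorted order.
    sample[col] = encoder.fit_transform(sample[col])
    encoders[col] = encoder  # kept so the codes can be mapped back later

print(sample)
print(list(encoders["meal"].classes_))
```

Label encoding imposes an arbitrary integer order on the categories. Tree-based models usually tolerate this, while Logistic Regression and K Neighbors may treat the codes as distances, so one-hot encoding is a common alternative for those.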
- Model Building
- Train/test split
- Using a pipeline for model building
* Scaling for numerical features
- Creating base models with a few algorithms
* Logistic Regression
* K Neighbors Classifier
* Decision Tree Classifier
* Random Forest Classifier
- Checking the evaluation metrics
- Comparing the models to find the best accuracy score
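
The model building steps above can be sketched as follows. To stay self-contained this uses synthetic data from make_classification as a stand-in for the cleaned booking features, so the scores it prints say nothing about the real dataset. The hyperparameters, split ratio, and random seeds are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded features and the is_canceled target.
X, y = make_classification(n_samples=500, n_features=8, random_state=42)

# Train/test split, stratified so both parts keep the same class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# The four base models listed above, with mostly default settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K Neighbors": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
}

scores = {}
for name, model in models.items():
    # The pipeline fits the scaler on the training data only, then the model.
    pipe = Pipeline([("scaler", StandardScaler()), ("model", model)])
    pipe.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, pipe.predict(X_test))

best_model = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("Best by accuracy:", best_model)
```

Putting the scaler inside the pipeline keeps test data out of the scaling statistics. Accuracy alone can mislead when cancellations and non-cancellations are imbalanced, so precision, recall, or a confusion matrix are worth checking alongside it.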
- 0.1
- Initial Release
* Normalizing for numerical features
@Kaaviasudhan