This repository contains an analysis of hotel bookings data using R. The analysis includes visualizations created using various R libraries to explore different trends, patterns, and distributions related to hotel bookings.
The dataset used in this analysis is hotel_bookings.csv, which contains information about hotel bookings, such as hotel type, booking cancellation status, lead time, and average daily rates (ADR). The dataset is loaded directly in the R script and cleaned to handle missing values.
The following R libraries are required to run the analysis:
- ggplot2: For creating elegant data visualizations.
- dplyr: For data manipulation.
- treemap: For creating treemap visualizations.
- plotly: For adding interactivity to the plots.
Before starting the analysis, we perform some basic data cleaning:
- Missing values in the agent and company columns are replaced with 0.
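The cleaning step can be sketched as follows. This is a minimal illustration on a toy data frame, not the script's actual code: it assumes the missing values arrive as NA and that both columns are numeric. If the CSV stores them as the text "NULL" instead, reading the file with read.csv(..., na.strings = c("NA", "NULL")) would be one way to convert them first.

```r
library(dplyr)

# Toy stand-in for hotel_bookings.csv; only `agent` and `company` are named in this README.
bookings <- data.frame(
  hotel   = c("City Hotel", "Resort Hotel", "City Hotel"),
  agent   = c(9, NA, 240),
  company = c(NA, NA, 110)
)

# Replace missing values in both columns with 0, leaving other columns untouched.
cleaned <- bookings %>%
  mutate(across(c(agent, company), ~ coalesce(.x, 0)))

print(cleaned)
```

Using across() keeps the rule in one place, so another column can be added to the selection without repeating the replacement logic.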
Below is a list of the visualizations included in this analysis:
- A bar plot that shows the number of canceled and non-canceled bookings for each hotel type (City Hotel or Resort Hotel).
  - Plot type: Bar Plot (Grouped)
  - Libraries used: ggplot2, dplyr
- A treemap that visualizes the distribution of bookings across different market segments.
  - Plot type: Treemap
  - Libraries used: treemap, dplyr
- A bar plot that shows the average ADR (Average Daily Rate) for different customer types.
  - Plot type: Bar Plot
  - Libraries used: ggplot2, dplyr
- A histogram that displays the distribution of lead times (number of days between booking and arrival).
  - Plot type: Histogram
  - Libraries used: ggplot2
- A bar plot that shows the distribution of booking changes.
  - Plot type: Bar Plot
  - Libraries used: ggplot2
- A stacked bar plot that displays the proportion of cancellations for each market segment.
  - Plot type: Stacked Bar Plot (Proportion)
  - Libraries used: ggplot2, dplyr
- A bar plot showing the average lead time for each customer type.
  - Plot type: Bar Plot
  - Libraries used: ggplot2, dplyr
- A line plot showing the trend of ADR (Average Daily Rate) over time.
  - Plot type: Line Plot with Points
  - Libraries used: ggplot2, dplyr
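As an illustration of how one of these plots might be built, the sketch below produces the grouped bar plot of canceled and non-canceled bookings per hotel type. It runs on a small made-up data frame, and the column names `hotel` and `is_canceled` (with 1 meaning canceled) are assumptions; the actual script may name or structure things differently.

```r
library(ggplot2)
library(dplyr)

# Made-up sample rows; `hotel` and `is_canceled` are assumed column names.
bookings <- data.frame(
  hotel       = c("City Hotel", "City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"),
  is_canceled = c(0, 1, 1, 0, 0)
)

# Count bookings per hotel type and cancellation status.
cancel_counts <- bookings %>%
  mutate(status = if_else(is_canceled == 1, "Canceled", "Not canceled")) %>%
  count(hotel, status, name = "bookings")

# position = "dodge" places the bars side by side, giving the grouped layout.
p <- ggplot(cancel_counts, aes(x = hotel, y = bookings, fill = status)) +
  geom_col(position = "dodge") +
  labs(x = "Hotel type", y = "Number of bookings", fill = "Booking status")

print(p)
```

Switching to position = "fill" would turn the same bars into a proportion-based stacked plot, and plotly::ggplotly(p) is one way to make a ggplot2 figure like this interactive.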
- Ensure you have the required libraries installed. You can install the necessary packages by running:
  install.packages(c("ggplot2", "dplyr", "treemap", "plotly"))
- Place the dataset (hotel_bookings.csv) in your working directory.
- Run the R script (hotel_bookings_analysis.R) to generate the visualizations. The script will load the data, clean it, and generate the above-mentioned plots.
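From an R session, the last two steps might look like the sketch below. It only assumes that both files sit in the current working directory. The existence check is an added convenience rather than part of the original script.

```r
# File names as given in this README; both are expected in the working directory.
data_file   <- "hotel_bookings.csv"
script_file <- "hotel_bookings_analysis.R"

required <- c(data_file, script_file)
missing_files <- required[!file.exists(required)]

if (length(missing_files) > 0) {
  # Report what is missing instead of failing partway through the script.
  message("Not found in ", getwd(), ": ", paste(missing_files, collapse = ", "))
} else {
  # Runs the analysis script, which loads the data and draws the plots.
  source(script_file)
}
```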
- R (version 3.6 or higher)
- ggplot2
- dplyr
- treemap
- plotly
This analysis was created by Mohamed El-Baz. Feel free to reach out with any questions or suggestions.