AnnaCalderon/Cluster_Sentiment_Reviews_AA

Cluster and Sentiment Analyses on Customer Reviews

  • Extracted customer sentiment from reviews using text mining
  • Applied cluster analysis to identify segments with differentiating needs and characteristics
  • Provided insights regarding which services need the most improvement

Dataset and Resources Used

RStudio Version: 1.4.1717
Packages:
Data: Raw Data

Data Preparation

  1. The raw dataset includes 17 variables
  2. A subset of 8 relevant variables was selected
    • Review (free text)
    • Overall Rating (numerical)
    • Food and Beverage Rating (numerical)
    • Cabin Service Rating (numerical)
    • Seat Comfort Rating (numerical)
    • Entertainment Rating (numerical)
    • Value for Money Rating (numerical)
    • Ground Service Rating (numerical)
  3. Imputation of Missing Values
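The README does not say which imputation method was used. As a minimal sketch, assuming missing ratings are filled with the median of the observed values in each column, the step could look like this. The column names and values are hypothetical, and the project itself is written in R, so this Python snippet only illustrates the idea.

```python
from statistics import median

# Hypothetical rating columns; None marks a missing value.
ratings = {
    "seat_comfort": [3, None, 5, 1, 4],
    "cabin_service": [None, 2, 4, None, 5],
}


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Assumption: each rating column is imputed independently.
imputed = {name: impute_median(col) for name, col in ratings.items()}
```

Median imputation is only one option; mean or model-based imputation would slot into the same place.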

Initial Sentiment Analysis Using the Overall Rating

  • Exploration of the proportion of positive and negative words for each rating (1-10)
  • Three general-purpose lexicons were evaluated against the ratings: bing, nrc, and AFINN
  • The bing lexicon was selected, as it presented a significant correlation with the overall rating
  • Exploration also indicated a prevalence of passenger dissatisfaction
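The lexicon comparison above can be sketched as follows: score each review as positive-word count minus negative-word count, then correlate the scores with the overall rating. The tiny lexicon, the reviews, and the function names are all made up for illustration. The real bing lexicon is far larger, and the project's own R code is not shown in this README.

```python
import math
import re

# Hypothetical mini-lexicon in the style of bing (word -> polarity).
LEXICON = {
    "great": 1, "comfortable": 1, "friendly": 1,
    "delayed": -1, "rude": -1, "terrible": -1,
}


def net_sentiment(review):
    """Positive-word count minus negative-word count for one review."""
    words = re.findall(r"[a-z']+", review.lower())
    return sum(LEXICON.get(word, 0) for word in words)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Made-up (review text, overall rating) pairs.
reviews = [
    ("Terrible flight, rude crew and delayed departure", 1),
    ("Delayed but friendly staff", 5),
    ("Great seats, comfortable and friendly service", 9),
]
scores = [net_sentiment(text) for text, _ in reviews]
overall = [rating for _, rating in reviews]
r = pearson(scores, overall)
```

Running the same scoring with each candidate lexicon and comparing the resulting correlations is one way to choose among them.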

Proportion of Sentiment by Rating

Dimension Reduction with Factor Analysis

  • Through factor analysis, we found that two factors explain 74% of the variance in the six rated categories
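As a rough reading of the 74% figure, assuming the six ratings are standardized so that total variance equals the number of variables, the proportion explained by m factors is the sum of squared loadings divided by p. The loadings themselves are not reported in this README, so the symbol below is a placeholder.

```latex
\text{proportion explained}
  = \frac{1}{p}\sum_{j=1}^{m}\sum_{i=1}^{p}\ell_{ij}^{2},
\qquad p = 6,\; m = 2
\;\Rightarrow\;
\sum_{j=1}^{2}\sum_{i=1}^{6}\ell_{ij}^{2} \approx 0.74 \times 6 = 4.44
```

Under that assumption, the two factors together carry about 4.44 of the 6 units of standardized variance.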

Factor Analysis

Cluster Analyses

  • Multiple methods were used to determine the optimal number of clusters
  • Results from the analyses suggest a three-cluster solution
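The README does not name the methods used to choose the number of clusters. One common option is the elbow heuristic: run k-means for several values of k and look for the point where the within-cluster sum of squares (WSS) stops dropping sharply. The sketch below assumes that approach, uses made-up two-dimensional factor scores, and swaps in a deterministic farthest-point initialization so that the result is reproducible.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly
    add the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    return centers


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: dist2(p, centers[c]))
            for p in points]


def kmeans(points, k, iters=100):
    """Lloyd's algorithm; returns (labels, within-cluster sum of squares)."""
    centers = farthest_point_init(points, k)
    for _ in range(iters):
        labels = assign(points, centers)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                updated.append(centers[c])  # keep an empty cluster's center
        if updated == centers:
            break
        centers = updated
    labels = assign(points, centers)
    wss = sum(dist2(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, wss


# Hypothetical factor scores forming three well-separated groups.
points = [
    (-2.0, -1.8), (-1.9, -2.1), (-2.2, -2.0),
    (0.1, -0.2), (-0.1, 0.0), (0.0, 0.2),
    (2.0, 2.1), (2.2, 1.9), (1.9, 2.0),
]
wss_by_k = {k: kmeans(points, k)[1] for k in (1, 2, 3, 4)}
labels3, _ = kmeans(points, 3)
```

On this toy data the WSS falls steeply up to k = 3 and is already small there, which is the pattern that would point to a three-cluster solution.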

Hierarchical Clustering

Clusters 1

K-Means Clustering

Clusters 2

Sentiment by Clusters

  • The highest prevalence of positive reviews was in Cluster 3
  • The highest prevalence of negative reviews was in Cluster 1
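A comparison like the one above comes down to the share of positive reviews within each cluster. A minimal sketch, assuming each review already carries a cluster label and a positive/negative tag, is shown below. The rows are invented so that the ordering mirrors the findings stated here; they are not the project's data.

```python
from collections import Counter

# Made-up (cluster, sentiment) pairs for illustration only.
labelled = [
    (1, "negative"), (1, "negative"), (1, "positive"),
    (2, "negative"), (2, "positive"),
    (3, "positive"), (3, "positive"), (3, "negative"), (3, "positive"),
]


def positive_share(rows):
    """Fraction of reviews tagged positive within each cluster."""
    totals = Counter(cluster for cluster, _ in rows)
    positives = Counter(cluster for cluster, tag in rows if tag == "positive")
    return {cluster: positives[cluster] / totals[cluster]
            for cluster in sorted(totals)}


shares = positive_share(labelled)
most_positive = max(shares, key=shares.get)
most_negative = min(shares, key=shares.get)
```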

Sentiment by Clusters

Cluster Profiling

Profile Segments by Needs

Cluster 1:

  • Represent the most unsatisfied passengers overall
  • Reviews suggest that negative ratings are largely driven by:
    • Poor customer service
    • Flight delays
    • Overall poor experience across pre-flight (PA2), in-flight (PA1), and post-flight (PA2) categories

Cluster 2:

  • More likely to have negative pre-flight experiences, such as flight delays or cancellations
  • In comparison to Cluster 1, they are slightly more satisfied with the in-flight experience

Cluster 3:

  • Represent the most satisfied passengers overall
  • Provided relatively positive text reviews and ratings across all measures
  • Reviews are largely positive and primarily attributed to the in-flight experience (PA1)

Profiling Across Factors

Profile by needs - factors

Profiling Across Individual Variables

Profile - K-means

Common Words on Reviews

Seg 1 and 3

Profile Segments by Characteristics

Cluster 1:

  • Least likely to travel in first or business class
  • The least likely to have experienced a flight delay/cancellation or layover

Cluster 2:

  • The most likely to have experienced a flight delay, consistent with common words
  • The most likely to have a connecting flight

Cluster 3:

  • Higher percentage of business or first class passengers
  • They appear less likely to travel for leisure purposes, compared to Clusters 1 and 2

Profiling on Observable Variables

About

Exploring Consumer Reviews to Improve Passenger Experience
