Multivariate Analysis on World Happiness Report - Unsupervised Learning

This project applies multivariate analysis techniques to the World Happiness Report (2015) to draw insights from the data, discover structures and patterns in high-dimensional data, and identify the latent variables behind a set of observed variables.

Multivariate techniques used:

  1. Data Visualization & Data Cleaning
  2. Dimension Reduction (PCA, CCA & MDS)
  3. Cluster Analysis (Hierarchical, K-Means, Model-Based)
  4. Exploratory and Confirmatory Factor Analysis
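
As a rough illustration of items 2 and 3, the sketch below runs PCA, classical MDS, hierarchical clustering, and k-means using only base R. It is not the project's "r_code": the data frame `toy` and its column names are synthetic stand-ins invented for this example, and the choice of three clusters and Ward linkage is an assumption.

```r
set.seed(42)

# Synthetic stand-in data (hypothetical columns, NOT the Kaggle dataset):
# three groups of 20 "countries" with different mean scores.
n <- 60
group <- rep(1:3, each = 20)
toy <- data.frame(
  gdp     = rnorm(n, mean = c(0.5, 1.0, 1.5)[group], sd = 0.15),
  family  = rnorm(n, mean = c(0.6, 0.9, 1.2)[group], sd = 0.15),
  health  = rnorm(n, mean = c(0.4, 0.7, 0.9)[group], sd = 0.10),
  freedom = rnorm(n, mean = c(0.3, 0.4, 0.6)[group], sd = 0.10)
)

# Standardize so every variable contributes on the same scale.
x <- scale(toy)

# PCA: proportion of total variance carried by each component.
pca <- prcomp(x)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Classical MDS: a two-dimensional configuration from Euclidean distances.
mds <- cmdscale(dist(x), k = 2)

# Hierarchical clustering (Ward linkage), cut into three groups.
hc <- hclust(dist(x), method = "ward.D2")
hc_groups <- cutree(hc, k = 3)

# K-means with several random starts to reduce sensitivity to initialization.
km <- kmeans(x, centers = 3, nstart = 25)

# Cross-tabulate the two partitions to see how far they agree.
print(table(hierarchical = hc_groups, kmeans = km$cluster))
```

With the real data, `toy` would be replaced by the numeric columns of the happiness data frame. Model-based clustering is left out here because it needs the mclust package rather than base R.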

Findings:

Based on our analysis, we can assume the data follow a multivariate normal distribution. The happiness report can be summarized by two principal components that capture the overall structure of the data. Of the clustering techniques tried, model-based clustering was the most precise. It produced three groups: North America/Western Europe/Australia, Latin America/North Africa/Northeast Asia, and South Asia/Africa. Confirmatory Factor Analysis confirmed two latent factors that impact happiness: household and societal happiness. In other words, happiness might be explained by factors inside the household and factors outside the household.
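
The findings do not say how multivariate normality was assessed. One common informal check, sketched here under that caveat, compares squared Mahalanobis distances with chi-square quantiles. The matrix `x` is synthetic data generated for the example, not the happiness dataset.

```r
set.seed(1)

# Synthetic stand-in: 150 observations on 4 variables (assumed, for illustration).
x <- matrix(rnorm(150 * 4), ncol = 4)
p <- ncol(x)

# Squared Mahalanobis distance of each row from the sample mean.
d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))

# Under multivariate normality, d2 is approximately chi-square with p
# degrees of freedom, so its sorted values should track these quantiles.
theo <- qchisq(ppoints(nrow(x)), df = p)

# Correlation between observed and theoretical quantiles; values near 1
# are consistent with multivariate normality.
qq_cor <- cor(sort(d2), theo)
print(qq_cor)

# To inspect visually: plot(theo, sort(d2)); abline(0, 1)
```

Points that fall far above the reference line in the plot indicate heavy tails or outlying observations.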

As a recommendation, further research is needed to explain individual happiness, controlling for education and environmental factors, without committing an ecological fallacy. It is very difficult to explain individual happiness from aggregate data in which countries are the unit of analysis.

Data Source: Kaggle.com

Dataset:

  1. World Happiness Report (2015)

Instructions to Run Code:

  1. Download the dataset: Happiness_Report_Data.csv
  2. Change the value of the variable “path” to the folder where the dataset was downloaded.
  3. Install and load the packages: corrplot, CCA, mclust, maptools, lavaan and semPlot
  4. Run "r_code" in RStudio.
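
The setup steps above could look roughly like the following. This is a sketch, not part of "r_code": the helper `missing_pkgs` and the value assigned to `path` are hypothetical, and the package list uses the CRAN spelling `semPlot`. Some of these packages may no longer be installable directly from CRAN and might need an archived version.

```r
# Packages named in step 3.
required_pkgs <- c("corrplot", "CCA", "mclust", "maptools", "lavaan", "semPlot")

# Hypothetical helper: return the packages that are not installed yet.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!installed])
}

to_install <- missing_pkgs(required_pkgs)
if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from your CRAN mirror
}

# Step 2: point `path` at the folder holding the CSV (placeholder value).
path <- "."
data_file <- file.path(path, "Happiness_Report_Data.csv")

# Step 1 check: read the file only if it is actually there.
if (file.exists(data_file)) {
  happiness <- read.csv(data_file)
  str(happiness)
} else {
  message("Dataset not found at: ", data_file)
}
```

Once every package is installed, each can be loaded with `library()` before running the analysis script.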