This project conducted a comprehensive multivariate analysis of H1B visa application data to uncover significant patterns and predictive insights.
The dataset, taken from Kaggle, includes the application date, whether the H1B was approved, the approval date, the employer's location (state, zip code), and job-role details such as the title of the role and the category it falls into.
The approach employed for this analysis involves the following steps:
Exploratory Data Analysis
- Created box plots, scatter plots, and histograms to visualize the distribution and relationships between key variables.
Principal Component Analysis
- Reduced the dimensionality of the dataset while retaining significant variance. This was achieved by determining the optimal number of PCs.
Cluster Analysis
- Grouped job titles into distinct clusters based on their characteristics and roles.
Factor Analysis
- Grouped variables into underlying factors to simplify complex relationships.
Predictive Modeling
- Multiple Regression
  - Analyzed the influence of continuous variables on application status.
- Logistic Regression
  - Modeled the probability of application approval based on categorical and continuous predictors.
- Linear Discriminant Analysis (LDA)
  - Provided the best predictive performance with an accuracy of 83.33%.
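
The PCA and classification steps above can be sketched in R roughly as follows. This is a minimal illustration on synthetic data, not the project's actual code: the column names (`wage`, `processing_days`, `employer_size`), the 80% cumulative-variance cutoff, and the 70/30 train/test split are all assumptions made for the example, so the accuracies it prints will not match the 83.33% reported above.

```r
# Illustrative sketch on SYNTHETIC data. Column names, thresholds, and the
# split ratio are hypothetical, not taken from the Kaggle H1B dataset.
library(MASS)  # provides lda()

set.seed(42)
n <- 400
approved <- rbinom(n, 1, 0.5)
apps <- data.frame(
  approved        = factor(approved, levels = c(0, 1),
                           labels = c("denied", "approved")),
  wage            = rnorm(n, mean = 80 + 15 * approved, sd = 20),
  processing_days = rnorm(n, mean = 120 - 20 * approved, sd = 30),
  employer_size   = rnorm(n, mean = 500 + 100 * approved, sd = 300)
)
predictors <- c("wage", "processing_days", "employer_size")

# PCA on standardized predictors: keep the smallest number of components
# whose cumulative share of variance reaches the (assumed) 80% cutoff.
pca <- prcomp(apps[, predictors], center = TRUE, scale. = TRUE)
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
n_pcs <- which(cum_var >= 0.80)[1]

# Hold out 30% of the rows to estimate predictive accuracy.
train_idx <- sample(n, size = round(0.7 * n))
train <- apps[train_idx, ]
test  <- apps[-train_idx, ]

# Logistic regression: models P(approved); classify at a 0.5 threshold.
logit_fit  <- glm(approved ~ wage + processing_days + employer_size,
                  data = train, family = binomial)
logit_prob <- predict(logit_fit, newdata = test, type = "response")
logit_pred <- ifelse(logit_prob > 0.5, "approved", "denied")
logit_acc  <- mean(logit_pred == test$approved)

# Linear discriminant analysis on the same predictors and split.
lda_fit  <- lda(approved ~ wage + processing_days + employer_size,
                data = train)
lda_pred <- predict(lda_fit, newdata = test)$class
lda_acc  <- mean(lda_pred == test$approved)

cat("PCs retained:", n_pcs, "\n")
cat("Logistic regression accuracy:", round(logit_acc, 4), "\n")
cat("LDA accuracy:", round(lda_acc, 4), "\n")
```

Comparing both models on the same held-out rows keeps the accuracy figures directly comparable; swapping the synthetic `apps` data frame for the cleaned dataset would require adapting the formula to its actual columns.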
Future work could assess the status of applications for each visa class, incorporating additional factors and larger datasets.
The cleaned data, Rmd (R), and HTML files are available to play with!