Project 2 - Group C
Presentation Date: January 15, 2022
Prepared by: Andrew Crawford, Margee Lancaster, Rachel Pierce, Jinhyeong Park
Predicting FinTech Bootcamp Graduate Salaries
- Andrew Crawford
- Margee Lancaster
- Rachel Pierce
- Jinhyeong Park
Our objective was to research and learn what salaries are possible with skills obtained in this FinTech Bootcamp. We used the Monster India API to retrieve data regarding job descriptions, skills, and salaries to determine the following:
- What jobs require skills from this bootcamp?
- What are the potential jobs and salaries for graduates with these skills?
- Can we predict an accurate salary range based on these skills?
Our hypothesis is that the more key skills you possess, the higher the salary.
As part of the U of MN FinTech Bootcamp, we have obtained the necessary skills to automate and improve financial services using cutting-edge technology. Skills gained at the conclusion of Bootcamp include the following:
We selected key skills from this course to use in our analysis.
We used data obtained from the Monster India API.
We selected the following skills obtained during this FinTech Bootcamp to use in our model:
- Database
- Python
- API
- Algorithm
- Cloud
- Forecast
- Big data
- Dashboard
- Project Management
- Finance
We also included like-terms (synonyms and related wording) for each skill in our model, so that postings describing the same skill with different terminology were still captured.
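The like-term matching described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `SKILL_TERMS` synonym lists and the function names are hypothetical, since the like-terms we actually used are not listed in this README.

```python
import re

# Hypothetical like-terms for a few of the key skills (illustration only).
SKILL_TERMS = {
    "Database": ["database", "sql"],
    "Python": ["python"],
    "API": ["api", "apis"],
    "Cloud": ["cloud", "aws"],
    "Big data": ["big data", "hadoop"],
}

def find_skills(description):
    """Return the set of key skills whose terms appear as whole words."""
    text = description.lower()
    found = set()
    for skill, terms in SKILL_TERMS.items():
        for term in terms:
            # Word boundaries keep "api" from matching inside "capital" or "rapid".
            if re.search(r"\b" + re.escape(term) + r"\b", text):
                found.add(skill)
                break  # one matching term is enough for this skill
    return found

def count_skills(description):
    """Number of distinct key skills mentioned in a job description."""
    return len(find_skills(description))

example = "Build REST APIs in Python on AWS; manage capital budgets."
print(sorted(find_skills(example)))
print(count_skills(example))
```

Matching on whole words rather than raw substrings is one way to reduce false positives, though the choice of like-terms remains subjective.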
Our process consisted of data exploration, data preparation, and data cleanup.
- Data Exploration: Google searches, API searches, Monster India API, FinTech Bootcamp Curriculum
- Data Preparation: Searching through variables in Spyder, selecting key data, narrowing down jobs, determining key skills
- Data Cleanup: Dropping unnecessary information, searching the data to find jobs that include at least one key skill
Data Issues:
- Most salaries were in Indian rupees, not USD, so conversion was necessary.
- Some salaries may be skewed because there are some outliers from different countries/currencies in the data obtained from the API.
- There were likely job postings/salaries posted multiple times.
- The job posting "skills" section did not include the information we needed, so we had to rely on the job description to find job skills.
- Determining like-words for key skills was subjective, so our terms may not capture every relevant posting.
- As shown in the graph below, the broadest salary range is found among jobs that include at least one skill.
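Two of the issues above, currency conversion and repeated postings, could be handled along these lines. This is a sketch under stated assumptions: the exchange rate is a placeholder (the rate we used is not recorded here), and the field names `title`, `company`, `salary`, and `currency` are hypothetical rather than the actual keys in the Monster India dataset.

```python
# Assumed fixed rate for illustration only; not the rate used in the project.
INR_PER_USD = 75.0

def to_usd(amount, currency):
    """Convert a salary to USD, or return None if the currency is not handled."""
    if currency == "USD":
        return round(amount, 2)
    if currency == "INR":
        return round(amount / INR_PER_USD, 2)
    # Flag other currencies for manual review instead of guessing a rate.
    return None

def dedupe(postings):
    """Keep the first posting for each (title, company, salary, currency) key."""
    seen = set()
    unique = []
    for posting in postings:
        key = (
            posting["title"].strip().lower(),
            posting["company"].strip().lower(),
            posting["salary"],
            posting["currency"],
        )
        if key not in seen:
            seen.add(key)
            unique.append(posting)
    return unique

postings = [
    {"title": "Data Analyst", "company": "Acme", "salary": 1500000, "currency": "INR"},
    {"title": "data analyst ", "company": "ACME", "salary": 1500000, "currency": "INR"},
    {"title": "Python Developer", "company": "Acme", "salary": 60000, "currency": "USD"},
]
for posting in dedupe(postings):
    print(posting["title"], to_usd(posting["salary"], posting["currency"]))
```

Returning `None` for unrecognized currencies makes outliers from other countries visible so they can be dropped or converted deliberately.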
We ran a variety of machine learning algorithms to determine the best model to use:
- Linear Regression
- Extra Trees
- Lasso
- Random Forest
- Ridge
- Stochastic Gradient Descent
To evaluate model performance, we ran each of the algorithms above and compared their R² (R-squared), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE) scores. Overall, we determined that Linear Regression was the best fit for our project: across repeated runs, it most often produced the R² value closest to 1 and the lowest MSE and RMSE.
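For reference, the three scores used in this comparison can be computed from their standard definitions as sketched below. This is a plain-Python illustration with made-up salary figures, not the notebook's code; the function names are our own.

```python
import math

def mse(actual, predicted):
    """Mean Squared Error: average of the squared prediction errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: square root of MSE, in the same units as salary."""
    return math.sqrt(mse(actual, predicted))

def r2(actual, predicted):
    """R-squared: 1 minus residual sum of squares over total sum of squares.

    Undefined (division by zero) if every actual value is identical.
    """
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Made-up salaries in USD, for illustration only.
actual = [30000, 50000, 70000]
close_fit = [32000, 50000, 68000]
mean_only = [50000, 50000, 50000]  # predicts the average for every posting

print(r2(actual, close_fit), rmse(actual, close_fit))
print(r2(actual, mean_only), rmse(actual, mean_only))
```

A model that predicts the average salary for every posting scores an R² of 0, so that value is a useful baseline when reading the scores.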
Our model was not able to accurately predict salaries due to the limited data set. Predicted salaries appear to cluster around the average across the population rather than vary with skills. The data issues discussed above made the results less reliable; more data and a more relevant API could improve them.
Based on our analysis, we were not able to determine salary ranges for FinTech Bootcamp graduates. However, salaries for jobs that included skills we learned in this bootcamp ranged up to $80,000 (converted from Indian rupees, of course!).
Difficulties/Implications:
Overall, the API we used was not well suited to our question and led to the data issues discussed previously. We could have spent $200 for access to the US Monster API, but we were cheap; given our frustration with the data outcome, we may consider it for a future project!
Potential Next Steps: Continue searching for an API with US-based jobs in USD, or pay the fee for the Monster US API.
- Jupyter Notebook Creation
- Data Exploration, Preparation and Cleanup
- Running various machine learning models
- Training/testing and model evaluation
- Analysis and Conclusion
- Readme preparation
- HTML presentation
Please note that most salaries in the source data are in Indian rupees, not US dollars. We converted the salaries to US dollars after data cleanup, but keep in mind that these jobs are based in India, so the salaries may be lower than those in the US.
Please refer to the following workpapers in GitHub:
- This ReadMe file (Includes a summary of the project)
- Project2_Presentation.html file (Class Presentation)
- project2code.ipynb file (Includes all code details)
- starter_code.py file (This is the Spyder file we used to dig into the data)
- monster_india_latest_jobs_free_dataset.json file (Includes dataset file)
- Images folder (Includes various images included in our project)