Stackoverflow_survey_data

Installations The project is on predicting the salary of programmers based on the StackOverFlow survey done in 2017.

The libraries used are:

pandas

numpy

seaborn

random

sklearn

matplotlib

missingno

Project Motivation The project is part of Udacity data Science nano-degree and is directed towards answering the following three questions:

In Which countries people earned more than the rest.
What is the relation between salary and eduction in these counries.
How to forecast the salary based on the survey data considering categrical data like country and eduction level?

File description

The analysis will follow the CRISP-DM steps:

Business Understanding
Data Understanding (access and explore)
Data cleaning and preparation of some categotrical columns into dummies to go into a linear model
Linear modeling
Evaluation
Deployment

Takeaways and summary fo teh analysis

1- The majority of respondents to the survey came from 10 countries; United States, India, United Kingdom, Germany, Canada, France, Poland, Australia, Russian Federation, and Spain. Programers in the United States earned the most while India and Russia earned the lowest.

2- The salary of a programmer is not related to education level as many earned more through a university degree. This could highlight the role of online education in building competent skillsets.

3- Our linear regression model that relied mainly on education, country and company size did not manage to predict salaried above $50,000 with good accuracy. More feature engineering work is required to enhance model performance.

Author: Mustafa Adel Amer

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!