- Use the Stack Overflow Annual Developer Survey to train the model. Download the dataset here.
- For salary prediction, use 9 features (`RemoteWork`, `EdLevel`, `YearsCodePro`, `DevType`, `LanguageHaveWorkedWith`, `PlatformHaveWorkedWith`, `ToolsTechHaveWorkedWith`, `Country`, `Age`) to predict `ConvertedCompYearly` as Salary.
- Data preprocessing: preprocess.py
- Data visualization: visual.ipynb
- Utilize ensemble methods to train the model.
- Experiment with Decision Tree, Random Forest, Bagging, AdaBoost, and Gradient Boosting.
- The training and testing process is detailed in main.ipynb.
- Apply GridSearchCV to find the best hyperparameters for Gradient Boosting.
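
The model comparison and tuning steps listed above could look roughly like the sketch below. It is not the code from main.ipynb: the data is synthetic, the regressor variants of each model are assumed because the target is a salary, and the parameter grid, `cv=3`, and the scoring choice are hypothetical values picked for illustration.

```python
import numpy as np
from sklearn.ensemble import (
    AdaBoostRegressor,
    BaggingRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the preprocessed survey features and salary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 50_000 + 10_000 * X[:, 0] - 5_000 * X[:, 1] + rng.normal(scale=2_000, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Compare the five model families with cross-validated R2 on the training split.
models = {
    "DecisionTree": DecisionTreeRegressor(random_state=0),
    "RandomForest": RandomForestRegressor(n_estimators=50, random_state=0),
    "Bagging": BaggingRegressor(random_state=0),
    "AdaBoost": AdaBoostRegressor(random_state=0),
    "GradientBoosting": GradientBoostingRegressor(random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X_train, y_train, cv=3, scoring="r2")
    print(f"{name}: mean CV R2 = {scores.mean():.3f}")

# Hypothetical grid; the values actually searched in main.ipynb are not stated.
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",
    cv=3,
)
search.fit(X_train, y_train)

# best_estimator_ is refit on the full training split with the best parameters.
best_model = search.best_estimator_
print("best params:", search.best_params_)
print("held-out R2:", best_model.score(X_test, y_test))
```

Scoring the search by negative RMSE keeps the selection criterion in salary units; any other regression scorer supported by scikit-learn would work the same way.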

| Metric | Value |
|---|---|
| RMSE | 37068.786 |
| MAE | 25121.021 |
| R2-score | 0.619 |

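
For reference, the three metrics in the table can be computed as in the minimal sketch below. The function name `regression_metrics` and the sample salaries are made up for illustration; they are not taken from the project or the survey data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, and R2 for two equal-length sequences of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    ss_res = sum(e * e for e in errors)           # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares

    return {
        "rmse": math.sqrt(ss_res / n),            # same units as the target
        "mae": sum(abs(e) for e in errors) / n,   # same units as the target
        "r2": 1 - ss_res / ss_tot,                # unitless, 1.0 is a perfect fit
    }


# Made-up yearly salaries and predictions, for illustration only.
y_true = [50_000, 80_000, 120_000, 65_000]
y_pred = [55_000, 70_000, 110_000, 70_000]

metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

RMSE and MAE are expressed in the same units as the salary target, so they read as a typical prediction error in currency, while the R2-score is the share of salary variance the model explains.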
- Install dependencies: `pip install -r requirements.txt`
- Download the dataset and extract the zip file: `wget <link-to-data>`, then `unzip stack-overflow-developer-survey-2022.zip -d data`
- Run the Streamlit web app. I built a Streamlit app to make it easy to view the data and predict the salary. Start it with `streamlit run web.py`