A common problem in human resources analytics is finding ways to retain high-performing employees, since losing them hurts the company's performance and growth. Employee turnover is undesirable for several reasons: a replacement is typically less productive at first and needs time to become familiar with the company's operations, and the company bears the cost of searching for, interviewing, and training (both formally and informally) the new hire. The cost of replacing an employee varies with skill level and is usually much higher for a highly skilled professional than for an entry-level employee. This project presents insights into the reasons behind employee turnover, together with predictive models of turnover. The models used are logistic regression, random forest, and gradient boosting.
This repo contains the following files:
- Employee_Attrition_HR_Analytics.ipynb: contains the scripts for the exploratory analysis, tests 3 models, and makes predictions
- Employee_Attrition_HR_Analytics_Model.ipynb: presents a clean model that automates the process in the notebook above
- Predicting Employees Attrition.pptx: PowerPoint presentation
- Classification_report.txt: classification report from the best model (i.e. the one with the highest precision)
- Python
- Markdown==3.1.1
- matplotlib==3.1.1
- pandas==0.25.1
- scikit-learn==0.21.3
- seaborn==0.9.0
Employee turnover remains a key challenge in HR analytics. Replacing a highly skilled professional, which involves searching for, interviewing, and training a replacement, costs more than working to retain them. Here, three machine learning models (logistic regression, a random forest classifier, and gradient boosting) were used to predict employee turnover. The top 5 predictive features derived from the model are:
- Overtime
- Stock option level
- Total working years
- Job satisfaction
- Job involvement
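
The modelling step summarised above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it uses synthetic data from `make_classification` in place of the HR dataset, the `feature_i` column names and the class balance are hypothetical, and ranking features by the random forest's `feature_importances_` is an assumption about how a top-5 list like the one above could be produced.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the HR dataset (hypothetical feature names).
feature_names = [f"feature_{i}" for i in range(8)]
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4,
    weights=[0.84, 0.16], random_state=0,
)
X = pd.DataFrame(X, columns=feature_names)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The three model families named in this README.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Fit each model and record its precision on the held-out split.
precision = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    precision[name] = precision_score(y_test, model.predict(X_test))

# Rank features by the random forest's impurity-based importances (assumed method).
importances = pd.Series(
    models["random_forest"].feature_importances_, index=feature_names
)
top_5 = importances.sort_values(ascending=False).head(5)

print(precision)
print(top_5)
```

With the real dataset, `X` would hold the encoded employee attributes and `y` the attrition label; the rest of the sketch would stay the same.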
Furthermore, the probability of each employee staying or leaving is reported. This is a key deliverable that can inform a retention scheme for the employees who are most likely to leave.
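
As a rough sketch of how per-employee probabilities can be obtained with scikit-learn, the example below calls `predict_proba` on a fitted classifier. The data is synthetic, and the `employee_id`, `p_stay`, and `p_leave` column names are hypothetical; it also assumes the label `1` encodes "leaves" and `0` encodes "stays".

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the HR data; label 1 is assumed to mean "leaves".
X, y = make_classification(n_samples=300, n_features=6, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

model = RandomForestClassifier(n_estimators=100, random_state=1)
model.fit(X_train, y_train)

# predict_proba returns one column per class, ordered as in model.classes_ (here [0, 1]).
proba = model.predict_proba(X_test)
report = pd.DataFrame(proba, columns=["p_stay", "p_leave"])
report.insert(0, "employee_id", range(len(report)))  # hypothetical identifier

# Employees with the highest estimated probability of leaving come first.
at_risk = report.sort_values("p_leave", ascending=False).head(10)
print(at_risk)
```

A table like `at_risk` is one possible input for deciding where retention efforts should be focused first.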
- Thanks to Kaggle for providing free access to the dataset
For more information about the model outcomes, check this Medium story; insights from the exploratory data analysis can be found here.
- John, S. (2002). Job-to-job turnover and job-to-non-employment movement: A case study investigation. Personnel Review, 31(6), 710–721.
- Ongori, H. (2007). A review of the literature on employee turnover. African Journal of Business Management, 1(3), 49–54.
- Handling sampling imbalance
- Synthetic minority oversampling technique (SMOTE)
- Beyond accuracy, precision and recall