This project aims to determine whether breakdowns of heavy Scania trucks are caused by a failure in the Air Pressure System (APS) or not. The dataset presents a binary classification problem with several challenges: extreme class imbalance, a large feature set (over 150 features), the confidential nature of the dataset, a high proportion of missing values, zeroes and outliers, and the need to minimise the Total Cost to Scania by minimising misclassification. Several machine learning algorithms were used: Logistic Regression, Linear Discriminant Analysis (LDA), Support Vector Machines (SVM), Decision Trees, Random Forests and XGBoost.
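As a rough illustration of what minimising Total Cost can mean for a binary classifier, the sketch below scores predictions with a cost-weighted count of false positives and false negatives, then picks the decision threshold with the lowest cost. The cost values, the function names `total_cost` and `best_threshold`, and the label convention (1 = APS failure) are assumptions made for this example and are not taken from the project files.

```python
# Placeholder per-error costs (assumed for illustration only).
COST_FALSE_POSITIVE = 10    # healthy APS flagged as faulty
COST_FALSE_NEGATIVE = 500   # APS failure that was missed


def total_cost(y_true, y_pred,
               cost_fp=COST_FALSE_POSITIVE,
               cost_fn=COST_FALSE_NEGATIVE):
    """Return the cost-weighted sum of misclassifications (1 = APS failure)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return cost_fp * false_pos + cost_fn * false_neg


def best_threshold(y_true, scores, thresholds):
    """Return the candidate threshold whose predictions give the lowest cost."""
    def cost_at(threshold):
        # Predict the positive class when the score reaches the threshold.
        y_pred = [int(s >= threshold) for s in scores]
        return total_cost(y_true, y_pred)
    return min(thresholds, key=cost_at)


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    model_scores = [0.1, 0.4, 0.35, 0.8]  # hypothetical predicted probabilities
    print(total_cost(labels, [1, 0, 0, 1]))
    print(best_threshold(labels, model_scores, [0.3, 0.5]))
```

When a missed failure costs far more than a false alarm, a threshold chosen this way tends to sit below the usual 0.5, which is one common way to handle class imbalance together with asymmetric costs.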
To view the project plan and final report, download the HTML files and open them in your browser.