Classification Models on A Large Dataset

Sony Jufri

This project aims to identify the root cause of breakdowns in heavy Scania trucks: whether or not a breakdown is caused by a failure in the Air Pressure System (APS). The dataset poses a binary classification problem with several challenges: extreme class imbalance; a large dataset (over 150 features) of a confidential nature; a high proportion of missing values, zeroes and outliers; and the need to minimise the total cost to Scania by minimising misclassification. Several machine learning algorithms were used: Logistic Regression, Linear Discriminant Analysis (LDA), Support Vector Machines (SVM), Decision Trees, Random Forests and XGBoost.
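To make the cost-minimisation goal concrete, here is a minimal sketch of how a total misclassification cost could be computed and used to pick a decision threshold. It is not taken from the project reports: the function names `total_cost` and `best_threshold` are hypothetical, and the default cost weights are placeholder assumptions chosen only to show that a missed APS failure (false negative) would typically be penalised far more heavily than an unnecessary check (false positive).

```python
def total_cost(y_true, y_pred, cost_fp=10, cost_fn=500):
    """Weighted misclassification cost for binary labels (1 = APS failure).

    cost_fp and cost_fn are illustrative placeholders, not values
    stated in this README.
    """
    # False positive: predicted APS failure, but the truck was fine.
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # False negative: an actual APS failure was missed.
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return cost_fp * fp + cost_fn * fn


def best_threshold(y_true, scores, thresholds, cost_fp=10, cost_fn=500):
    """Return the (threshold, cost) pair with the lowest total cost.

    scores are predicted probabilities of the positive class from
    any classifier; a score >= threshold is labelled positive.
    """
    best = None
    for th in thresholds:
        preds = [1 if s >= th else 0 for s in scores]
        cost = total_cost(y_true, preds, cost_fp, cost_fn)
        if best is None or cost < best[1]:
            best = (th, cost)
    return best


if __name__ == "__main__":
    # Toy labels and scores, made up for illustration only.
    y_true = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print(best_threshold(y_true, scores, thresholds=[0.3, 0.5]))
```

With asymmetric costs like these, the cheapest threshold tends to sit below 0.5, because accepting a few extra false positives costs less than missing a single failure.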

How to open reports

To open the project plan and final report, download the HTML files and open them in your browser.