With constant innovation in technology and data science, it is not surprising that educational institutions are interested in understanding the performance of their students. Grades are the most direct indicator of a student’s performance, but institutions are more interested in the factors that affect those grades. These institutions are looking to develop tools, supported by business intelligence and data mining techniques, that enhance the quality of education and ensure a high success rate among students.
In this project we identify a suitable technique for our dataset and optimize the resulting model. We model a student’s final grade in a particular subject and link it directly to the relevant features that influence the outcome. We use the C5.0 decision tree technique to model the data.
The dataset is publicly available from the UCI Machine Learning Repository (Archive.ics.uci.edu, 2018). It contains 1043 instances of student data for two courses, Mathematics and Portuguese. Our target variable is the categorical final grade, G3.
- Data Types
- Correlation amongst features
- Rank of Important Features (Random Forest)
- Recursive Feature Elimination Algorithm
- C5.0 Decision Tree Model
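The feature-ranking and tree-building steps listed above both depend on measuring how much a feature tells us about the final grade. The sketch below illustrates one such measure, plain information gain, which is a simplified form of the entropy-based splitting used by C5.0-style trees. It is not the C5.0 implementation, nor the random forest ranking or recursive feature elimination used in the project. The helper names, the feature names `failures` and `internet`, the pass/fail coding of `G3`, and the toy records are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, feature, target):
    """Reduction in target entropy obtained by splitting rows on one feature."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    # Weighted average entropy of the target inside each feature value.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def rank_features(rows, features, target):
    """Return (feature, gain) pairs sorted from most to least informative."""
    gains = [(f, information_gain(rows, f, target)) for f in features]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


# Made-up toy records for illustration only, NOT taken from the UCI data.
toy_rows = [
    {"failures": "0", "internet": "yes", "G3": "pass"},
    {"failures": "0", "internet": "no", "G3": "pass"},
    {"failures": "1+", "internet": "yes", "G3": "fail"},
    {"failures": "1+", "internet": "no", "G3": "fail"},
]
ranking = rank_features(toy_rows, ["failures", "internet"], "G3")
```

In this toy data `failures` separates the two classes perfectly and `internet` carries no information, so `failures` ranks first. A tree learner applies the same kind of comparison at every node to pick the next split.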
The decision tree model developed with the C5.0 algorithm correctly classified 80.84% of instances while considering only 5 features of the dataset. This insight can help the school take measures to ensure that at-risk students receive attention and are able to cope.
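The 80.84% figure is a classification accuracy: the share of instances whose predicted class matches the true class. A minimal sketch of that calculation follows. The function name and the example labels are hypothetical and are not the project's actual predictions.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of instances whose predicted class equals the true class."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Made-up labels for illustration only, not the project's predictions.
example_true = ["pass", "fail", "pass", "pass", "fail"]
example_pred = ["pass", "fail", "fail", "pass", "fail"]
example_accuracy = accuracy(example_true, example_pred)  # 4 of 5 labels match
```

Accuracy alone can hide poor performance on a minority class, so for at-risk students it is worth also checking how many of them the model actually identifies.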