This repository contains the solution for the Kaggle Playground Series competition "Predicting Student Test Scores". The objective was to predict students' standardized test scores from demographic data and study habits.
By integrating theories from educational psychology (e.g., the Yerkes-Dodson Law) into feature engineering and using a Hybrid Ensemble Strategy (gradient boosting + deep learning), this solution finished in the top 10% of the leaderboard.
- Ranking: 315 / 3459 (Top 9.1%)
- Evaluation Metric: RMSE (Root Mean Squared Error)
- Final Score: 8.557
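Since the competition is scored on RMSE, a small reference implementation helps make the metric concrete (a minimal sketch; the toy numbers are illustrative, not from the competition data):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared residual."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy check: predictions off by exactly 2 everywhere give an RMSE of 2.
print(rmse([50, 60, 70], [52, 62, 72]))  # → 2.0
```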
Rather than relying solely on standard statistical feature generation, this project focused on "Psychometric Feature Construction" to capture latent behavioral patterns:
- Cognitive Efficiency: Calculated as the ratio of study hours to break frequency, modeling the efficiency of learning sessions.
- Study Intensity: A weighted composite feature derived from study duration and session frequency.
- Resource Interaction: Interaction terms combining parental involvement levels with access to educational resources.
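The three features above can be sketched in pandas as follows. Note that the column names, ordinal encodings, and composite weights here are hypothetical placeholders; the actual dataset schema and the weights used in the notebook may differ:

```python
import pandas as pd

# Hypothetical schema for illustration only.
df = pd.DataFrame({
    "study_hours_per_week": [10.0, 25.0, 4.0],
    "breaks_per_session":   [2, 5, 1],
    "sessions_per_week":    [3, 7, 2],
    "parental_involvement": [1, 3, 2],   # ordinal-encoded: low=1 .. high=3
    "resource_access":      [2, 3, 1],   # ordinal-encoded access to resources
})

# Cognitive Efficiency: study hours per break taken (epsilon guards against /0).
df["cognitive_efficiency"] = df["study_hours_per_week"] / (df["breaks_per_session"] + 1e-6)

# Study Intensity: weighted composite of duration and session frequency
# (the 0.7 / 0.3 weights are illustrative, not the competition values).
df["study_intensity"] = 0.7 * df["study_hours_per_week"] + 0.3 * df["sessions_per_week"]

# Resource Interaction: parental involvement crossed with resource access.
df["resource_interaction"] = df["parental_involvement"] * df["resource_access"]
```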
To balance bias and variance, a heterogeneous ensemble approach was implemented:
- Component A (Structured Learning): XGBoost Regressor optimized for tabular data interactions.
- Component B (Representation Learning): TabM / SENet (Squeeze-and-Excitation Network) to capture non-linear and latent data representations.
- Fusion Strategy: A weighted averaging technique (Linear Blending) was applied to the predictions of Component A and Component B to produce the final output.
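The fusion step can be sketched as a weighted average of the two components' predictions, with the weight chosen by a simple grid search on a validation split (a minimal sketch; the function names and grid are assumptions, and the notebook may select the weight differently):

```python
import numpy as np

def blend(pred_xgb, pred_nn, w):
    """Linear blend: w * gradient-boosting preds + (1 - w) * deep-learning preds."""
    return w * np.asarray(pred_xgb, dtype=float) + (1 - w) * np.asarray(pred_nn, dtype=float)

def best_weight(y_val, pred_a, pred_b, grid=np.linspace(0.0, 1.0, 101)):
    """Pick the blend weight that minimizes RMSE on the validation set."""
    y_val = np.asarray(y_val, dtype=float)
    rmses = [np.sqrt(np.mean((y_val - blend(pred_a, pred_b, w)) ** 2)) for w in grid]
    return float(grid[int(np.argmin(rmses))])
```

A sanity check: if one component already matches the validation targets exactly, the grid search should assign it the full weight.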
```
Kaggle_Student_Score_Project/
├── Code/
│   └── student-scores-tabm-xgb-advanced-fe.ipynb   # Main notebook: feature engineering and modeling
├── Data_Output/
│   ├── My_Psych_XGB.csv                            # Predictions from the XGBoost model
│   ├── Deep_Learning.csv                           # Predictions from the neural-network model
│   └── Final_Fusion_Conservative.csv               # Final submission file (ensemble result)
└── RankingTop 9%.PNG                               # Leaderboard proof
```