Universities often identify struggling students only after midterms or a failed course, when it is already too late to intervene. This project builds an early-warning system that predicts academic risk weeks in advance using only passive engagement data from learning management systems (LMS).
The goal is not to penalize students, but to enable earlier, supportive intervention by academic advisors.
Students frequently fail or withdraw from courses due to disengagement that goes unnoticed in the early weeks of a semester. Traditional indicators such as midterm grades arrive too late to prevent academic damage.
This project asks: How early can academic risk be detected using behavioral signals alone?
- Open University Learning Analytics Dataset (OULAD)
- Over 30,000 students across multiple course presentations
- Weekly interaction logs, anonymized and privacy-safe
Only passive engagement signals were used:
- Weekly click activity
- Number of active days per week
No grades, demographics, or surveys were included.
- Aggregated LMS interaction data into weekly engagement features per student
- Constructed time-aware features without future data leakage
- Trained a logistic regression model using balanced class weights
- Evaluated performance at multiple early-semester cutoffs (Week 2, 4, 6)
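The last two steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual training script: the `evaluate_cutoff` helper, the 75/25 stratified split, the `StandardScaler` step, and the synthetic feature matrices are all assumptions made so that the example runs on its own. Only the balanced-class-weight logistic regression, the Week 2/4/6 cutoffs, and the two reported metrics come from the description above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def evaluate_cutoff(X, y, seed=0):
    """Fit a class-weighted logistic regression; return (ROC AUC, at-risk recall)."""
    # Hypothetical evaluation protocol: a single stratified hold-out split.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    model = make_pipeline(
        StandardScaler(),  # assumption: scaling is not mentioned in the write-up
        LogisticRegression(class_weight="balanced", max_iter=1000),
    )
    model.fit(X_train, y_train)
    risk_scores = model.predict_proba(X_test)[:, 1]
    auc = roc_auc_score(y_test, risk_scores)
    # recall_score defaults to the positive label 1, i.e. the at-risk class here.
    recall = recall_score(y_test, model.predict(X_test))
    return auc, recall


# Synthetic stand-in data; real inputs would be the engineered features per cutoff.
rng = np.random.default_rng(0)
n_students = 2000
y = (rng.random(n_students) < 0.3).astype(int)  # 1 = at-risk

results = {}
for week in (2, 4, 6):
    # Placeholder features: at-risk students get slightly lower engagement values.
    X = rng.normal(loc=-0.5 * y[:, None], scale=1.0, size=(n_students, 4))
    results[week] = evaluate_cutoff(X, y)
    print(f"Week {week}: ROC AUC={results[week][0]:.2f}, recall={results[week][1]:.2f}")
```

Because the data here is random, the printed numbers say nothing about the real results; the point is only the shape of the loop, with one model fitted and scored per cutoff week.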
Key features:
- Total engagement
- Average engagement
- Engagement trend over time
- Active days per week
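One way these features might be computed from raw interaction logs is sketched below. It assumes a table shaped like OULAD's `studentVle` file, with `id_student`, `date` (days since the course start), and `sum_click` columns. The `build_features` function, the output column names, and the use of a least-squares slope as the "trend" are illustrative assumptions rather than the project's confirmed implementation.

```python
import numpy as np
import pandas as pd


def build_features(logs: pd.DataFrame, cutoff_week: int) -> pd.DataFrame:
    """Per-student engagement features from activity in weeks 1..cutoff_week only."""
    # Drop everything on or after the cutoff so no future activity leaks in.
    window = logs[(logs["date"] >= 0) & (logs["date"] < 7 * cutoff_week)].copy()
    window["week"] = window["date"] // 7 + 1

    # Students x weeks matrix of clicks; weeks without activity count as 0.
    weekly_clicks = window.pivot_table(
        index="id_student", columns="week", values="sum_click",
        aggfunc="sum", fill_value=0,
    ).reindex(columns=range(1, cutoff_week + 1), fill_value=0)

    # Distinct active days per student, averaged over the weeks in the window.
    active_days = (
        window.drop_duplicates(["id_student", "date"])
        .groupby("id_student").size()
        .div(cutoff_week)
    )

    weeks = np.arange(1, cutoff_week + 1)
    features = pd.DataFrame(index=weekly_clicks.index)
    features["total_clicks"] = weekly_clicks.sum(axis=1)
    features["mean_weekly_clicks"] = weekly_clicks.mean(axis=1)
    # Trend (assumed definition): slope of a straight line fitted to weekly clicks.
    features["click_trend"] = (
        np.polyfit(weeks, weekly_clicks.to_numpy(dtype=float).T, 1)[0]
        if cutoff_week > 1 else 0.0
    )
    features["active_days_per_week"] = active_days.reindex(features.index).fillna(0)
    return features
```

Students with no logged activity inside the window do not appear in the result, so a real pipeline would likely reindex against the enrollment list and fill their rows with zeros; complete inactivity is itself a strong signal.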
Academic risk was detectable surprisingly early in the semester:
| Week | ROC AUC | Recall (At-Risk) |
|---|---|---|
| Week 2 | 0.65 | ~60% |
| Week 4 | 0.67 | ~61% |
| Week 6 | 0.71 | ~61% |
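Week 4 emerged as the best tradeoff between prediction accuracy and intervention lead time: at-risk recall has already leveled off at about 61%, and waiting until Week 6 raises ROC AUC only from 0.67 to 0.71.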
The following figures illustrate the accuracy–timeliness tradeoff:
These plots show diminishing returns when delaying intervention beyond Week 4.
- No demographic or sensitive personal data was used
- Predictions are intended to support, not punish
- Advisors remain in the loop for all decisions
- False positives are treated as opportunities for check-ins, not sanctions
- Academic risk can be detected as early as Week 2 using behavioral data alone (ROC AUC 0.65)
- Waiting until Week 6 raises ROC AUC only to 0.71 and leaves less time to intervene
- Simple, interpretable models can be effective for early warning systems
- Compare with non-linear models (e.g., gradient boosting)
- Simulate intervention strategies and outcomes
- Extend to real-time advisor dashboards
- Python
- pandas, scikit-learn
- matplotlib
- Clone the repository
- Install dependencies:
pip install -r requirements.txt
- Preprocess the data and build features:
python src/preprocess.py
python src/build_features.py
- Train the baseline model:
python src/train_baseline.py
- Compare early detection performance:
python src/compare_weeks.py
- Generate performance plots:
python src/plot_results.py

