Early academic risk detection using student engagement data (OULAD) with time-aware machine learning features.

zayedalmheiri/early-risk-detection

Early Academic Risk Detection Using Student Engagement Data

Overview

Universities often identify struggling students only after midterms or a failed course, when it is already too late to intervene effectively. This project builds an early-warning system that predicts academic risk weeks in advance using only passive engagement data from learning management systems (LMS).

The goal is not to penalize students, but to enable earlier, supportive intervention by academic advisors.


Problem Statement

Students frequently fail or withdraw from courses due to disengagement that goes unnoticed in the early weeks of a semester. Traditional indicators such as midterm grades arrive too late to prevent academic damage.

This project asks: How early can academic risk be detected using behavioral signals alone?


Dataset

  • Open University Learning Analytics Dataset (OULAD)
  • Over 30,000 students across multiple course presentations
  • Weekly interaction logs, anonymized and privacy-safe

Only passive engagement signals were used:

  • Weekly click activity
  • Number of active days per week

No grades, demographics, or surveys were included.


Methodology

  1. Aggregated LMS interaction data into weekly engagement features per student
  2. Constructed time-aware features without future data leakage
  3. Trained a logistic regression model using balanced class weights
  4. Evaluated performance at multiple early-semester cutoffs (Week 2, 4, 6)
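
The weekly aggregation in step 1 might look roughly like the sketch below. It is not taken from the repository's scripts. It assumes a click log with the columns `id_student`, `date` (days since course start) and `sum_click`, as in OULAD's studentVle table. The function name `weekly_engagement`, the week numbering and the choice to drop pre-start activity are illustrative assumptions.

```python
import pandas as pd

def weekly_engagement(clicks: pd.DataFrame) -> pd.DataFrame:
    """Aggregate a daily click log into one row per student per week."""
    df = clicks.copy()
    # Assumption: activity logged before the course start (negative day) is dropped.
    df = df[df["date"] >= 0]
    # Assumption: days 0-6 form week 1, days 7-13 form week 2, and so on.
    df["week"] = df["date"] // 7 + 1
    weekly = (
        df.groupby(["id_student", "week"])
        .agg(
            clicks=("sum_click", "sum"),       # weekly click activity
            active_days=("date", "nunique"),   # distinct days with any activity
        )
        .reset_index()
    )
    return weekly

# Tiny made-up log, for illustration only.
sample = pd.DataFrame({
    "id_student": [1, 1, 1, 1, 2],
    "date": [0, 0, 3, 8, 2],
    "sum_click": [5, 2, 4, 1, 7],
})
weekly = weekly_engagement(sample)
print(weekly)
```

For simplicity the sketch ignores that a student can appear in more than one course presentation; grouping by the module and presentation codes as well would keep those separate.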

Key features:

  • Total engagement
  • Average engagement
  • Engagement trend over time
  • Active days per week
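
As a rough illustration of how the features above could be computed without future data leakage, the sketch below keeps only the weeks up to a cutoff before summarising. It assumes a weekly table with the columns `id_student`, `week`, `clicks` and `active_days`. The function name, the feature names and the use of a least-squares slope as the "trend" are assumptions, not the repository's actual definitions.

```python
import numpy as np
import pandas as pd

def features_at_cutoff(weekly: pd.DataFrame, cutoff_week: int) -> pd.DataFrame:
    """Per-student features built only from weeks 1..cutoff_week."""
    # Discard everything after the cutoff so later weeks cannot leak in.
    seen = weekly[weekly["week"] <= cutoff_week]
    weeks = np.arange(1, cutoff_week + 1)

    def wide(column: str) -> pd.DataFrame:
        # One row per student, one column per week; missing weeks count as zero.
        table = seen.pivot_table(
            index="id_student", columns="week", values=column, aggfunc="sum"
        )
        return table.reindex(columns=weeks).fillna(0.0)

    clicks = wide("clicks")
    days = wide("active_days")

    feats = pd.DataFrame(index=clicks.index)
    feats["total_clicks"] = clicks.sum(axis=1)
    feats["mean_clicks"] = clicks.mean(axis=1)
    feats["mean_active_days"] = days.mean(axis=1)
    if cutoff_week >= 2:
        # Trend: slope of a straight-line fit of weekly clicks against week number.
        feats["click_trend"] = np.polyfit(weeks, clicks.to_numpy().T, 1)[0]
    else:
        feats["click_trend"] = 0.0  # a slope is undefined with a single week
    return feats

# Made-up weekly data; the week-5 row must not influence a week-2 cutoff.
weekly = pd.DataFrame({
    "id_student": [1, 1, 1, 2],
    "week": [1, 2, 5, 1],
    "clicks": [10, 20, 99, 6],
    "active_days": [3, 4, 5, 2],
})
feats = features_at_cutoff(weekly, cutoff_week=2)
print(feats)
```

Students with no logged activity at all before the cutoff do not appear in the output of this sketch; a real pipeline would need to add them back with zero engagement, since they are likely the students of most interest.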

Results

Academic risk was detectable within the first weeks of the semester, with ROC AUC rising from 0.65 at Week 2 to 0.71 at Week 6:

Week      ROC AUC    Recall (At-Risk)
Week 2    0.65       ~60%
Week 4    0.67       ~61%
Week 6    0.71       ~61%

Week 4 emerged as the best tradeoff between prediction accuracy and intervention lead time.
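
A minimal sketch of how ROC AUC and at-risk recall can be measured for a logistic regression with balanced class weights is shown below. It runs on synthetic data and does not reproduce the numbers reported here. The stratified 75/25 split, the helper name `evaluate` and the simulated features are assumptions, since the evaluation protocol is not described in this README.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

def evaluate(X, y, seed=0):
    """Fit a class-weighted logistic regression and score it on held-out data."""
    # Assumption: a stratified random split; the project's real protocol may differ.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # class_weight="balanced" reweights classes inversely to their frequency.
    model = LogisticRegression(class_weight="balanced", max_iter=1000)
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]  # estimated probability of class 1
    return {
        "roc_auc": roc_auc_score(y_test, scores),
        # Recall on the positive class, coded here as 1 = at risk.
        "recall_at_risk": recall_score(y_test, model.predict(X_test)),
    }

# Synthetic stand-in for engagement features; NOT the OULAD data.
rng = np.random.default_rng(0)
n = 2000
y = (rng.random(n) < 0.3).astype(int)            # roughly 30% labelled at risk
centre = np.where(y == 1, -0.5, 0.5)[:, None]    # at-risk students engage less
X = rng.normal(loc=centre, scale=1.0, size=(n, 4))

metrics = evaluate(X, y)
print(metrics)
```

Calling `evaluate` once per cutoff, with features built only from data up to that week, would give a per-week comparison like the one in the table above.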


Visualizations

The following figures illustrate the accuracy–timeliness tradeoff:

  • ROC AUC vs Week
  • Recall of at-risk students vs Week

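These plots show that recall of at-risk students plateaus at about 61% by Week 4, while ROC AUC rises only modestly afterwards (0.67 at Week 4 to 0.71 at Week 6), so delaying intervention beyond Week 4 yields limited gains.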


Ethical Considerations

  • No demographic or sensitive personal data was used
  • Predictions are intended to support, not punish
  • Advisors remain in the loop for all decisions
  • False positives are treated as opportunities for check-ins, not sanctions

Key Takeaways

  • Academic risk can be detected as early as Week 2 using behavioral data alone
  • Waiting longer improves accuracy marginally but reduces intervention time
  • Simple, interpretable models can be effective for early warning systems

Future Work

  • Compare with non-linear models (e.g., gradient boosting)
  • Simulate intervention strategies and outcomes
  • Extend to real-time advisor dashboards

Technologies Used

  • Python
  • pandas, scikit-learn
  • matplotlib

How to Run

  1. Clone the repository
  2. Install dependencies:
    pip install -r requirements.txt
  3. Preprocess the data and build features:
    python src/preprocess.py
    python src/build_features.py
  4. Train the baseline model:
    python src/train_baseline.py
  5. Compare early detection performance:
    python src/compare_weeks.py
  6. Generate performance plots:
    python src/plot_results.py
