Early Academic Risk Detection Using Student Engagement Data

Overview

Universities often identify struggling students only after midterms or a failed course, by which point intervention frequently comes too late. This project builds an early-warning system that predicts academic risk weeks in advance using only passive engagement data from learning management systems (LMS).

The goal is not to penalize students, but to enable earlier, supportive intervention by academic advisors.

Problem Statement

Students frequently fail or withdraw from courses due to disengagement that goes unnoticed in the early weeks of a semester. Traditional indicators such as midterm grades arrive too late to prevent academic damage.

This project asks: How early can academic risk be detected using behavioral signals alone?

Dataset

Open University Learning Analytics Dataset (OULAD)
Over 30,000 students across multiple course presentations
Weekly interaction logs, anonymized and privacy-safe

Only passive engagement signals were used:

Weekly click activity
Number of active days per week

No grades, demographics, or surveys were included.
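
As a rough illustration of how these two signals might be derived, the sketch below aggregates raw click logs into per-student weekly rows. It assumes day-level logs with OULAD-style columns (`id_student`, `date` as a day offset from course start, `sum_click`); the function name `weekly_engagement` and the choice to drop pre-course activity are assumptions for this example, not the contents of `src/preprocess.py`.

```python
import pandas as pd


def weekly_engagement(clicks: pd.DataFrame) -> pd.DataFrame:
    """Aggregate day-level click logs into per-student weekly signals.

    Assumed input columns: id_student, date (day offset from course start,
    possibly negative), sum_click (clicks logged for one resource that day).
    """
    # Drop activity before the course starts (a modelling choice, not a rule).
    df = clicks[clicks["date"] >= 0].copy()
    # Days 0-6 map to week 1, days 7-13 to week 2, and so on.
    df["week"] = df["date"] // 7 + 1
    return (
        df.groupby(["id_student", "week"])
        .agg(
            clicks=("sum_click", "sum"),      # weekly click activity
            active_days=("date", "nunique"),  # distinct days with any activity
        )
        .reset_index()
    )


# Toy logs, invented purely to exercise the function.
logs = pd.DataFrame(
    {
        "id_student": [1, 1, 1, 1, 2, 2],
        "date": [0, 0, 3, 8, -2, 7],
        "sum_click": [5, 2, 4, 1, 9, 3],
    }
)
weekly = weekly_engagement(logs)
print(weekly)
```

Grouping by `id_student` alone is a simplification: a student who appears in more than one course presentation would also need the module and presentation identifiers in the group key.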

Methodology

Aggregated LMS interaction data into weekly engagement features per student
Constructed time-aware features without future data leakage
Trained a logistic regression model using balanced class weights
Evaluated performance at multiple early-semester cutoffs (Week 2, 4, 6)
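
The steps above can be sketched roughly as follows. This is a minimal example on synthetic data, not the code in `src/train_baseline.py` or `src/compare_weeks.py`: the data generator, the 30% at-risk rate, the train/test split, and the helper name `evaluate_at_cutoff` are all assumptions. The point it illustrates is that the model at each cutoff only sees weeks up to that cutoff, and that `class_weight="balanced"` is passed to scikit-learn's `LogisticRegression`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in: weekly click counts for 600 students over 10 weeks.
n_students, n_weeks = 600, 10
at_risk = rng.random(n_students) < 0.3          # assumed ~30% at-risk rate
rate = np.where(at_risk, 8.0, 15.0)             # at-risk students click less
weekly_clicks = rng.poisson(rate[:, None], size=(n_students, n_weeks))
y = at_risk.astype(int)                          # 1 = at risk


def evaluate_at_cutoff(cutoff_week: int) -> dict:
    """Train and score a model that only sees weeks 1..cutoff_week."""
    # Slicing by column keeps later weeks out of the features (no leakage).
    X = weekly_clicks[:, :cutoff_week]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0
    )
    # Balanced weights upweight the minority (at-risk) class during fitting.
    model = LogisticRegression(class_weight="balanced", max_iter=1000)
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]
    return {
        "roc_auc": roc_auc_score(y_test, scores),
        "recall_at_risk": recall_score(y_test, model.predict(X_test)),
    }


results = {week: evaluate_at_cutoff(week) for week in (2, 4, 6)}
for week, metrics in results.items():
    print(f"Week {week}: AUC={metrics['roc_auc']:.2f}, "
          f"recall={metrics['recall_at_risk']:.2f}")
```

Because the data here is synthetic, the printed numbers say nothing about OULAD and will not match the reported results.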

Key features:

Total engagement
Average engagement
Engagement trend over time
Active days per week
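
One way these four features could be computed from a per-student weekly table is sketched below. The input column names (`id_student`, `week`, `clicks`, `active_days`), the function name `build_features`, and the use of a least-squares slope as the "trend" are assumptions for illustration; the repository's `src/build_features.py` may define them differently.

```python
import numpy as np
import pandas as pd


def build_features(weekly: pd.DataFrame, cutoff_week: int) -> pd.DataFrame:
    """Summarise each student's engagement using weeks 1..cutoff_week only."""
    if cutoff_week < 2:
        raise ValueError("need at least two weeks to estimate a trend")
    weeks = list(range(1, cutoff_week + 1))
    # Rows after the cutoff are discarded before any feature is computed.
    observed = weekly[weekly["week"].isin(weeks)]

    def wide(column: str) -> pd.DataFrame:
        # Student x week table; weeks with no logged activity count as zero.
        table = observed.pivot_table(
            index="id_student", columns="week", values=column,
            aggfunc="sum", fill_value=0,
        )
        return table.reindex(columns=weeks, fill_value=0)

    clicks, days = wide("clicks"), wide("active_days")

    # Least-squares slope of clicks against week number, one per student.
    x_centered = np.array(weeks, dtype=float) - np.mean(weeks)
    slope = clicks.to_numpy(dtype=float) @ x_centered / (x_centered @ x_centered)

    return pd.DataFrame(
        {
            "total_clicks": clicks.sum(axis=1),
            "avg_weekly_clicks": clicks.mean(axis=1),
            "click_trend": pd.Series(slope, index=clicks.index),
            "avg_active_days": days.mean(axis=1),
        }
    )


# Toy data, invented for the example; week 4 lies beyond the cutoff used below.
toy = pd.DataFrame(
    {
        "id_student": [1, 1, 1, 1, 2, 2],
        "week": [1, 2, 3, 4, 1, 3],
        "clicks": [10, 6, 2, 50, 4, 8],
        "active_days": [3, 2, 1, 5, 1, 2],
    }
)
features = build_features(toy, cutoff_week=3)
print(features)
```

Note that a student with no activity at all before the cutoff never appears in the pivoted table, so a real pipeline would need to add such students back with all-zero features.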

Results

Academic risk was detectable within the first few weeks of the semester:

| Week | ROC AUC | Recall (At-Risk) |
|------|---------|------------------|
| Week 2 | 0.65 | ~60% |
| Week 4 | 0.67 | ~61% |
| Week 6 | 0.71 | ~61% |

Week 4 emerged as the best tradeoff between prediction accuracy and intervention lead time.

Visualizations

The following figures illustrate the accuracy–timeliness tradeoff:

ROC AUC vs Week
Recall of at-risk students vs Week

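These plots show that recall of at-risk students stays essentially flat across cutoffs (about 60 to 61%), while ROC AUC rises only from 0.67 at Week 4 to 0.71 at Week 6, so delaying intervention beyond Week 4 buys limited additional accuracy.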
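
For readers who want to redraw the two figures from the reported numbers, a minimal matplotlib sketch follows. The values are copied from the results table (recall is approximate there); the output file name and the layout are assumptions, and this is not the repository's `src/plot_results.py`.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

# Values taken from the results table above.
weeks = [2, 4, 6]
roc_auc = [0.65, 0.67, 0.71]
recall_at_risk = [0.60, 0.61, 0.61]  # reported as approximate (~60%, ~61%, ~61%)

fig, (ax_auc, ax_recall) = plt.subplots(1, 2, figsize=(9, 3.5))

ax_auc.plot(weeks, roc_auc, marker="o")
ax_auc.set(title="ROC AUC vs Week", xlabel="Cutoff week",
           ylabel="ROC AUC", xticks=weeks)

ax_recall.plot(weeks, recall_at_risk, marker="o", color="tab:orange")
ax_recall.set(title="Recall of at-risk students vs Week", xlabel="Cutoff week",
              ylabel="Recall", xticks=weeks, ylim=(0, 1))

fig.tight_layout()
out_path = Path("accuracy_vs_timeliness.png")  # hypothetical output file name
fig.savefig(out_path, dpi=150)
```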

Ethical Considerations

No demographic or sensitive personal data was used
Predictions are intended to support, not punish
Advisors remain in the loop for all decisions
False positives are treated as opportunities for check-ins, not sanctions

Key Takeaways

Academic risk can be detected as early as Week 2 (ROC AUC 0.65) using behavioral data alone
Waiting longer improves ROC AUC only modestly (0.65 at Week 2 to 0.71 at Week 6) and shortens the window for intervention
Simple, interpretable models can be effective for early warning systems

Future Work

Compare with non-linear models (e.g., gradient boosting)
Simulate intervention strategies and outcomes
Extend to real-time advisor dashboards

Technologies Used

Python
pandas, scikit-learn
matplotlib

How to Run

Clone the repository
Install dependencies:
```
pip install -r requirements.txt
```

Preprocess the data and build features:
```
python src/preprocess.py
python src/build_features.py
```

Train the baseline model:
```
python src/train_baseline.py
```
Compare early detection performance:
```
python src/compare_weeks.py
```
Generate performance plots:
```
python src/plot_results.py
```
