This repository contains code for I-HOPE (Interpretable Hierarchical mOdel for Personalized mEntal Health Prediction). I-HOPE is a mental health prediction system that uses a two-stage hierarchical model to map raw behavioral features to mental health status (PHQ-4 categories). It does so through five defined behavioral categories, referred to as interaction labels. This work uses the [CES dataset](https://www.kaggle.com/datasets/subigyanepal/college-experience-dataset).
The project follows a two-stage hierarchical model as shown below:
- Stage 1: Feature Mapping to Interaction Labels

  35 selected raw behavioral features are transformed into five interaction labels:
  - Leisure
  - Me Time
  - Phone Time
  - Sleep
  - Social Time

  This is achieved through data cleaning, feature engineering, clustering (KMeans), and personalized feature importance analysis (Random Forests). Each feature and the interaction labels it contributes to are listed below:
| # | Feature name | Relevant labels (0: Leisure, 1: Me Time, 2: Phone Time, 3: Sleep, 4: Social Time) |
|---|---|---|
| 1 | act_on_bike_ep_0 | [0,1] |
| 2 | act_on_foot_ep_0 | [0,1,4] |
| 3 | act_running_ep_0 | [0,1] |
| 4 | act_still_ep_0 | [1,3] |
| 5 | act_walking_ep_0 | [0,1,4] |
| 6 | audio_convo_duration_ep_0 | [0,2,4] |
| 7 | (call_in_num + call_out_num) / (call_in_duration + call_out_duration) | [0,2] |
| 8 | loc_food_audio_voice | [4] |
| 9 | loc_home_audio_voice | [1,2,3] |
| 10 | loc_social_audio_voice | [0,2,4] |
| 11 | loc_other_dorm_audio_voice | [0,4] |
| 12 | loc_self_dorm_audio_voice | [1,2] |
| 13 | loc_study_audio_voice | [1,4] |
| 14 | loc_food_convo_duration | [4] |
| 15 | loc_home_convo_duration | [1,2,3] |
| 16 | loc_other_dorm_convo_duration | [0,4] |
| 17 | loc_social_convo_duration | [0,4] |
| 18 | loc_study_convo_duration | [1,4] |
| 19 | loc_self_dorm_convo_duration | [1,2] |
| 20 | loc_home_dur | [1,3] |
| 21 | loc_leisure_dur | [0,4] |
| 22 | loc_other_dorm_dur | [4] |
| 23 | loc_self_dorm_dur | [1,3] |
| 24 | loc_social_dur | [4] |
| 25 | loc_study_dur | [1,4] |
| 26 | loc_workout_dur | [0,1] |
| 27 | loc_home_unlock_num / loc_home_unlock_duration | [0,1,2] |
| 28 | loc_other_dorm_unlock_num / loc_other_dorm_unlock_duration | [0,2,4] |
| 29 | loc_self_dorm_unlock_num / loc_self_dorm_unlock_duration | [1,2] |
| 30 | loc_social_unlock_num / loc_social_unlock_duration | [2,4] |
| 31 | loc_study_unlock_num / loc_study_unlock_duration | [1,2,3] |
| 32 | sleep_duration | [3] |
| 33 | sleep_end - sleep_start | [3] |
| 34 | unlock_num_ep_0 / unlock_duration_ep_0 | [2] |
| 35 | sleep_heathkit_dur | [3] |
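The Stage 1 mapping can be sketched in Python, the language of the notebook. This is a hypothetical sketch, not the notebook's code: the derived-feature names (`call_rate`, `loc_*_unlock_rate`, `unlock_rate`, `sleep_span`), the median imputation and z-scoring, and the way KMeans and Random Forests are combined are all assumptions. Here a per-participant `RandomForestClassifier` fit to PHQ-4 categories supplies feature importances, each label score is the importance-weighted average of that label's standardized features (per the table), and KMeans with k = 3 bins each score into ordered low/medium/high levels.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

LABELS = ["Leisure", "Me Time", "Phone Time", "Sleep", "Social Time"]

# Feature -> relevant label indices, taken from the table above.
# Names of the derived (ratio/difference) features are HYPOTHETICAL.
FEATURE_LABELS = {
    "act_on_bike_ep_0": [0, 1], "act_on_foot_ep_0": [0, 1, 4],
    "act_running_ep_0": [0, 1], "act_still_ep_0": [1, 3],
    "act_walking_ep_0": [0, 1, 4], "audio_convo_duration_ep_0": [0, 2, 4],
    "call_rate": [0, 2],
    "loc_food_audio_voice": [4], "loc_home_audio_voice": [1, 2, 3],
    "loc_social_audio_voice": [0, 2, 4], "loc_other_dorm_audio_voice": [0, 4],
    "loc_self_dorm_audio_voice": [1, 2], "loc_study_audio_voice": [1, 4],
    "loc_food_convo_duration": [4], "loc_home_convo_duration": [1, 2, 3],
    "loc_other_dorm_convo_duration": [0, 4], "loc_social_convo_duration": [0, 4],
    "loc_study_convo_duration": [1, 4], "loc_self_dorm_convo_duration": [1, 2],
    "loc_home_dur": [1, 3], "loc_leisure_dur": [0, 4],
    "loc_other_dorm_dur": [4], "loc_self_dorm_dur": [1, 3],
    "loc_social_dur": [4], "loc_study_dur": [1, 4], "loc_workout_dur": [0, 1],
    "loc_home_unlock_rate": [0, 1, 2], "loc_other_dorm_unlock_rate": [0, 2, 4],
    "loc_self_dorm_unlock_rate": [1, 2], "loc_social_unlock_rate": [2, 4],
    "loc_study_unlock_rate": [1, 2, 3],
    "sleep_duration": [3], "sleep_span": [3],
    "unlock_rate": [2], "sleep_heathkit_dur": [3],
}


def add_derived_features(df: pd.DataFrame) -> pd.DataFrame:
    """Compute the ratio/difference features (table rows 7, 27-31, 33, 34)."""
    out = df.copy()

    def ratio(num, den):
        # x/0 gives inf in pandas; turn it into NaN so it is imputed later
        return (num / den).replace([np.inf, -np.inf], np.nan)

    out["call_rate"] = ratio(df["call_in_num"] + df["call_out_num"],
                             df["call_in_duration"] + df["call_out_duration"])
    for loc in ["home", "other_dorm", "self_dorm", "social", "study"]:
        out[f"loc_{loc}_unlock_rate"] = ratio(df[f"loc_{loc}_unlock_num"],
                                              df[f"loc_{loc}_unlock_duration"])
    out["unlock_rate"] = ratio(df["unlock_num_ep_0"], df["unlock_duration_ep_0"])
    out["sleep_span"] = df["sleep_end"] - df["sleep_start"]
    return out


def label_scores(days: pd.DataFrame, phq4_cat: pd.Series,
                 n_levels: int = 3, seed: int = 0) -> pd.DataFrame:
    """HYPOTHETICAL scoring of the five labels for ONE participant's days."""
    feats = list(FEATURE_LABELS)
    X = days[feats].astype(float)
    X = X.fillna(X.median())                                       # median imputation
    X = ((X - X.mean()) / X.std(ddof=0).replace(0, 1)).fillna(0)  # z-score

    # Personalized importances: RF trained only on this participant's data
    rf = RandomForestClassifier(n_estimators=200, random_state=seed)
    rf.fit(X, phq4_cat)
    importance = pd.Series(rf.feature_importances_, index=feats)

    scores = pd.DataFrame(index=days.index)
    for idx, name in enumerate(LABELS):
        cols = [f for f, labels in FEATURE_LABELS.items() if idx in labels]
        w = importance[cols]
        # Normalize weights; fall back to a plain mean if all importances are 0
        w = w / w.sum() if w.sum() > 0 else pd.Series(1 / len(cols), index=cols)
        scores[name] = X[cols].to_numpy() @ w.to_numpy()

        # KMeans bins the 1-D score; relabel so 0 = lowest cluster center
        km = KMeans(n_clusters=n_levels, n_init=10, random_state=seed)
        raw = km.fit_predict(scores[[name]])
        order = np.argsort(km.cluster_centers_.ravel())
        scores[name + "_level"] = np.argsort(order)[raw]
    return scores
```

Calling `label_scores` separately on each participant's daily rows is what keeps the importances personalized; `add_derived_features` expects the raw call, unlock, and sleep columns named in the table.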
- Stage 2: Prediction Using Interaction Labels
The computed interaction label scores are then used as inputs to build personalized neural network models to predict the PHQ-4 mental health categories.
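Stage 2 can be sketched as follows, assuming each participant's daily interaction label scores and PHQ-4 totals are already available. The cut-offs below are the standard PHQ-4 severity bands (0-2 normal, 3-5 mild, 6-8 moderate, 9-12 severe), assumed here to be the categories the notebook predicts. The helper names `phq4_category` and `fit_personal_models`, and the use of scikit-learn's `MLPClassifier` with one 16-unit hidden layer, are illustrative assumptions rather than the notebook's actual architecture.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

PHQ4_CATEGORIES = ["normal", "mild", "moderate", "severe"]


def phq4_category(total: int) -> int:
    """Map a PHQ-4 total (0-12) to 0=normal, 1=mild, 2=moderate, 3=severe."""
    if not 0 <= total <= 12:
        raise ValueError(f"PHQ-4 total must be in 0..12, got {total}")
    # Bands 0-2, 3-5, 6-8 map to 0, 1, 2; 9-12 all map to 3
    return min(total // 3, 3)


def fit_personal_models(scores_by_pid, labels_by_pid, seed=0):
    """HYPOTHETICAL: fit one small MLP per participant.

    scores_by_pid: {participant_id: array of shape (n_days, 5)} label scores
    labels_by_pid: {participant_id: array of shape (n_days,)} PHQ-4 categories
    """
    models = {}
    for pid, X in scores_by_pid.items():
        y = np.asarray(labels_by_pid[pid])
        model = make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=seed),
        )
        model.fit(X, y)
        models[pid] = model
    return models
```

Training a separate model per participant mirrors the "personalized" framing above; with few days per person, a small network like this one is less likely to overfit than a deep one.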
Repository contents:
- `Github_code.ipynb`: notebook containing the code
- CSVs used:
  - Initial dataset: obtained by running Section 2(a) of https://github.com/bill-wei-xuan/Unlocking-MentalHealth/blob/main/notebooks/DigitalWellbeing.ipynb
  - Manual labels: `CHASE_Labeled.csv`
- All the original data can be downloaded from https://www.kaggle.com/datasets/subigyanepal/college-experience-dataset