CSE343-ML-Project - Suicide Ideation Prediction from Social Media Conversations

Course Project for CSE-343 (Machine Learning) - Monsoon 2023

Project Overview

Amid growing concern over mental health and rising suicide rates, this project aims to detect suicide ideation by analyzing social media conversations. Using machine learning techniques, we developed a model that identifies individuals at heightened risk based on their online activity. The solution also includes a real-world application: a Reddit Bot designed to flag posts that show potential suicide ideation risk.

Team Members

Medha Hira - medha21265@iiitd.ac.in
Arnav Goel - arnav21519@iiitd.ac.in
Siddharth Rajput - siddharth21102@iiitd.ac.in

Introduction

The project addresses the need for effective suicide prevention strategies by using social media as a platform for early detection of suicide ideation. Suicide rates rose 36% from 2000 to 2021, and our predictive model is intended to support timely intervention by identifying at-risk individuals through their digital footprints. Because social media plays a central role in modern communication, our system is designed to detect suicide ideation by analyzing Reddit posts. Our approach uses a dataset of posts from the r/SuicideWatch subreddit and applies machine learning algorithms to identify early signs of suicidal thoughts.

Dataset and Preprocessing

We used the University of Maryland Reddit Suicidality Dataset and preprocessed the text to clean and prepare it for analysis. The steps included removing non-ASCII characters, URLs, usernames, punctuation, and stopwords, and lowercasing the text for standardization.
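The cleaning steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the function name `clean_text`, the regular expressions, the order of the steps, and the tiny `STOPWORDS` set are all assumptions made for the example.

```python
import re
import string

# Small illustrative stopword set (assumption); a real pipeline would
# likely use a fuller list from an NLP library.
STOPWORDS = {"a", "an", "the", "and", "or", "but", "i", "me", "my",
             "to", "of", "in", "is", "it", "that", "this", "so", "am"}

# Hypothetical patterns for URLs and Reddit-style usernames (u/name, /u/name, @name).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"(?<!\w)(?:/?u/|@)[A-Za-z0-9_-]+")

# Map every punctuation character to a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_text(text: str) -> str:
    """Apply the cleaning steps in one assumed order and return a normalized string."""
    text = URL_RE.sub(" ", text)       # drop URLs before punctuation is touched
    text = USER_RE.sub(" ", text)      # drop usernames
    text = text.encode("ascii", errors="ignore").decode("ascii")  # drop non-ASCII
    text = text.lower()                # lowercase for standardization
    text = text.translate(PUNCT_TABLE) # punctuation -> spaces
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


sample = "I feel so alone, u/some_user! See https://example.com/help café"
cleaned = clean_text(sample)
print(cleaned)
```

URLs and usernames are removed first in this sketch because stripping punctuation earlier would break them into fragments that are harder to recognize.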

Methodology

Our methodology covers a range of machine learning models, including Logistic Regression, SVM, Naive Bayes, Decision Trees, and Random Forest, among others. We also explored ensemble methods and neural networks to improve predictive performance. We assessed model effectiveness with evaluation metrics such as accuracy, precision, and recall.
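For reference, the three named metrics can be computed from binary predictions as in the sketch below. It assumes the label 1 marks a post with suicide ideation risk and 0 marks any other post; the function name `binary_metrics` and the sample labels are made up for illustration and are not the project's results.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, and recall for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels purely for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In a screening task like this one, recall is often watched closely, since a false negative corresponds to an at-risk post that goes unflagged.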

Results and Analysis

Our findings indicate that LDA, Logistic Regression, and the SVM classifier perform best, with notable improvements when Word2Vec embeddings are used. Ensemble methods and a Multilayer Perceptron (MLP) classifier also showed promising results, indicating that the approach can detect suicide ideation with high accuracy.
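One common way to feed Word2Vec embeddings to classifiers such as these is to average the word vectors of a post into a single fixed-length feature vector. The sketch below shows that averaging step only; whether the project pooled its embeddings this way is an assumption, and `mean_pool` and the two-dimensional `toy_vectors` are hypothetical stand-ins for a trained embedding model.

```python
from typing import Dict, List, Sequence


def mean_pool(tokens: Sequence[str],
              vectors: Dict[str, List[float]],
              dim: int) -> List[float]:
    """Average the vectors of known tokens; return a zero vector if none are known."""
    known = [vectors[tok] for tok in tokens if tok in vectors]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Toy 2-dimensional "embeddings" (assumption); real Word2Vec vectors
# typically have a few hundred dimensions.
toy_vectors = {
    "feel": [1.0, 0.0],
    "alone": [0.0, 1.0],
    "tired": [1.0, 1.0],
}

# Out-of-vocabulary tokens are skipped rather than treated as zeros.
doc_vector = mean_pool(["feel", "alone", "unknownword"], toy_vectors, dim=2)
print(doc_vector)
```

The resulting vector has the same length for every post, which is what classifiers like Logistic Regression or an SVM require as input.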

[Figure: Results for Machine Learning Models]

[Figure: Results for Ensemble Method and an MLP Classifier]

Model Deployment

Reddit Bot Demo: YouTube Link

The project culminates in the deployment of a Reddit Bot that integrates our most effective machine learning model to scan Reddit posts and flag those showing signs of suicide ideation. The bot aims to bridge the gap between at-risk individuals and timely mental health support.
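The scan-and-flag step might look roughly like the sketch below. Everything here is hypothetical: the `Post` type, the `flag_posts` helper, the 0.5 threshold, and in particular `toy_score`, which is a placeholder standing in for the trained model's probability output. Fetching posts through the Reddit API (for example with a client library such as PRAW) is left out.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Post:
    post_id: str
    text: str


def flag_posts(posts: Iterable[Post],
               score: Callable[[str], float],
               threshold: float = 0.5) -> List[str]:
    """Return the IDs of posts whose risk score meets or exceeds the threshold."""
    return [post.post_id for post in posts if score(post.text) >= threshold]


def toy_score(text: str) -> float:
    # Placeholder only (assumption): a real bot would call the trained
    # classifier here instead of matching a single word.
    return 0.9 if "hopeless" in text.lower() else 0.1


# Made-up posts with made-up IDs for illustration.
posts = [
    Post("post_a", "Had a great day hiking with friends"),
    Post("post_b", "Everything feels hopeless lately"),
]
flagged = flag_posts(posts, toy_score)
print(flagged)
```

Keeping the scoring function as a parameter lets the same loop be reused with whichever model performs best, and the threshold can be lowered if missing an at-risk post is judged costlier than a false alarm.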

