This repository contains the solution for automating the customer support ticket system for a financial company through natural language processing (NLP) techniques.
Customer complaints are vital indicators of a financial company's product and service shortcomings. This project aims to automate the classification of customer complaints into relevant categories, enabling swift issue resolution and enhancing customer satisfaction.
The primary objective is to build a model capable of classifying customer complaints based on the products/services they pertain to. By employing non-negative matrix factorization (NMF) for topic modeling, the project aims to identify patterns and recurring words within tickets, allowing for efficient segregation into five categories:
- Credit card / Prepaid card
- Bank account services
- Theft/Dispute reporting
- Mortgages/loans
- Others
The dataset comprises 78,313 customer complaints in .json format, encompassing various features. The initial step involves converting this dataset into a DataFrame for further processing.
- Data Loading: Loading the provided dataset.
- Text Preprocessing: Cleaning and preparing text data for analysis.
- Exploratory Data Analysis (EDA): Understanding the dataset's characteristics.
- Feature Extraction: Extracting relevant features from the text data.
- Topic Modelling: Employing NMF for topic modeling to classify complaints.
- Model Building using Supervised Learning: Utilizing labeled data for training.
- Model Training and Evaluation: Training and evaluating models, including logistic regression, naive Bayes, decision tree, and random forest.
- Model Inference: Using the trained model for classifying new customer support tickets.
Upon finalizing complaint clusters/categories, a labeled dataset was created for supervised learning model training. Evaluation metrics have been used to determine the best-performing model among logistic regression, naive Bayes, decision tree, and random forest.