Automatic Ticket Classification using NLP

Create a solution that helps identify the type of complaint ticket raised by customers of a multinational bank.

Overview

This project classifies customer complaints into topics using machine learning. The dataset contains the text of customer complaints, and the goal is to predict the topic of each complaint. Several models, including Logistic Regression, Decision Tree, Random Forest, and Naive Bayes, were evaluated on how accurately they classify the complaints.

Dataset Description

The customer complaints are grouped into five topics:

Bank account services
Credit Card/Prepaid Card
Mortgages/loans
Theft/Dispute reporting
Others

Each complaint is associated with text data that describes the issue faced by the customer.
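The Approach starts by converting the .json data to a dataframe. A minimal sketch of that step with pandas follows. The nested record layout and the field names (`id`, `complaint.text`, `complaint.product`) are hypothetical stand-ins, since the structure of complaints-2021-05-14_08_16.json is not described here.

```python
import json

import pandas as pd

# Hypothetical records standing in for the real file. To load the actual data:
#   with open("complaints-2021-05-14_08_16.json") as f:
#       raw = json.load(f)
raw = json.loads("""
[
  {"id": 1, "complaint": {"text": "Card charged twice", "product": "Credit card"}},
  {"id": 2, "complaint": {"text": "", "product": "Mortgage"}},
  {"id": 3, "complaint": {"text": "Account closed without notice", "product": "Bank account"}}
]
""")

# json_normalize flattens nested keys into dotted column names,
# e.g. {"complaint": {"text": ...}} becomes the column "complaint.text".
df = pd.json_normalize(raw)
df = df.rename(columns={"complaint.text": "complaint_text"})

# Drop rows with empty complaint text, since they carry nothing to classify.
df = df[df["complaint_text"].str.strip() != ""].reset_index(drop=True)
print(df[["id", "complaint_text"]])
```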

Approach

Data Loading & Preprocessing: Conversion of .json data to a dataframe, followed by text cleaning and preprocessing.
Exploratory Data Analysis (EDA): Detailed analysis to understand the data distribution and extract meaningful insights.
Feature Extraction & Topic Modeling: Use of Non-Negative Matrix Factorization (NMF) to identify patterns and categorize complaints.
Model Building: Training multiple supervised learning models, including logistic regression, decision tree, random forest, and naive Bayes.
Model Evaluation: Comparison of models based on accuracy, confusion matrices, and classification reports to select the best-performing model.

Key Methodology

Data Preprocessing

Text cleaning: Tokenization, removal of stopwords and punctuation, and stemming/lemmatization.
Vectorization: Transforming text data into numerical features using TF-IDF vectorization.
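A simplified sketch of the cleaning and vectorization steps, assuming scikit-learn only. It swaps in scikit-learn's built-in English stopword list and plain whitespace tokenization for the NLTK/spaCy tooling the project lists, and it omits stemming/lemmatization. `clean_text` is a hypothetical helper name.

```python
import re

from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer


def clean_text(text: str) -> str:
    """Lowercase, strip punctuation and digits, tokenize, and drop stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # keep letters and whitespace only
    tokens = text.split()                  # simple whitespace tokenization
    return " ".join(t for t in tokens if t not in ENGLISH_STOP_WORDS)


# Hypothetical raw complaints.
raw = [
    "My credit card was charged TWICE!!",
    "The bank closed my account on 05/14 without notice.",
]
cleaned = [clean_text(t) for t in raw]
print(cleaned)

# TF-IDF turns each cleaned complaint into a sparse numeric vector;
# unigrams plus bigrams is an illustrative choice.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(cleaned)
print(X.shape)  # (number of complaints, vocabulary size)
```

In the project itself, a lemmatizer from NLTK or spaCy would slot into `clean_text` after tokenization.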

Model Building

Logistic Regression
- Model trained using LogisticRegression from scikit-learn.
- Evaluation metrics: Accuracy, Confusion Matrix, Classification Report.
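A minimal logistic regression sketch on hypothetical toy data that reports the three metrics listed above. Wrapping `TfidfVectorizer` and the classifier in a scikit-learn pipeline is a design choice assumed here: it ensures the vocabulary and IDF weights are learned from the training split only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: two topics, six complaints each.
texts = [
    "credit card charged twice", "card payment declined abroad",
    "prepaid card balance missing", "credit card interest rate raised",
    "card fee charged without notice", "credit card limit reduced",
    "mortgage payment misapplied", "loan modification request ignored",
    "mortgage escrow account error", "loan payoff amount is wrong",
    "mortgage servicer lost documents", "student loan payment not credited",
]
labels = ["card"] * 6 + ["mortgage_loan"] * 6

# Stratified split keeps both topics represented in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.33, stratify=labels, random_state=40)

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred, zero_division=0))
```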
Decision Tree
- Model trained using DecisionTreeClassifier with and without hyperparameter tuning.
- Hyperparameters tuned: max_depth, min_samples_leaf, criterion.
- Evaluation metrics: Accuracy, Confusion Matrix, Classification Report.
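One way to tune the decision tree over the listed hyperparameters is scikit-learn's `GridSearchCV`. The candidate values, `cv=3`, and the toy data below are assumptions, not the project's settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: two topics, six complaints each.
texts = [
    "credit card charged twice", "card payment declined abroad",
    "prepaid card balance missing", "credit card interest rate raised",
    "card fee charged without notice", "credit card limit reduced",
    "mortgage payment misapplied", "loan modification request ignored",
    "mortgage escrow account error", "loan payoff amount is wrong",
    "mortgage servicer lost documents", "student loan payment not credited",
]
labels = ["card"] * 6 + ["mortgage_loan"] * 6
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.33, stratify=labels, random_state=40)

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("tree", DecisionTreeClassifier(random_state=42)),
])
# The three hyperparameters named above; the candidate values are illustrative.
param_grid = {
    "tree__max_depth": [None, 5, 10],
    "tree__min_samples_leaf": [1, 2],
    "tree__criterion": ["gini", "entropy"],
}
# Cross-validate on the training split only; the test split stays untouched.
search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

# score() evaluates the refit best pipeline on the held-out test split.
test_acc = search.score(X_test, y_test)
print(search.best_params_, f"test accuracy={test_acc:.3f}")
```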
Random Forest
- Model trained using RandomForestClassifier with and without hyperparameter tuning.
- Hyperparameters tuned: n_estimators, max_depth, min_samples_leaf, max_features.
- Evaluation metrics: Accuracy, Confusion Matrix, Classification Report.
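For the random forest, a sketch using `RandomizedSearchCV` over the four listed hyperparameters. Randomized search is an assumed choice: it samples a fixed number of combinations (`n_iter`) instead of trying all of them, which bounds the cost of fitting many forests. Value ranges and data are illustrative.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical toy data: two topics, six complaints each.
texts = [
    "credit card charged twice", "card payment declined abroad",
    "prepaid card balance missing", "credit card interest rate raised",
    "card fee charged without notice", "credit card limit reduced",
    "mortgage payment misapplied", "loan modification request ignored",
    "mortgage escrow account error", "loan payoff amount is wrong",
    "mortgage servicer lost documents", "student loan payment not credited",
]
labels = ["card"] * 6 + ["mortgage_loan"] * 6
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.33, stratify=labels, random_state=40)

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("rf", RandomForestClassifier(random_state=42)),
])
# 2 x 2 x 2 x 2 = 16 combinations, of which n_iter=5 are sampled.
param_distributions = {
    "rf__n_estimators": [50, 100],
    "rf__max_depth": [None, 10],
    "rf__min_samples_leaf": [1, 2],
    "rf__max_features": ["sqrt", "log2"],
}
search = RandomizedSearchCV(
    pipe, param_distributions, n_iter=5, cv=3,
    scoring="accuracy", random_state=42)
search.fit(X_train, y_train)

test_acc = search.score(X_test, y_test)
print(search.best_params_, f"test accuracy={test_acc:.3f}")
```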
Naive Bayes (Optional)
- Model trained using MultinomialNB.
- Evaluation metrics: Accuracy, Confusion Matrix, Classification Report.
- Hyperparameter tuning: Alpha parameter for Laplace smoothing.
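Tuning the smoothing parameter `alpha` of `MultinomialNB` can be sketched the same way: `alpha=1.0` corresponds to Laplace smoothing and values below 1 to Lidstone smoothing. The alpha grid and the toy data are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data: two topics, six complaints each.
texts = [
    "credit card charged twice", "card payment declined abroad",
    "prepaid card balance missing", "credit card interest rate raised",
    "card fee charged without notice", "credit card limit reduced",
    "mortgage payment misapplied", "loan modification request ignored",
    "mortgage escrow account error", "loan payoff amount is wrong",
    "mortgage servicer lost documents", "student loan payment not credited",
]
labels = ["card"] * 6 + ["mortgage_loan"] * 6
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.33, stratify=labels, random_state=40)

# MultinomialNB accepts TF-IDF weights because they are non-negative.
pipe = Pipeline([("tfidf", TfidfVectorizer()), ("nb", MultinomialNB())])
param_grid = {"nb__alpha": [0.01, 0.1, 0.5, 1.0]}

search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)
test_acc = search.score(X_test, y_test)
print(search.best_params_, f"test accuracy={test_acc:.3f}")
```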

Model Evaluation

Several machine learning models were compared on how effectively they classify complaints, using accuracy, confusion matrices, and classification reports on the test set.

Model Selection

Logistic Regression performed best overall, achieving the highest test accuracy (88.37%).
Decision Tree improved after hyperparameter tuning but did not match Logistic Regression.
Random Forest and Naive Bayes achieved lower accuracy than both Logistic Regression and Decision Tree.
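A sketch of the selection step, assuming every candidate is scored on the same held-out split and the one with the highest test accuracy is chosen. It uses untuned default settings and hypothetical toy data, so its numbers will not reproduce the 88.37% reported above.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: two topics, six complaints each.
texts = [
    "credit card charged twice", "card payment declined abroad",
    "prepaid card balance missing", "credit card interest rate raised",
    "card fee charged without notice", "credit card limit reduced",
    "mortgage payment misapplied", "loan modification request ignored",
    "mortgage escrow account error", "loan payoff amount is wrong",
    "mortgage servicer lost documents", "student loan payment not credited",
]
labels = ["card"] * 6 + ["mortgage_loan"] * 6
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.33, stratify=labels, random_state=40)

candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Naive Bayes": MultinomialNB(),
}

# Same vectorizer setup and same split for every model, so scores are comparable.
scores = {}
for name, clf in candidates.items():
    pipe = make_pipeline(TfidfVectorizer(), clf)
    pipe.fit(X_train, y_train)
    scores[name] = pipe.score(X_test, y_test)  # test-set accuracy

best_name = max(scores, key=scores.get)
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {acc:.3f}")
print("Selected:", best_name)
```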

Key Libraries and Frameworks

• Pandas: For data manipulation and analysis.

• NumPy: For numerical computations.

• Scikit-learn: For feature extraction, model building (logistic regression, decision tree, random forest, and naive Bayes), and evaluation.

• NLTK: For natural language processing tasks such as text preprocessing.

• spaCy: For advanced NLP tasks and text processing.

• Matplotlib & Seaborn: For data visualization and exploratory data analysis.

• Non-Negative Matrix Factorization (NMF): For topic modeling and identifying patterns in text data.

Conclusion

Based on the evaluation results, Logistic Regression is recommended for predicting customer complaint topics because of its superior performance on both the training and test sets. Further optimizations could involve:

Exploring additional text preprocessing techniques.
Collecting more diverse complaint data to enhance model generalization.
Experimenting with ensemble methods or deep learning architectures for potentially better performance.

Repository files

• README.md
• [v1] Automatic_Ticket_Classification_NLP_By_Anupam.ipynb
• complaints-2021-05-14_08_16.json

Repository: dynamicanupam/Classification_of_customer_complaints_using_NLP
