Evaluating the Effectiveness of Capsule Neural Network in Toxic Comment Classification Using Pre-Trained BERT Embeddings
Md Habibur Rahman Sifat1, Noor Hossain Nuri Sabab2, Tashin Ahmed3
1The Hong Kong Polytechnic University, 2Department of CSE, UIU, 3Smart Studios Malta
Published in IEEE TENCON 2023; cite via IEEE Xplore: https://ieeexplore.ieee.org/document/10322429
Large language models (LLMs) have attracted considerable interest in the fields of natural language understanding (NLU) and natural language generation (NLG) since their introduction. In contrast, the legacy of Capsule Neural Networks (CapsNet) appears to have been largely forgotten amid this excitement. This project's objective is to reignite interest in CapsNet by revisiting previously closed studies and conducting new research into its potential. We present a study in which CapsNet is used to classify toxic text, leveraging pre-trained BERT embeddings (bert-base-uncased) on a large multilingual dataset. By comparing the performance of CapsNet to that of other architectures, such as DistilBERT, Vanilla Neural Networks (VNN), and Convolutional Neural Networks (CNN), we achieved an accuracy of 90.44%. This result highlights the effectiveness of CapsNet on text data and suggests new ways to enhance its performance so that it becomes comparable to DistilBERT and other compact architectures.
The proposed CapsNet architecture. The model takes word IDs as input and uses pre-trained BERT embeddings to extract context from the text. A spatial dropout layer is applied to the BERT embeddings to prevent overfitting. The capsule layer receives the regularized embeddings and learns to represent the input text as a collection of capsules, each capturing a particular characteristic or attribute of the text. The capsule outputs are then fed into dense layers to learn higher-level text representations, and the final dense layer produces the output prediction, i.e., the classification label of the input text.
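For concreteness, the pipeline described in this caption can be sketched in Keras as follows. This is a minimal illustration rather than the paper's exact implementation: the capsule count, capsule dimension, routing iterations, dropout rate, and dense-layer sizes are assumed values, and the routing shown is a plain dynamic-routing variant.

```python
# Minimal Keras sketch of the described pipeline. Hyperparameters (capsule
# count, capsule dimension, routing iterations, dropout rate, dense sizes)
# are illustrative assumptions, not the paper's reported settings.
import tensorflow as tf
from tensorflow.keras import layers
from transformers import TFBertModel

def squash(s, axis=-1, eps=1e-7):
    # Squash non-linearity: preserves vector orientation, maps norm into [0, 1).
    sq_norm = tf.reduce_sum(tf.square(s), axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * (s / tf.sqrt(sq_norm + eps))

class CapsuleLayer(layers.Layer):
    """Sequence-to-capsules layer with simple dynamic routing."""
    def __init__(self, num_capsules=10, dim_capsule=16, routings=3, **kwargs):
        super().__init__(**kwargs)
        self.num_capsules, self.dim_capsule, self.routings = num_capsules, dim_capsule, routings

    def build(self, input_shape):
        # Shared linear map from each token feature vector to all capsule predictions.
        self.W = self.add_weight(name="W", trainable=True,
                                 shape=(int(input_shape[-1]), self.num_capsules * self.dim_capsule),
                                 initializer="glorot_uniform")

    def call(self, u):                                   # u: (batch, seq, feat)
        u_hat = tf.einsum("bsf,fo->bso", u, self.W)      # prediction vectors
        u_hat = tf.reshape(u_hat, (-1, tf.shape(u)[1], self.num_capsules, self.dim_capsule))
        b = tf.zeros_like(u_hat[..., 0])                 # routing logits: (batch, seq, caps)
        for _ in range(self.routings):
            c = tf.nn.softmax(b, axis=-1)                # coupling coefficients
            v = squash(tf.einsum("bsc,bscd->bcd", c, u_hat))
            b += tf.einsum("bcd,bscd->bsc", v, u_hat)    # agreement update
        return tf.reshape(v, (-1, self.num_capsules * self.dim_capsule))

def build_capsnet(max_len=128):
    ids = layers.Input(shape=(max_len,), dtype=tf.int32, name="word_ids")
    mask = layers.Input(shape=(max_len,), dtype=tf.int32, name="attention_mask")
    bert = TFBertModel.from_pretrained("bert-base-uncased")
    bert.trainable = False                               # embeddings used as fixed features
    emb = bert(input_ids=ids, attention_mask=mask).last_hidden_state
    x = layers.SpatialDropout1D(0.2)(emb)                # drops whole embedding channels
    x = CapsuleLayer()(x)
    x = layers.Dense(64, activation="relu")(x)           # higher-level representation
    out = layers.Dense(1, activation="sigmoid")(x)       # toxic vs. non-toxic
    return tf.keras.Model([ids, mask], out)

model = build_capsnet()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```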
The general structure of the experiments performed on the text data.
Metrics assessing the readability, or ease of understanding, of the texts. A: Dale-Chall readability; B: automated readability index; C: Flesch reading ease; D: non-English language percentages; E: English and non-English language counts; F: toxic class counts.
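These per-comment statistics can be reproduced with off-the-shelf tooling; a brief sketch follows. Using the textstat and langdetect packages is an assumption here, not necessarily the tooling used in the paper.

```python
# Hypothetical per-comment profiling for the readability/language statistics.
# textstat and langdetect are assumed tools, not the paper's stated tooling.
import textstat
from langdetect import detect, DetectorFactory

DetectorFactory.seed = 0  # make language detection deterministic

def profile_comment(text: str) -> dict:
    return {
        "dale_chall": textstat.dale_chall_readability_score(text),           # panel A
        "automated_readability": textstat.automated_readability_index(text), # panel B
        "flesch_reading_ease": textstat.flesch_reading_ease(text),           # panel C
        "language": detect(text),  # feeds the language percentages/counts (panels D, E)
    }

print(profile_comment("This comment is perfectly polite and easy to read."))
```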
Sentiment scores. A: neutrality; B: compound; C: positivity; D: negativity. Comparative analysis against toxicity: E: neutrality vs. toxicity; F: compound vs. toxicity; G: positivity vs. toxicity; H: negativity vs. toxicity.
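The four scores (neutrality, compound, positivity, negativity) match the output keys of NLTK's VADER sentiment analyzer, so a VADER-based sketch is shown below; that the paper actually used VADER is an assumption.

```python
# Sketch of computing the four sentiment scores with NLTK's VADER analyzer.
# (That the paper used VADER is an assumption based on the matching score names.)
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

# Returns {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...},
# i.e. negativity, neutrality, positivity, and the compound score (panels A-D).
print(sia.polarity_scores("You are an absolute disgrace."))
```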
Basic architecture of all the models (VNN, CNN, CapsNet, DistilBERT) tested so far. The layers common to every architecture are the BERT embeddings and the classifier; a sketch of this shared skeleton follows.
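A minimal sketch of that shared skeleton, taking precomputed BERT embeddings as input and swapping only the model-specific body. The layer sizes and pooling choices are illustrative assumptions; the CapsNet body would be the capsule layer shown earlier, and DistilBERT replaces the embedding stage with its own encoder.

```python
# Shared skeleton: BERT embeddings -> model-specific body -> sigmoid classifier.
# Layer sizes and pooling choices are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers

def body_vnn(x):
    # Vanilla NN body: pool tokens, then a dense hidden layer.
    return layers.Dense(128, activation="relu")(layers.GlobalAveragePooling1D()(x))

def body_cnn(x):
    # CNN body: 1-D convolution over the token axis, then max pooling.
    x = layers.Conv1D(128, kernel_size=3, activation="relu")(x)
    return layers.GlobalMaxPooling1D()(x)

def build(body, max_len=128, hidden=768):
    # Inputs are precomputed bert-base-uncased token embeddings (max_len x 768).
    emb = layers.Input(shape=(max_len, hidden), name="bert_embeddings")
    out = layers.Dense(1, activation="sigmoid", name="classifier")(body(emb))
    return tf.keras.Model(emb, out)

vnn_model, cnn_model = build(body_vnn), build(body_cnn)
```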
@INPROCEEDINGS{10322429,
author={Rahman Sifat, Habibur and Nuri Sabab, Noor Hossain and Ahmed, Tashin},
booktitle={TENCON 2023 - 2023 IEEE Region 10 Conference (TENCON)},
title={Evaluating the Effectiveness of Capsule Neural Network in Toxic Comment Classification Using Pre-Trained BERT Embeddings},
year={2023},
pages={42-46},
doi={10.1109/TENCON58879.2023.10322429}}