D-Guard-NLP is an anti-fraud text classification project built on BERT-based models. Its main objective is a system that identifies fraudulent texts and assigns them to specific fraud categories, with a focus on transcribed fraud-related phone calls.
- Added CBLoss & Focal Loss: These loss functions help handle class imbalance and focus training on hard examples.
- Added Sphere2 & AAM-softmax & AM-softmax: In addition to the existing models, we have introduced Sphere2, AAM-softmax, and AM-softmax as alternative architectures for the BERT-based models. These architectures enhance the discriminative power of the models and improve their ability to capture subtle differences in fraudulent text patterns.
- Added Large Margin Fine-tuning: This fine-tuning stage enhances the model's ability to separate different classes.
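
To make the loss-related updates above more concrete, here is a minimal, dependency-free sketch of the underlying formulas. It is not the project's implementation: the function names, the default values (`beta`, `gamma`, `margin`, `scale`), and the normalisation of the class weights are assumptions chosen for illustration. It assumes "CBLoss" refers to class-balanced weighting by the effective number of samples, and it shows only the additive-margin (AM-softmax) variant of the margin-based heads.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_balanced_weights(samples_per_class, beta=0.999):
    """Per-class weights proportional to (1 - beta) / (1 - beta ** n_c).

    Rare classes receive larger weights. The weights are rescaled to sum
    to the number of classes (an illustrative choice). Every count must
    be greater than zero.
    """
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in samples_per_class]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


def focal_loss(logits, target, gamma=2.0, class_weights=None):
    """Focal loss for one example: -w_t * (1 - p_t) ** gamma * log(p_t).

    With gamma=0 and no class weights this reduces to cross-entropy.
    """
    p_t = softmax(logits)[target]
    weight = 1.0 if class_weights is None else class_weights[target]
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


def am_softmax_logits(cosines, target, margin=0.2, scale=30.0):
    """Subtract an additive margin from the target-class cosine, then scale.

    `cosines` holds the cosine similarity between the normalised text
    embedding and each normalised class weight vector.
    """
    return [
        scale * (c - margin if i == target else c)
        for i, c in enumerate(cosines)
    ]


if __name__ == "__main__":
    # Hypothetical class counts: many normal texts, few fraud texts.
    weights = class_balanced_weights([1000, 10])
    print(focal_loss([2.0, 0.5], target=1, class_weights=weights))
    print(am_softmax_logits([0.5, 0.1], target=0))
```

The resulting margin-adjusted logits would then be passed to an ordinary softmax cross-entropy (or to the focal loss above). In a real training loop these functions would be written with tensor operations from a deep learning framework so that gradients can flow through them.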
The D-Guard-NLP project comprises several key components:
- Data Preprocessing: This step involves cleaning and preprocessing the text data to remove noise and irrelevant information, ensuring high-quality input for the classification models.
- BERT-based Model Development: The project trains and fine-tunes BERT-based models, which capture the semantic meaning and contextual information in the text data.
- Text Classification: The trained BERT models are employed to classify text data into either fraud or non-fraud categories. In cases of fraud-related phone call texts, the system further identifies specific fraud categories.
- Evaluation: The performance of the text classification models is evaluated using appropriate metrics to assess their effectiveness in accurately detecting and classifying fraudulent texts.
- Deployment: Once the models demonstrate satisfactory performance, the D-Guard-NLP system can be deployed in a production environment, enabling real-time fraud detection and classification of incoming texts.
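
As an illustration of the evaluation step listed above, the sketch below computes per-class precision, recall, and F1, plus their macro average. The choice of metrics is an assumption, since the text does not say which ones the project uses, and the label names in the usage example (`"fraud"`, `"normal"`) are hypothetical.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: {"precision", "recall", "f1"}} for every label seen."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gets a false positive
            fn[truth] += 1  # true label gets a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    scores = per_class_scores(y_true, y_pred)
    if not scores:
        raise ValueError("cannot score empty inputs")
    return sum(s["f1"] for s in scores.values()) / len(scores)


if __name__ == "__main__":
    y_true = ["fraud", "fraud", "normal", "normal"]
    y_pred = ["fraud", "normal", "normal", "normal"]
    for label, s in per_class_scores(y_true, y_pred).items():
        print(label, s)
    print("macro F1:", macro_f1(y_true, y_pred))
```

Macro averaging is used here because fraud categories are typically much rarer than normal texts, and plain accuracy can look high even when rare categories are missed entirely.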