AI vs Human Text Classification with K-Fold Cross-Validation

pallasite99 released this 09 Dec 11:42


978d3c6

Version 0.1.0: AI vs Human Text Classification with K-Fold Cross-Validation

Features

K-Fold Cross-Validation Implementation:
- Added support for 5-Fold Cross-Validation to validate the performance of the Logistic Regression model.
- Evaluates metrics like Accuracy, Precision, Recall, and ROC AUC for each fold.
- Reports the mean and standard deviation of each metric across folds to show how stable the model's performance is.
Logistic Regression Optimization:
- Trains the model with TF-IDF vectorized text features.
- Final model trained on the full dataset after cross-validation.
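
The workflow above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the repository's actual code: the toy corpus is made up, the label encoding (1 = AI, 0 = Human) is an assumption, and names such as `pipeline` and `summary` are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline

# Made-up toy corpus for illustration only; assumed encoding: 1 = AI, 0 = Human.
ai_texts = [f"as an ai language model i can provide a summary of topic {i}" for i in range(10)]
human_texts = [f"honestly i just went to the store and bought snack number {i} lol" for i in range(10)]
texts = ai_texts + human_texts
labels = np.array([1] * 10 + [0] * 10)

# TF-IDF features feeding a Logistic Regression classifier.
pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# 5-fold stratified cross-validation, scoring the four metrics per fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scoring = ["accuracy", "precision", "recall", "roc_auc"]
results = cross_validate(pipeline, texts, labels, cv=cv, scoring=scoring)

# Mean and standard deviation of each metric across the folds.
summary = {
    metric: (results[f"test_{metric}"].mean(), results[f"test_{metric}"].std())
    for metric in scoring
}
for metric, (mean, std) in summary.items():
    print(f"{metric}: {mean:.3f} +/- {std:.3f}")

# After cross-validation, fit the final model on the full dataset.
final_model = pipeline.fit(texts, labels)
```

Putting the vectorizer inside the pipeline means TF-IDF is refit on each training fold, so no vocabulary statistics leak from the held-out fold into training.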

Visualizations

Histogram of Decision Function Scores:
- A histogram visualizing decision scores and the classification decision boundary.
Top Features for AI vs Human Classification:
- Bar charts showing the top 10 positive (AI-indicative) and negative (Human-indicative) features.
Precision-Recall Curve:
- A plot demonstrating the trade-off between precision and recall for Logistic Regression.

Code Optimizations

Streamlined preprocessing functions to clean text efficiently without external dependencies.
Tuned Logistic Regression hyperparameters to improve cross-validated performance.
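
As an illustration of dependency-free preprocessing, a cleaning function might look like the sketch below. It uses only the standard library; the function name `clean_text` and the specific cleaning steps (lowercasing, stripping URLs, HTML tags, and non-letter characters) are assumptions, not the repository's actual implementation.

```python
import re

# Patterns are compiled once so repeated calls stay cheap.
_URL_RE = re.compile(r"https?://\S+|www\.\S+")
_TAG_RE = re.compile(r"<[^>]+>")
_NON_ALPHA_RE = re.compile(r"[^a-z\s]")
_WS_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Hypothetical cleaner: lowercase, drop URLs, tags and non-letters."""
    text = text.lower()
    text = _URL_RE.sub(" ", text)        # remove URLs
    text = _TAG_RE.sub(" ", text)        # remove HTML-like tags
    text = _NON_ALPHA_RE.sub(" ", text)  # keep only letters and whitespace
    return _WS_RE.sub(" ", text).strip() # collapse runs of whitespace


print(clean_text("Hello,   WORLD! Visit https://example.com <b>now</b>"))
# prints: hello world visit now
```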

Getting Started

Clone the repository:
```
git clone <repository-url>
```
