
AI vs Human Text Classification with K-Fold Cross-Validation

Released by pallasite99 on 09 Dec 11:42 · commit 978d3c6

Version 0.1.0: AI vs Human Text Classification with K-Fold Cross-Validation

Features

  1. K-Fold Cross-Validation Implementation:

    • Added support for 5-fold cross-validation to evaluate the performance of the Logistic Regression model.
    • Computes Accuracy, Precision, Recall, and ROC AUC on each fold.
    • Reports the mean and standard deviation of every metric across folds, showing how stable the model's performance is from fold to fold.
  2. Logistic Regression Optimization:

    • Trains the model with TF-IDF vectorized text features.
    • After cross-validation, the final model is retrained on the full dataset.
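The cross-validation workflow above can be sketched with scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the project's code: the toy corpus, `random_state=42`, and `max_iter=1000` are hypothetical, and the release does not say whether its folds are stratified. Wrapping TF-IDF and logistic regression in a single pipeline means the vectorizer is refit on each training fold, so vocabulary and IDF weights from the held-out fold do not leak into training.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus (assumption): 1 = AI-generated, 0 = human-written.
topics = ["climate", "football", "cooking", "history", "music",
          "travel", "finance", "gardening", "chess", "astronomy"]
texts = (
    [f"As an AI language model, I can provide a comprehensive overview of {t}." for t in topics]
    + [f"honestly i got way too into {t} last weekend lol" for t in topics]
)
labels = np.array([1] * len(topics) + [0] * len(topics))

# TF-IDF + logistic regression in one pipeline: the vectorizer is re-fitted
# on each training fold and never sees the held-out fold.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# Stratified 5-fold CV (an assumption) keeps the AI/human ratio equal across folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
metrics = ["accuracy", "precision", "recall", "roc_auc"]
results = cross_validate(model, texts, labels, cv=cv, scoring=metrics)

# Mean and standard deviation of each metric over the 5 folds.
summary = {m: (results[f"test_{m}"].mean(), results[f"test_{m}"].std()) for m in metrics}
for m, (mean, std) in summary.items():
    print(f"{m:>9}: {mean:.3f} ± {std:.3f}")

# cross_validate works on clones, so fit the final model on the full dataset here.
model.fit(texts, labels)
```

On real data, `texts` and `labels` would be replaced by the project's dataset; `results` also contains per-fold `fit_time` and `score_time` arrays.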

Visualizations

  • Histogram of Decision Function Scores:

    • A histogram of the model's decision-function scores, with the classification decision boundary marked.
  • Top Features for AI vs Human Classification:

    • Bar charts showing the top 10 positive (AI-indicative) and negative (Human-indicative) features.
  • Precision-Recall Curve:

    • A plot showing the trade-off between precision and recall for the Logistic Regression model.

Code Optimizations

  • Streamlined preprocessing functions to clean text efficiently without external dependencies.
  • Tuned Logistic Regression hyperparameters to improve performance during cross-validation.
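The two items above might look roughly like this. `clean_text` is a hypothetical standard-library-only cleaner (lowercasing, URL and punctuation removal, whitespace collapsing), and the `C` grid is an assumed example; the release does not say which cleaning steps or hyperparameter values it uses. `GridSearchCV` is shown as one common way to tune hyperparameters under cross-validation, not necessarily the method the project used.

```python
import re
import string

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline

# Precompiled patterns and a translation table: stdlib only, no NLP libraries.
_URL_RE = re.compile(r"https?://\S+")
_WS_RE = re.compile(r"\s+")
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text: str) -> str:
    """Hypothetical cleaner: lowercase, drop URLs and ASCII punctuation, collapse whitespace."""
    text = _URL_RE.sub(" ", text.lower())
    text = text.translate(_PUNCT_TABLE)
    return _WS_RE.sub(" ", text).strip()


# Hypothetical toy corpus (assumption): 1 = AI-generated, 0 = human-written.
topics = ["climate", "football", "cooking", "history", "music",
          "travel", "finance", "gardening", "chess", "astronomy"]
texts = (
    [f"As an AI language model, I can provide a comprehensive overview of {t}." for t in topics]
    + [f"honestly i got way too into {t} last weekend lol" for t in topics]
)
labels = np.array([1] * len(topics) + [0] * len(topics))

# clean_text replaces TfidfVectorizer's built-in preprocessing step.
pipeline = make_pipeline(
    TfidfVectorizer(preprocessor=clean_text),
    LogisticRegression(max_iter=1000),
)

# Assumed grid over the inverse regularization strength C, scored by ROC AUC.
param_grid = {"logisticregression__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(
    pipeline,
    param_grid,
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
)
search.fit(texts, labels)
print(search.best_params_, round(search.best_score_, 3))
```

Because `search` refits the best configuration on all the data by default (`refit=True`), `search.best_estimator_` can serve directly as the final model.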

Getting Started

  1. Clone the repository:
    git clone <repository-url>