Validate 99%+ accuracy: Overfitting analysis #23

Problem

The classification model achieves 99.85% accuracy, and the anomaly detector reaches a 90.5% true negative rate. While this validates the original research, accuracy above 99% is often a red flag for overfitting or data leakage that we have not yet detected.

Questions to Answer

  1. Is the model actually overfitting?

    • Are training and validation accuracies diverging?
    • Does performance vary significantly across different data splits?
  2. Are the patterns genuinely distinctive?

    • Can the model generalize to completely unseen IoT devices?
    • Is accuracy maintained with fewer features?
  3. Is there hidden data leakage?

    • Does the model rely too heavily on 1-2 features?
    • Are there temporal patterns that shouldn't be learned?

Proposed Tests

Test 1: Learning Curves Analysis

  • Plot training vs validation accuracy over epochs
  • Look for divergence indicating overfitting
  • Check if validation accuracy plateaus or oscillates
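
A minimal sketch of Test 1 in Keras, assuming the model is compiled with metrics=["accuracy"] and that X_train / y_train come from analysis/overfitting_analysis.py (the names here are placeholders, not the script's actual API):

```python
import matplotlib.pyplot as plt


def plot_learning_curves(model, X_train, y_train, epochs=50):
    """Train with a held-out validation split and plot per-epoch accuracy."""
    history = model.fit(
        X_train, y_train,
        validation_split=0.2,  # hold out 20% of the training data for validation
        epochs=epochs,
        verbose=0,
    )
    plt.plot(history.history["accuracy"], label="train")
    plt.plot(history.history["val_accuracy"], label="validation")
    plt.xlabel("epoch")
    plt.ylabel("accuracy")
    plt.legend()
    plt.title("Learning curves")
    plt.savefig("learning_curves.png")
```

A widening gap between the two curves, or an oscillating validation curve, is the overfitting signal this test looks for.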

Test 2: Cross-Validation with Multiple Seeds

  • Test with 5-10 different random splits
  • Calculate mean and std dev of accuracy
  • High variance suggests overfitting to specific splits
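
A minimal sketch of Test 2, assuming feature/label arrays X, y and a hypothetical build_model() factory that returns a freshly compiled classifier with an accuracy metric:

```python
import numpy as np
from sklearn.model_selection import train_test_split


def repeated_split_accuracy(X, y, build_model, n_splits=10, epochs=20):
    """Accuracy over several random, stratified train/test splits."""
    scores = []
    for seed in range(n_splits):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, stratify=y, random_state=seed
        )
        model = build_model()  # fresh weights for every split
        model.fit(X_tr, y_tr, epochs=epochs, verbose=0)
        _, acc = model.evaluate(X_te, y_te, verbose=0)  # [loss, accuracy]
        scores.append(acc)
    return float(np.mean(scores)), float(np.std(scores))
```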

Test 3: Feature Importance via Ablation

  • Remove each feature one at a time
  • Measure accuracy drop
  • If removing one feature crashes performance, that points to possible data leakage
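
A minimal sketch of Test 3. "Removing" a feature is approximated here by zeroing its column at evaluation time; retraining without the column would be the stricter variant. The trained model and a feature_names list are assumed inputs:

```python
import numpy as np


def ablation_importance(model, X_test, y_test, feature_names):
    """Accuracy drop when each feature is zeroed out, one at a time."""
    _, baseline = model.evaluate(X_test, y_test, verbose=0)
    drops = {}
    for i, name in enumerate(feature_names):
        X_ablate = np.array(X_test, copy=True)
        X_ablate[:, i] = 0.0              # neutralise a single feature
        _, acc = model.evaluate(X_ablate, y_test, verbose=0)
        drops[name] = baseline - acc      # large drop = heavy reliance on this feature
    return drops
```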

Test 4: Dropout Regularization Test

  • Train with various dropout rates (0.2, 0.3, 0.5)
  • If adding dropout noticeably hurts validation accuracy, the original model was likely not overfitted
  • If adding dropout improves validation accuracy, the original model was overfitting
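
A minimal sketch of Test 4. The layer sizes below are placeholders rather than the project's actual architecture, and integer class labels are assumed:

```python
import numpy as np
import tensorflow as tf


def build_with_dropout(input_dim, n_classes, rate):
    """Small dense classifier with a configurable dropout rate."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(rate),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dropout(rate),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


def dropout_sweep(X_tr, y_tr, X_te, y_te, rates=(0.0, 0.2, 0.3, 0.5), epochs=20):
    """Test accuracy for each dropout rate, including a no-dropout baseline."""
    results = {}
    for rate in rates:
        model = build_with_dropout(X_tr.shape[1], len(np.unique(y_tr)), rate)
        model.fit(X_tr, y_tr, epochs=epochs, verbose=0)
        results[rate] = model.evaluate(X_te, y_te, verbose=0)[1]
    return results
```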

Test 5: Cross-Device Generalization

  • Train on 8 devices, test on 1 held-out device
  • Repeat for all 9 devices (leave-one-out)
  • This is the truest generalization test: different devices produce different traffic patterns
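
A minimal sketch of Test 5, assuming a device_id array aligned with X and y that records which device produced each sample (i.e. device identity is a grouping variable here, not the prediction target), plus the hypothetical build_model() factory from above:

```python
import numpy as np


def leave_one_device_out(X, y, device_id, build_model, epochs=20):
    """Train on all devices except one, evaluate on the held-out device."""
    per_device_acc = {}
    for dev in np.unique(device_id):
        test_mask = device_id == dev
        model = build_model()
        model.fit(X[~test_mask], y[~test_mask], epochs=epochs, verbose=0)
        _, acc = model.evaluate(X[test_mask], y[test_mask], verbose=0)
        per_device_acc[dev] = acc
    return per_device_acc
```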

Test 6: Feature Perturbation Robustness

  • Add small noise to features
  • Check if accuracy remains stable
  • Overfitted models are fragile to perturbations
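
A minimal sketch of Test 6. Noise is scaled per feature by that feature's standard deviation, which assumes the features are on comparable, roughly standardised scales:

```python
import numpy as np


def perturbation_robustness(model, X_test, y_test, noise_levels=(0.01, 0.05, 0.10)):
    """Test accuracy under additive Gaussian noise of increasing magnitude."""
    rng = np.random.default_rng(0)
    feature_std = X_test.std(axis=0)
    results = {}
    for level in noise_levels:
        noise = rng.normal(size=X_test.shape) * feature_std * level
        _, acc = model.evaluate(X_test + noise, y_test, verbose=0)
        results[level] = acc
    return results
```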

Test 7: Reduced Training Data

  • Train with 10%, 25%, 50%, 75%, 100% of data
  • Plot learning curve
  • If training on 10% of the data already reaches 99% accuracy, the patterns may be trivially easy or leaked
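
A minimal sketch of Test 7, again reusing the hypothetical build_model() factory; subsets are drawn with stratified sampling so class balance is preserved at every fraction:

```python
from sklearn.model_selection import train_test_split


def data_scaling_curve(X_tr, y_tr, X_te, y_te, build_model,
                       fractions=(0.10, 0.25, 0.50, 0.75, 1.0), epochs=20):
    """Test accuracy when training on increasing fractions of the training set."""
    results = {}
    for frac in fractions:
        if frac < 1.0:
            X_sub, _, y_sub, _ = train_test_split(
                X_tr, y_tr, train_size=frac, stratify=y_tr, random_state=0
            )
        else:
            X_sub, y_sub = X_tr, y_tr
        model = build_model()
        model.fit(X_sub, y_sub, epochs=epochs, verbose=0)
        results[frac] = model.evaluate(X_te, y_te, verbose=0)[1]
    return results
```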

Implementation

Created analysis/overfitting_analysis.py with Tests 1-4 implemented. Need to:

  1. Move script to modernization branch
  2. Run comprehensive testing with all tests
  3. Document results in new analysis report
  4. Add cross-device testing (Test 5)
  5. Add perturbation testing (Test 6)
  6. Add data scaling tests (Test 7)

Expected Outcomes

If NOT overfitted:

  • Train-val gap < 1%
  • Cross-validation std < 2%
  • All features contribute roughly equally
  • Dropout doesn't significantly hurt performance
  • Cross-device accuracy > 95%

If overfitted:

  • Train-val gap > 5%
  • Cross-validation std > 5%
  • Removing 1-2 features crashes performance
  • Dropout improves generalization
  • Cross-device accuracy < 80%
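
These thresholds could be encoded directly so the analysis report flags results automatically. A sketch, with metric names chosen here for illustration and the single-feature cut-off picked arbitrarily:

```python
def overfitting_verdict(metrics):
    """Return the red flags triggered by the thresholds listed in this issue."""
    flags = []
    if metrics["train_val_gap"] > 0.05:
        flags.append("train-val gap > 5%")
    if metrics["cv_std"] > 0.05:
        flags.append("cross-validation std > 5%")
    if metrics["max_ablation_drop"] > 0.50:   # illustrative cut-off for "crashes performance"
        flags.append("removing one feature crashes performance")
    if metrics["cross_device_accuracy"] < 0.80:
        flags.append("cross-device accuracy < 80%")
    return flags
```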

Priority

HIGH - This is critical for scientific integrity and portfolio presentation. Better to know now if there are issues than to find out later.

Branch

Run this analysis on the modernization branch with modern TensorFlow, not on the archive branches.

Labels

enhancement (New feature or request), modernization (Modernizing code, dependencies, and structure)
