Validate 99%+ accuracy: Overfitting analysis #23

Problem

The classification model achieves 99.85% accuracy, and the anomaly detector reaches a 90.5% true negative rate. While this validates the original research, accuracy above 99% is often a red flag for overfitting or data leakage that we have not yet detected.

Questions to Answer

  1. Is the model actually overfitting?

    • Are training and validation accuracies diverging?
    • Does performance vary significantly across different data splits?
  2. Are the patterns genuinely distinctive?

    • Can the model generalize to completely unseen IoT devices?
    • Is accuracy maintained with fewer features?
  3. Is there hidden data leakage?

    • Does the model rely too heavily on 1-2 features?
    • Are there temporal patterns that shouldn't be learned?

Proposed Tests

Test 1: Learning Curves Analysis

  • Plot training vs validation accuracy over epochs
  • Look for divergence indicating overfitting
  • Check if validation accuracy plateaus or oscillates
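
A minimal sketch of Test 1 in Keras, assuming the model is compiled with metrics=["accuracy"] and that X_train / y_train come from analysis/overfitting_analysis.py (the names here are placeholders, not the script's actual API):

```python
import matplotlib.pyplot as plt


def plot_learning_curves(model, X_train, y_train, epochs=50):
    """Train with a held-out validation split and plot per-epoch accuracy."""
    history = model.fit(
        X_train, y_train,
        validation_split=0.2,  # hold out 20% of the training data for validation
        epochs=epochs,
        verbose=0,
    )
    plt.plot(history.history["accuracy"], label="train")
    plt.plot(history.history["val_accuracy"], label="validation")
    plt.xlabel("epoch")
    plt.ylabel("accuracy")
    plt.legend()
    plt.title("Learning curves")
    plt.savefig("learning_curves.png")
```

A widening gap between the two curves, or an oscillating validation curve, is the overfitting signal this test looks for.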

Test 2: Cross-Validation with Multiple Seeds

  • Test with 5-10 different random splits
  • Calculate mean and std dev of accuracy
  • High variance suggests overfitting to specific splits
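
A minimal sketch of Test 2, assuming feature/label arrays X, y and a hypothetical build_model() factory that returns a freshly compiled classifier with an accuracy metric:

```python
import numpy as np
from sklearn.model_selection import train_test_split


def repeated_split_accuracy(X, y, build_model, n_splits=10, epochs=20):
    """Accuracy over several random, stratified train/test splits."""
    scores = []
    for seed in range(n_splits):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, stratify=y, random_state=seed
        )
        model = build_model()  # fresh weights for every split
        model.fit(X_tr, y_tr, epochs=epochs, verbose=0)
        _, acc = model.evaluate(X_te, y_te, verbose=0)  # [loss, accuracy]
        scores.append(acc)
    return float(np.mean(scores)), float(np.std(scores))
```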

Test 3: Feature Importance via Ablation

  • Remove each feature one at a time
  • Measure accuracy drop
  • If removing one feature crashes performance, that points to possible data leakage
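
A minimal sketch of Test 3. "Removing" a feature is approximated here by zeroing its column at evaluation time; retraining without the column would be the stricter variant. The trained model and a feature_names list are assumed inputs:

```python
import numpy as np


def ablation_importance(model, X_test, y_test, feature_names):
    """Accuracy drop when each feature is zeroed out, one at a time."""
    _, baseline = model.evaluate(X_test, y_test, verbose=0)
    drops = {}
    for i, name in enumerate(feature_names):
        X_ablate = np.array(X_test, copy=True)
        X_ablate[:, i] = 0.0              # neutralise a single feature
        _, acc = model.evaluate(X_ablate, y_test, verbose=0)
        drops[name] = baseline - acc      # large drop = heavy reliance on this feature
    return drops
```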

Test 4: Dropout Regularization Test

  • Train with various dropout rates (0.2, 0.3, 0.5)
  • If adding dropout noticeably hurts validation accuracy, the original model was likely not overfitted
  • If adding dropout improves validation accuracy, the original model was overfitting
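
A minimal sketch of Test 4. The layer sizes below are placeholders rather than the project's actual architecture, and integer class labels are assumed:

```python
import numpy as np
import tensorflow as tf


def build_with_dropout(input_dim, n_classes, rate):
    """Small dense classifier with a configurable dropout rate."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(rate),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dropout(rate),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


def dropout_sweep(X_tr, y_tr, X_te, y_te, rates=(0.0, 0.2, 0.3, 0.5), epochs=20):
    """Test accuracy for each dropout rate, including a no-dropout baseline."""
    results = {}
    for rate in rates:
        model = build_with_dropout(X_tr.shape[1], len(np.unique(y_tr)), rate)
        model.fit(X_tr, y_tr, epochs=epochs, verbose=0)
        results[rate] = model.evaluate(X_te, y_te, verbose=0)[1]
    return results
```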

Test 5: Cross-Device Generalization

  • Train on 8 devices, test on 1 held-out device
  • Repeat for all 9 devices (leave-one-out)
  • This is the truest generalization test: different devices produce different traffic patterns
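
A minimal sketch of Test 5, assuming a device_id array aligned with X and y that records which device produced each sample (i.e. device identity is a grouping variable here, not the prediction target), plus the hypothetical build_model() factory from above:

```python
import numpy as np


def leave_one_device_out(X, y, device_id, build_model, epochs=20):
    """Train on all devices except one, evaluate on the held-out device."""
    per_device_acc = {}
    for dev in np.unique(device_id):
        test_mask = device_id == dev
        model = build_model()
        model.fit(X[~test_mask], y[~test_mask], epochs=epochs, verbose=0)
        _, acc = model.evaluate(X[test_mask], y[test_mask], verbose=0)
        per_device_acc[dev] = acc
    return per_device_acc
```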

Test 6: Feature Perturbation Robustness

  • Add small noise to features
  • Check if accuracy remains stable
  • Overfitted models are fragile to perturbations
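
A minimal sketch of Test 6. Noise is scaled per feature by that feature's standard deviation, which assumes the features are on comparable, roughly standardised scales:

```python
import numpy as np


def perturbation_robustness(model, X_test, y_test, noise_levels=(0.01, 0.05, 0.10)):
    """Test accuracy under additive Gaussian noise of increasing magnitude."""
    rng = np.random.default_rng(0)
    feature_std = X_test.std(axis=0)
    results = {}
    for level in noise_levels:
        noise = rng.normal(size=X_test.shape) * feature_std * level
        _, acc = model.evaluate(X_test + noise, y_test, verbose=0)
        results[level] = acc
    return results
```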

Test 7: Reduced Training Data

  • Train with 10%, 25%, 50%, 75%, 100% of data
  • Plot learning curve
  • If training on 10% of the data already reaches 99% accuracy, the patterns may be trivially easy or leaked
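
A minimal sketch of Test 7, again reusing the hypothetical build_model() factory; subsets are drawn with stratified sampling so class balance is preserved at every fraction:

```python
from sklearn.model_selection import train_test_split


def data_scaling_curve(X_tr, y_tr, X_te, y_te, build_model,
                       fractions=(0.10, 0.25, 0.50, 0.75, 1.0), epochs=20):
    """Test accuracy when training on increasing fractions of the training set."""
    results = {}
    for frac in fractions:
        if frac < 1.0:
            X_sub, _, y_sub, _ = train_test_split(
                X_tr, y_tr, train_size=frac, stratify=y_tr, random_state=0
            )
        else:
            X_sub, y_sub = X_tr, y_tr
        model = build_model()
        model.fit(X_sub, y_sub, epochs=epochs, verbose=0)
        results[frac] = model.evaluate(X_te, y_te, verbose=0)[1]
    return results
```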

Implementation

Created analysis/overfitting_analysis.py with Tests 1-4 implemented. Need to:

  1. Move script to modernization branch
  2. Run comprehensive testing with all tests
  3. Document results in new analysis report
  4. Add cross-device testing (Test 5)
  5. Add perturbation testing (Test 6)
  6. Add data scaling tests (Test 7)

Expected Outcomes

If NOT overfitted:

  • Train-val gap < 1%
  • Cross-validation std < 2%
  • All features contribute roughly equally
  • Dropout doesn't significantly hurt performance
  • Cross-device accuracy > 95%

If overfitted:

  • Train-val gap > 5%
  • Cross-validation std > 5%
  • Removing 1-2 features crashes performance
  • Dropout improves generalization
  • Cross-device accuracy < 80%
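
These thresholds could be encoded directly so the analysis report flags results automatically. A sketch, with metric names chosen here for illustration and the single-feature cut-off picked arbitrarily:

```python
def overfitting_verdict(metrics):
    """Return the red flags triggered by the thresholds listed in this issue."""
    flags = []
    if metrics["train_val_gap"] > 0.05:
        flags.append("train-val gap > 5%")
    if metrics["cv_std"] > 0.05:
        flags.append("cross-validation std > 5%")
    if metrics["max_ablation_drop"] > 0.50:   # illustrative cut-off for "crashes performance"
        flags.append("removing one feature crashes performance")
    if metrics["cross_device_accuracy"] < 0.80:
        flags.append("cross-device accuracy < 80%")
    return flags
```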

Priority

HIGH - This is critical for scientific integrity and portfolio presentation. Better to know now if there are issues than to find out later.

Branch

Run this analysis on the modernization branch with modern TensorFlow, not on the archive branches.

Labels

enhancement (New feature or request), modernization (Modernizing code, dependencies, and structure)
