Labels: enhancement (New feature or request), modernization (Modernizing code, dependencies, and structure)
Description
Problem
The classification model achieves 99.85% accuracy and the anomaly detector a 90.5% true negative rate. While this validates the original research, 99%+ accuracy is often a red flag for overfitting or for data leakage we simply haven't detected yet.
Questions to Answer
- Is the model actually overfitting?
  - Are training and validation accuracies diverging?
  - Does performance vary significantly across different data splits?
- Are the patterns genuinely distinctive?
  - Can the model generalize to completely unseen IoT devices?
  - Is accuracy maintained with fewer features?
- Is there hidden data leakage?
  - Does the model rely too heavily on 1-2 features?
  - Are there temporal patterns that shouldn't be learned?
Proposed Tests
Test 1: Learning Curves Analysis
- Plot training vs validation accuracy over epochs
- Look for divergence indicating overfitting
- Check if validation accuracy plateaus or oscillates
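The sketch below shows one way to produce these curves, assuming a Keras-style classifier; the two-layer model and the synthetic `make_classification` data are placeholders for the project's actual architecture and preprocessed features.

```python
# Sketch only: stand-in model and synthetic data; swap in the real
# classifier and the repo's feature matrix.
import matplotlib.pyplot as plt
import tensorflow as tf
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(X.shape[1],)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
history = model.fit(X, y, validation_split=0.2, epochs=50,
                    batch_size=256, verbose=0)

# A widening gap between the two curves is the overfitting signature.
plt.plot(history.history["accuracy"], label="train")
plt.plot(history.history["val_accuracy"], label="validation")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.savefig("learning_curves.png")
```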
Test 2: Cross-Validation with Multiple Seeds
- Test with 5-10 different random splits
- Calculate mean and std dev of accuracy
- High variance suggests overfitting to specific splits
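A minimal version of this loop, with `LogisticRegression` standing in for the Keras classifier to keep the sketch short; the same structure applies with `model.fit()`/`model.evaluate()`:

```python
# Sketch: re-split with several seeds and summarize the spread.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

scores = []
for seed in range(10):  # 10 independent random splits
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    scores.append(clf.score(X_te, y_te))

print(f"accuracy: {np.mean(scores):.4f} +/- {np.std(scores):.4f}")
# A std of several percent means the headline number is split-dependent.
```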
Test 3: Feature Importance via Ablation
- Remove each feature one at a time
- Measure accuracy drop
- If removing a single feature collapses performance, that feature may be leaking label information
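A leave-one-feature-out sketch under the same stand-in model and synthetic data:

```python
# Sketch: ablate each column and record the accuracy drop vs. baseline.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

baseline = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)
for i in range(X.shape[1]):
    clf = LogisticRegression(max_iter=1000).fit(
        np.delete(X_tr, i, axis=1), y_tr)
    drop = baseline - clf.score(np.delete(X_te, i, axis=1), y_te)
    print(f"feature {i:2d}: accuracy drop {drop:+.4f}")
```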
Test 4: Dropout Regularization Test
- Train with various dropout rates (0.2, 0.3, 0.5)
- If dropout significantly hurts performance, the model may not be overfitted
- If dropout helps, the original model was likely overfitted
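A sketch of the dropout sweep, again with a placeholder architecture and synthetic data:

```python
# Sketch: train the same stand-in architecture at several dropout rates
# and compare final validation accuracy across rates.
import tensorflow as tf
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

for rate in (0.0, 0.2, 0.3, 0.5):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(X.shape[1],)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(rate),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    hist = model.fit(X, y, validation_split=0.2, epochs=20,
                     batch_size=256, verbose=0)
    print(f"dropout {rate}: val_acc {hist.history['val_accuracy'][-1]:.4f}")
```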
Test 5: Cross-Device Generalization
- Train on 8 devices, test on 1 held-out device
- Repeat for all 9 devices (leave-one-out)
- This is the truest generalization test, since different devices exhibit different traffic patterns
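A leave-one-device-out sketch; the per-device dict is hypothetical and would be replaced by the project's actual per-device loaders:

```python
# Sketch: train on 8 devices, evaluate on the held-out 9th; rotate.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical: one (X, y) pair per device, 9 devices total.
devices = {f"device_{i}": make_classification(
    n_samples=1000, n_features=20, random_state=i) for i in range(9)}

for held_out in devices:
    X_tr = np.vstack(
        [X for name, (X, y) in devices.items() if name != held_out])
    y_tr = np.concatenate(
        [y for name, (X, y) in devices.items() if name != held_out])
    X_te, y_te = devices[held_out]
    acc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)
    print(f"held out {held_out}: accuracy {acc:.4f}")
```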
Test 6: Feature Perturbation Robustness
- Add small noise to features
- Check if accuracy remains stable
- Overfitted models are fragile to perturbations
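One way to run the perturbation sweep, scaling the noise to each feature's standard deviation (an assumption; other noise models work too):

```python
# Sketch: add zero-mean Gaussian noise scaled per feature and watch
# how test accuracy degrades as the noise level grows.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

rng = np.random.default_rng(0)
for sigma in (0.0, 0.01, 0.05, 0.1):  # noise as a fraction of feature std
    noise = rng.normal(0.0, 1.0, X_te.shape) * X_te.std(axis=0) * sigma
    print(f"sigma={sigma}: accuracy {clf.score(X_te + noise, y_te):.4f}")
# A steep drop at small sigma suggests the model latched onto brittle detail.
```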
Test 7: Reduced Training Data
- Train with 10%, 25%, 50%, 75%, 100% of data
- Plot learning curve
- If 10% of the data already achieves ~99% accuracy, the patterns may be trivially easy or leaked
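A sketch of the data-scaling sweep with the same stand-in model and data:

```python
# Sketch: train on growing fractions of the training set, always
# evaluating on the same fixed held-out test set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

rng = np.random.default_rng(0)
for frac in (0.10, 0.25, 0.50, 0.75, 1.00):
    idx = rng.choice(len(X_tr), size=int(frac * len(X_tr)), replace=False)
    acc = LogisticRegression(max_iter=1000).fit(
        X_tr[idx], y_tr[idx]).score(X_te, y_te)
    print(f"{int(frac * 100):3d}% of training data: accuracy {acc:.4f}")
```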
Implementation
Created analysis/overfitting_analysis.py with Tests 1-4 implemented. Need to:
- Move script to modernization branch
- Run comprehensive testing with all tests
- Document results in new analysis report
- Add cross-device testing (Test 5)
- Add perturbation testing (Test 6)
- Add data scaling tests (Test 7)
Expected Outcomes
If NOT overfitted:
- Train-val gap < 1%
- Cross-validation std < 2%
- All features contribute roughly equally
- Dropout doesn't significantly hurt performance
- Cross-device accuracy > 95%
If overfitted:
- Train-val gap > 5%
- Cross-validation std > 5%
- Removing 1-2 features crashes performance
- Dropout improves generalization
- Cross-device accuracy < 80%
Priority
HIGH - This is critical for scientific integrity and portfolio presentation. Better to know now if there are issues than to find out later.
Branch
Run this analysis on modernization branch with modern TensorFlow, not on archive branches.