forked from sergts/botnet-traffic-analysis
Status: Open
Labels: enhancement (New feature or request)
Description
Goal
Generate synthetic IoT traffic data for controlled testing and validation.
Motivation
Why synthetic data?
- Controlled experiments: Test specific attack patterns
- Class balancing: Generate minority class samples
- Edge case testing: Create rare but important scenarios
- Privacy: Shareable data with lower privacy risk than releasing real traffic captures
- Research contribution: Synthetic IoT traffic generation is a valuable research output in its own right
Approach
Phase 1: Tool Selection
Options (2025):
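- SDV (Synthetic Data Vault): Library suite for tabular data; bundles several synthesizers (e.g., Gaussian copula, CTGAN, TVAE)
- CTGAN: GAN-based tabular synthesis (also available as a synthesizer inside SDV)
- SMOTE: Classic minority-class oversampling that interpolates between neighbouring samples (simple, strong baseline)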
Recommendation: Start with SDV and CTGAN
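For reference when comparing against the SMOTE baseline, the classic SMOTE interpolation step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's pipeline: the function name `smote`, the toy feature columns, and the use of Euclidean distance on unscaled features are all hypothetical choices. In practice the `SMOTE` class from imbalanced-learn (`SMOTE(...).fit_resample(X, y)`) would be the usual tool, and features should be scaled first.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Hypothetical helper: generate n_new rows by SMOTE-style interpolation.

    minority: list of equal-length numeric feature rows from ONE class.
    Each new row lies on the segment between a real minority row and one
    of its k nearest minority neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute the indices of the k nearest neighbours of every row.
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted(
            (math.dist(x, y), j) for j, y in enumerate(minority) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        nb = minority[rng.choice(neighbours[i])]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


# Hypothetical minority (attack) rows: [packets_per_sec, mean_payload_bytes]
attack_rows = [[120.0, 64.0], [130.0, 60.0], [125.0, 70.0], [140.0, 66.0]]
new_rows = smote(attack_rows, n_new=8, k=2)
```

Because every synthetic value lies between two real values, SMOTE cannot produce samples outside the observed range of the minority class; that is one reason the GAN-based tools are worth evaluating for the edge-case experiments.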
Phase 2: Experiments
Experiment 1: Data Augmentation (synthetic + real)
Experiment 2: Generalization (train synthetic, test real)
Experiment 3: Edge Case Testing
Experiment 4: Privacy-preserving dataset publication
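Experiment 2 amounts to a "train on synthetic, test on real" (TSTR) comparison against the usual "train on real, test on real" (TRTR) baseline, always scored on held-out real traffic. The harness below is a sketch under assumptions: a nearest-centroid classifier stands in for the project's actual detector, and all function names and toy rows are hypothetical.

```python
import math
from collections import defaultdict


def fit_centroids(rows, labels):
    """Hypothetical stand-in model: one mean feature vector per class."""
    sums = {}
    counts = defaultdict(int)
    for row, label in zip(rows, labels):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, row):
    # Assign the class whose centroid is nearest in Euclidean distance.
    return min(centroids, key=lambda c: math.dist(centroids[c], row))


def accuracy(centroids, rows, labels):
    hits = sum(predict(centroids, r) == y for r, y in zip(rows, labels))
    return hits / len(rows)


def tstr_vs_trtr(real_train, synth_train, real_test):
    """Each argument is a (rows, labels) pair; the test set is always real."""
    test_rows, test_labels = real_test
    trtr = accuracy(fit_centroids(*real_train), test_rows, test_labels)
    tstr = accuracy(fit_centroids(*synth_train), test_rows, test_labels)
    return {"TRTR": trtr, "TSTR": tstr, "gap": trtr - tstr}


# Hypothetical toy data: [packets_per_sec, mean_payload_bytes] (scaled)
real_train = ([[1.0, 1.0], [1.2, 0.9], [9.0, 9.0], [8.8, 9.1]],
              ["benign", "benign", "attack", "attack"])
synth_train = ([[1.1, 1.0], [0.9, 1.1], [9.2, 8.9], [9.0, 9.3]],
               ["benign", "benign", "attack", "attack"])
real_test = ([[1.0, 1.2], [9.1, 9.0]], ["benign", "attack"])
result = tstr_vs_trtr(real_train, synth_train, real_test)
```

A small `gap` suggests the synthetic data preserves the structure the detector relies on. Experiment 1 can reuse the same harness by passing the concatenation of real and synthetic rows as `synth_train`.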
Deliverables
- Evaluate synthetic data tools (SDV, CTGAN, SMOTE)
- Implement data generation pipeline
- Validate synthetic data quality (statistical tests)
- Run augmentation experiments
- Run generalization experiments
- Document results
- Optional: Publish synthetic dataset
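One common statistical test for the validation deliverable is a per-feature two-sample Kolmogorov-Smirnov comparison between real and synthetic columns. The sketch below computes the KS statistic D = max |F_real(x) - F_synth(x)| in plain Python and flags columns that drift; the helper names and the 0.1 threshold are illustrative assumptions, not project decisions. For p-values, `scipy.stats.ks_2samp` implements the full test.

```python
import bisect


def ks_statistic(real, synthetic):
    """Two-sample KS statistic D = max |F_real(x) - F_synth(x)|.

    Both arguments are lists of numbers for ONE feature column.
    D is 0 for identical empirical distributions and 1 for disjoint ones.
    """
    a, b = sorted(real), sorted(synthetic)
    d = 0.0
    # Empirical CDFs are step functions, so the maximum gap is
    # attained at one of the observed values.
    for x in a + b:
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def flag_columns(real_cols, synth_cols, threshold=0.1):
    """Hypothetical helper: names of columns whose KS statistic exceeds threshold.

    real_cols / synth_cols: dict mapping column name -> list of values.
    The default threshold is an illustrative assumption, not a standard.
    """
    return sorted(
        name for name in real_cols
        if ks_statistic(real_cols[name], synth_cols[name]) > threshold
    )
```

Per-column KS checks only compare marginal distributions; they say nothing about correlations between features, so they are a first screen rather than a complete quality assessment.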
Timeline
6 weeks total (stretch goal)
Priority
STRETCH GOAL - High research value, not critical for core modernization