forked from sergts/botnet-traffic-analysis
Open
Labels
- archive: Related to archiving old research code
- technical-debt: Technical debt and code quality
Description
Problem
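Both the training and testing scripts shuffle and split the data with the same seed (random_state=17). The resulting test set is disjoint from the training set only if both scripts split exactly the same DataFrame (same rows, in the same order) at the same cut points, so this setup could lead to testing on training data if it is not managed carefully.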
Code:
x_train, x_opt, x_test = np.split(df.sample(frac=1, random_state=17), ...)
Appears in:
- train_og.py:26-27
- test.py:37-38
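The snippet below is a minimal sketch, assuming pandas and NumPy, of why the shared seed is only safe under identical inputs: `df.sample(frac=1, random_state=17)` produces the same row permutation every time it sees a DataFrame of the same length. The helper name `three_way_split` and its one-third cut points are hypothetical (the issue elides the real split indices), and positional `iloc` slicing stands in for `np.split` to keep the example independent of how different NumPy/pandas versions handle DataFrames passed to `np.split`.

```python
import numpy as np
import pandas as pd

SEED = 17  # seed both scripts use, per the issue


def three_way_split(df: pd.DataFrame, seed: int = SEED):
    """Shuffle rows, then cut into train / opt / test partitions.

    Hypothetical helper: the one-third cut points are an assumption, since
    the issue elides the real split indices.
    """
    shuffled = df.sample(frac=1, random_state=seed)
    n = len(shuffled)
    a, b = n // 3, 2 * n // 3
    return shuffled.iloc[:a], shuffled.iloc[a:b], shuffled.iloc[b:]


# Synthetic stand-in for the traffic DataFrame.
df = pd.DataFrame({"feature": np.arange(300)})

# "Training script" and "testing script" each split the identical DataFrame.
train_a, opt_a, test_a = three_way_split(df)
train_b, opt_b, test_b = three_way_split(df)

same_test = test_a.index.equals(test_b.index)       # identical test slice?
overlap = test_b.index.intersection(train_a.index)  # rows shared with training
print(same_test, len(overlap))  # → True 0
```

Under these conditions the two scripts agree, so the shared seed by itself does not leak data; the risk comes from anything that changes the rows, their order, or the cut points between the two scripts.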
Concern
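If test.py is run on the combined dataset used during training but does not reproduce train_og.py's split exactly (same rows, same order, same cut points), its test slice will almost certainly contain rows the model has already seen during training.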
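As a hypothetical illustration of this concern, the sketch below keeps the same seed and cut points but gives the "testing" side the same rows in a different order (for example, because files are concatenated in a different sequence). Because the permutation from `sample(random_state=17)` depends only on the row count, different rows land in the test slice. The data, the reversal, and the `three_way_split` helper are synthetic assumptions, not code from train_og.py or test.py.

```python
import numpy as np
import pandas as pd


def three_way_split(df: pd.DataFrame, seed: int = 17):
    """Hypothetical helper: shuffle with a fixed seed, cut into thirds."""
    shuffled = df.sample(frac=1, random_state=seed)
    n = len(shuffled)
    a, b = n // 3, 2 * n // 3
    return shuffled.iloc[:a], shuffled.iloc[a:b], shuffled.iloc[b:]


# Rows as the training script loads them (index label = row identity).
as_trained = pd.DataFrame({"feature": np.arange(300)})
# The same rows as a testing script might load them, in a different order.
as_tested = as_trained.iloc[::-1]

train_rows, _, _ = three_way_split(as_trained)
_, _, test_rows = three_way_split(as_tested)

# The seed fixes positions, not rows, so different rows reach the test slice.
leaked = test_rows.index.intersection(train_rows.index)
print(f"{len(leaked)} of {len(test_rows)} test rows were also training rows")
```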
Recommendation
Use proper train/test split methodology:
- Separate hold-out test set
- Time-based split for network traffic
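- Or different random seeds, but only for data that is already partitioned into disjoint training and test sets; reshuffling the same data with a different seed in each script would increase the overlap rather than remove it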
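One way to act on the first two recommendations, sketched under assumptions: the traffic DataFrame has a timestamp column (named `timestamp` here, hypothetically), and the split is computed once and its index labels saved to an `.npz` file that both scripts load, so neither script re-shuffles. The function names, fractions, and file name are illustrative, not part of the repository.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def time_based_split(df, time_col="timestamp", train_frac=0.6, opt_frac=0.2):
    """Chronological split: oldest rows -> train, next -> opt, newest -> test.

    `time_col` and the fractions are hypothetical defaults, not repo values.
    """
    ordered = df.sort_values(time_col, kind="stable")
    n = len(ordered)
    a = int(n * train_frac)
    b = a + int(n * opt_frac)
    return ordered.iloc[:a], ordered.iloc[a:b], ordered.iloc[b:]


def save_split(train, opt, test, path):
    """Persist the index labels of each partition once, right after splitting."""
    np.savez(path, train=train.index.to_numpy(),
             opt=opt.index.to_numpy(), test=test.index.to_numpy())


def load_split(df, path):
    """Rebuild the same partitions in another script from the saved labels."""
    with np.load(path) as idx:
        return df.loc[idx["train"]], df.loc[idx["opt"]], df.loc[idx["test"]]


# Synthetic traffic: 10 records one minute apart, stored in shuffled order.
rng = np.random.default_rng(0)
traffic = pd.DataFrame({
    "timestamp": pd.date_range("2018-01-01", periods=10, freq="min"),
    "feature": rng.normal(size=10),
}).sample(frac=1, random_state=0)

train, opt, test = time_based_split(traffic)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "split_indices.npz")
    save_split(train, opt, test, path)                # e.g. end of training
    train2, opt2, test2 = load_split(traffic, path)   # e.g. in the test script

# Every test record is newer than every training record.
print(train["timestamp"].max() < test["timestamp"].min())  # → True
```

Persisting the index labels keeps the test script from drifting away from the training split even if the loading code changes, provided the index labels stay stable across loads.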
Priority
MODERATE: could affect the validity of the test results