## EDA

https://www.kaggle.com/khairulislam/unsw-nb15-eda

## Data preprocessing

https://www.kaggle.com/khairulislam/data-preprocessing

## Feature Importance

https://www.kaggle.com/khairulislam/unsw-nb15-feature-importance

Filename: `importance.csv`

## Model selection

Ten-fold cross-validation is run on several popular machine learning models to find the best one; a minimal reproduction sketch follows the results table below.

https://www.kaggle.com/khairulislam/ten-fold-cross-validation-with-different-models

| Model | Accuracy | F1 |
| --- | --- | --- |
| LogisticRegression | 0.9354286717347297 | 0.9542239342896803 |
| GradientBoostingClassifier | 0.9458426691403649 | 0.9611062449808958 |
| DecisionTreeClassifier | 0.9498805218353074 | 0.9631526109252336 |
| RandomForestClassifier | 0.9607678800687012 | 0.9713978736211478 |
| LightGBM | 0.961811555768474 | 0.9721410918894631 |
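
The table above comes from the linked notebook. The sketch below shows one way to reproduce the comparison, assuming the preprocessed features sit in a single CSV with a binary `label` column (`preprocessed_train.csv` is a placeholder name, not the notebook's actual path); it is illustrative rather than the exact notebook code.

```python
# Illustrative sketch of the model comparison (placeholder file name,
# not the actual notebook code).
import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from lightgbm import LGBMClassifier

# Output of the preprocessing notebook: features plus a binary "label" column.
df = pd.read_csv("preprocessed_train.csv")
X, y = df.drop(columns=["label"]), df["label"]

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "GradientBoostingClassifier": GradientBoostingClassifier(),
    "DecisionTreeClassifier": DecisionTreeClassifier(),
    "RandomForestClassifier": RandomForestClassifier(),
    "LightGBM": LGBMClassifier(),
}

# Ten-fold stratified CV, reporting mean accuracy and F1 per model.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=["accuracy", "f1"])
    print(name, scores["test_accuracy"].mean(), scores["test_f1"].mean())
```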

## Hyper-tuning

## Experiments

Links:

### Train performance

Performance of the model on the same dataset on which it was trained.
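
A minimal sketch of this check, assuming the same placeholder preprocessed CSV as above; the actual notebook may differ.

```python
# Illustrative "train performance" sketch: fit and evaluate on the same data.
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score
from lightgbm import LGBMClassifier

train = pd.read_csv("preprocessed_train.csv")   # placeholder file name
X_train, y_train = train.drop(columns=["label"]), train["label"]

model = LGBMClassifier().fit(X_train, y_train)
pred = model.predict(X_train)                   # predict on the training data itself
print("accuracy:", accuracy_score(y_train, pred))
print("f1:", f1_score(y_train, pred))
```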

### Ten-fold cross-validation

Stratified k-fold cross-validation is used to validate model performance on three data splits:

- Train
- Test
- Combined (train + test)
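
The sketch below illustrates the "Combined (train + test)" variant; the Train and Test variants run the same loop on a single partition. File names are placeholders for the preprocessed partitions, not the notebooks' actual paths.

```python
# Illustrative sketch of stratified 10-fold CV on the combined partitions.
import pandas as pd
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, f1_score
from lightgbm import LGBMClassifier

train = pd.read_csv("preprocessed_train.csv")   # placeholder file names
test = pd.read_csv("preprocessed_test.csv")
data = pd.concat([train, test], ignore_index=True)
X, y = data.drop(columns=["label"]), data["label"]

accs, f1s = [], []
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
for fit_idx, val_idx in skf.split(X, y):
    model = LGBMClassifier()
    model.fit(X.iloc[fit_idx], y.iloc[fit_idx])
    pred = model.predict(X.iloc[val_idx])
    accs.append(accuracy_score(y.iloc[val_idx], pred))
    f1s.append(f1_score(y.iloc[val_idx], pred))

print("mean accuracy:", sum(accs) / len(accs))
print("mean f1:", sum(f1s) / len(f1s))
```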

### Train test validation

The model is trained on the train dataset, then tested on the separate test dataset.
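
A minimal sketch of this setup, again with placeholder file names for the preprocessed train and test partitions:

```python
# Illustrative train/test evaluation sketch (placeholder file names).
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score
from lightgbm import LGBMClassifier

train = pd.read_csv("preprocessed_train.csv")
test = pd.read_csv("preprocessed_test.csv")
X_train, y_train = train.drop(columns=["label"]), train["label"]
X_test, y_test = test.drop(columns=["label"]), test["label"]

model = LGBMClassifier().fit(X_train, y_train)
pred = model.predict(X_test)                    # evaluate on the held-out test set
print("accuracy:", accuracy_score(y_test, pred))
print("f1:", f1_score(y_test, pred))
```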

## Papers that weren't compared

## Citation

```bibtex
@INPROCEEDINGS{9315049,
  author={Islam, Md. Khairul and Hridi, Prithula and Hossain, Md. Shohrab and Narman, Husnu S.},
  booktitle={2020 30th International Telecommunication Networks and Applications Conference (ITNAC)},
  title={Network Anomaly Detection Using LightGBM: A Gradient Boosting Classifier},
  year={2020},
  volume={},
  number={},
  pages={1-7},
  doi={10.1109/ITNAC50341.2020.9315049}}
```