- EDA: https://www.kaggle.com/khairulislam/unsw-nb15-eda
- Data preprocessing: https://www.kaggle.com/khairulislam/data-preprocessing
- Feature importance: https://www.kaggle.com/khairulislam/unsw-nb15-feature-importance (filename: importance.csv)

Ten-fold cross-validation was run on popular machine learning models to find the best one: https://www.kaggle.com/khairulislam/ten-fold-cross-validation-with-different-models

Model | Accuracy | F1 |
---|---|---|
LogisticRegression | 0.9354286717347297 | 0.9542239342896803 |
GradientBoostingClassifier | 0.9458426691403649 | 0.9611062449808958 |
DecisionTreeClassifier | 0.9498805218353074 | 0.9631526109252336 |
RandomForestClassifier | 0.9607678800687012 | 0.9713978736211478 |
LightGBM | 0.961811555768474 | 0.9721410918894631 |
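
A minimal sketch of how such a ten-fold comparison could be run with scikit-learn. It is not the code from the linked notebook: the data is a synthetic stand-in produced by `make_classification`, the folds are stratified and shuffled by assumption, and the hyperparameters are arbitrary. LightGBM is left out to avoid an extra dependency; its scikit-learn-compatible `lightgbm.LGBMClassifier` could be added to the `models` dictionary in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed UNSW-NB15 features (assumption).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8,
    weights=[0.35, 0.65], random_state=0,
)

# Hyperparameters are illustrative, not the ones used for the table above.
models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "GradientBoostingClassifier": GradientBoostingClassifier(n_estimators=50, random_state=0),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
    "RandomForestClassifier": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Ten stratified folds: each fold keeps roughly the same class ratio as the full set.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=("accuracy", "f1"))
    results[name] = {
        "accuracy": scores["test_accuracy"].mean(),
        "f1": scores["test_f1"].mean(),
    }

# Print the mean scores in the same layout as the table.
for name, r in results.items():
    print(f"{name} | {r['accuracy']:.4f} | {r['f1']:.4f}")
```

Reusing the same `cv` object for every model means all models are scored on identical folds, which keeps the comparison fair.
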
Links:
- https://www.kaggle.com/khairulislam/unsw-nb15-lightgbm
- https://www.kaggle.com/khairulislam/unsw-nb15-witih-randomforest
Performance of the model on the same dataset on which it was trained.
Stratified k-fold cross-validation to validate model performance.
Training the model on the training set, then testing on the separate test set.
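
The three evaluation settings above can be sketched side by side as follows. This is an illustration under assumptions, not the notebooks' code: the data is synthetic, the separate test set is simulated with `train_test_split` (UNSW-NB15 ships its own train and test files), the model is a random forest with arbitrary settings, and five folds are used because the number of folds is not stated here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic stand-in for the dataset; the held-out split is simulated (assumption).
X, y = make_classification(n_samples=800, n_features=20, n_informative=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

model = RandomForestClassifier(n_estimators=50, random_state=1)
model.fit(X_train, y_train)

# 1. Score on the same data the model was trained on (optimistic).
f1_train = f1_score(y_train, model.predict(X_train))

# 2. Stratified k-fold cross-validation on the training data only.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
f1_cv = cross_val_score(
    RandomForestClassifier(n_estimators=50, random_state=1),
    X_train, y_train, cv=cv, scoring="f1",
).mean()

# 3. Train on the training set, score on the separate test set.
f1_test = f1_score(y_test, model.predict(X_test))

print(f"train-on-train F1: {f1_train:.4f}")
print(f"stratified CV F1:  {f1_cv:.4f}")
print(f"held-out test F1:  {f1_test:.4f}")
```

The first number usually overstates performance because the model has already seen those samples, which is why the other two settings are reported separately.
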
- Deep Learning Approach for Intelligent Intrusion Detection System: DNN (4 layers): accuracy 0.765, precision 0.946, recall 0.695, F1 0.801. RF: accuracy 0.903, precision 0.988, recall 0.867, F1 0.924.
- Feasibility of Supervised Machine Learning for Cloud Security: Logistic regression: accuracy 89.26%, TP 93.7%, TN 95.7% at a prediction threshold of 0.5. Raising the prediction threshold to 0.7-0.8 gives TP 97%, but TN 80%.
- A Two-Stage Classifier Approach using RepTree Algorithm for Network Intrusion Detection: Used the REPTree (Reduced Error Pruning Tree) decision tree and reached 88.95% accuracy.
- Network Intrusion Detection in Big Dataset Using Spark: Applied dimension reduction (LDA) to the dataset using Spark; REP Tree then reached accuracy 93.56%, FPR 2.3%, precision 83.3%, recall 83.2%, ROC 90.5%.
- Deep Learning Approach for Cyberattack Detection: The two datasets are randomly split using the same rule. 80% of the data was used to fit DFEL and get the pre-trained model. The remaining 20% of the data was randomly split into 70%/30% as training/testing data for classifiers.
- Building an Effective Intrusion Detection System Using the Modified Density Peak Clustering Algorithm and Deep Belief Networks: Multiclass classification.
- A New Generalized Deep Learning Framework Combining Sparse Autoencoder and Taguchi Method for Novel Data Classification and Processing: Only worked with a DDoS dataset.
- An Empirical Evaluation of Deep Learning for Network Anomaly Detection: Reported 100% on all metrics (accuracy, precision, recall, F1) for NSL-KDD, KYOTO-HONEYPOT, UNSW-NB15, and IDS2017. Used a seq2seq model. For UNSW-NB15, the train set was used as test and the test set as train.
- Intrusion Detection Using Big Data and Deep Learning Techniques: Used the big UNSW-NB15 dataset with five-fold cross-validation.
- An Effective Deep Learning Based Scheme for Network Intrusion Detection: In this dataset, there are 2.54 million samples in total, containing 9 types of attack samples and 2.2 million normal samples. Each sample has 47 features. The samples are randomly assigned to a training set and a testing set containing 1.905 million and 0.635 million samples, respectively. The ratio of normal to attack samples in both sets is 6.9, the same as in the original dataset.
- An Ensemble Intrusion Detection Technique based on proposed Statistical Flow Features for Protecting Network Traffic of Internet of Things: Only used DNS and HTTP protocol data.
- Analysis of Lightweight Feature Vectors for Attack Detection in Network Traffic: Worked on the complete UNSW-NB15 dataset. Used feature scaling, feature selection, one-hot encoding, PCA, and 5-fold cross-validation.
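
The preprocessing chain named in the last entry (feature scaling, selection, one-hot encoding, PCA, 5-fold cross-validation) can be sketched as a single scikit-learn pipeline. This is only an illustration: the step order, the selector (`SelectKBest` with `f_classif`), the classifier, the component counts, and the synthetic data are all assumptions and are not taken from that paper.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in: 8 numeric features plus one categorical code (assumption).
rng = np.random.default_rng(0)
n = 400
X_num = rng.normal(size=(n, 8))
proto = rng.integers(0, 3, size=n)  # hypothetical protocol-like category
signal = X_num[:, 0] + 0.5 * X_num[:, 1] + (proto == 2)
y = (signal + rng.normal(scale=0.5, size=n) > 0.3).astype(int)
X = np.column_stack([X_num, proto])

# Scale the numeric columns and one-hot encode the categorical column (index 8).
preprocess = ColumnTransformer(
    [
        ("scale", StandardScaler(), list(range(8))),
        ("onehot", OneHotEncoder(handle_unknown="ignore"), [8]),
    ],
    sparse_threshold=0.0,  # always return a dense array, which PCA requires
)

pipeline = Pipeline([
    ("preprocess", preprocess),
    ("select", SelectKBest(f_classif, k=8)),   # keep the 8 highest-scoring features
    ("pca", PCA(n_components=4)),              # project onto 4 components
    ("clf", LogisticRegression(max_iter=1000)),
])

# Every step is refit inside each fold, so no statistics leak from the validation folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
f1_scores = cross_val_score(pipeline, X, y, cv=cv, scoring="f1")
print("F1 per fold:", np.round(f1_scores, 4))
```

Wrapping the preprocessing in the pipeline, rather than transforming the full dataset first, keeps the scaler, selector, and PCA fitted on training folds only.
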
@INPROCEEDINGS{9315049,
author={Islam, Md. Khairul and Hridi, Prithula and Hossain, Md. Shohrab and Narman, Husnu S.},
booktitle={2020 30th International Telecommunication Networks and Applications Conference (ITNAC)},
title={Network Anomaly Detection Using LightGBM: A Gradient Boosting Classifier},
year={2020},
volume={},
number={},
pages={1-7},
doi={10.1109/ITNAC50341.2020.9315049}}