## EDA

https://www.kaggle.com/khairulislam/unsw-nb15-eda

## Data preprocessing

https://www.kaggle.com/khairulislam/data-preprocessing

## Feature Importance

https://www.kaggle.com/khairulislam/unsw-nb15-feature-importance

Filename: `importance.csv`

## Model selection

Ten-fold cross-validation is run on several popular machine learning models to find the best one; a minimal reproduction sketch follows the results table below.

https://www.kaggle.com/khairulislam/ten-fold-cross-validation-with-different-models

| Model | Accuracy | F1 |
| --- | --- | --- |
| LogisticRegression | 0.9354286717347297 | 0.9542239342896803 |
| GradientBoostingClassifier | 0.9458426691403649 | 0.9611062449808958 |
| DecisionTreeClassifier | 0.9498805218353074 | 0.9631526109252336 |
| RandomForestClassifier | 0.9607678800687012 | 0.9713978736211478 |
| LightGBM | 0.961811555768474 | 0.9721410918894631 |
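
The table above comes from the linked notebook. The sketch below shows one way to reproduce the comparison, assuming the preprocessed features sit in a single CSV with a binary `label` column (`preprocessed_train.csv` is a placeholder name, not the notebook's actual path); it is illustrative rather than the exact notebook code.

```python
# Illustrative sketch of the model comparison (placeholder file name,
# not the actual notebook code).
import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from lightgbm import LGBMClassifier

# Output of the preprocessing notebook: features plus a binary "label" column.
df = pd.read_csv("preprocessed_train.csv")
X, y = df.drop(columns=["label"]), df["label"]

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "GradientBoostingClassifier": GradientBoostingClassifier(),
    "DecisionTreeClassifier": DecisionTreeClassifier(),
    "RandomForestClassifier": RandomForestClassifier(),
    "LightGBM": LGBMClassifier(),
}

# Ten-fold stratified CV, reporting mean accuracy and F1 per model.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=["accuracy", "f1"])
    print(name, scores["test_accuracy"].mean(), scores["test_f1"].mean())
```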

## Hyper-tuning

## Experiments

Links:

### Train performance

Performance of the model on the same dataset on which it was trained.
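
A minimal sketch of this check, assuming the same placeholder preprocessed CSV as above; the actual notebook may differ.

```python
# Illustrative "train performance" sketch: fit and evaluate on the same data.
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score
from lightgbm import LGBMClassifier

train = pd.read_csv("preprocessed_train.csv")   # placeholder file name
X_train, y_train = train.drop(columns=["label"]), train["label"]

model = LGBMClassifier().fit(X_train, y_train)
pred = model.predict(X_train)                   # predict on the training data itself
print("accuracy:", accuracy_score(y_train, pred))
print("f1:", f1_score(y_train, pred))
```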

### Ten-fold cross-validation

Stratified k-fold cross-validation is used to validate model performance on three data splits:

- Train
- Test
- Combined (train + test)
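
The sketch below illustrates the "Combined (train + test)" variant; the Train and Test variants run the same loop on a single partition. File names are placeholders for the preprocessed partitions, not the notebooks' actual paths.

```python
# Illustrative sketch of stratified 10-fold CV on the combined partitions.
import pandas as pd
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, f1_score
from lightgbm import LGBMClassifier

train = pd.read_csv("preprocessed_train.csv")   # placeholder file names
test = pd.read_csv("preprocessed_test.csv")
data = pd.concat([train, test], ignore_index=True)
X, y = data.drop(columns=["label"]), data["label"]

accs, f1s = [], []
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
for fit_idx, val_idx in skf.split(X, y):
    model = LGBMClassifier()
    model.fit(X.iloc[fit_idx], y.iloc[fit_idx])
    pred = model.predict(X.iloc[val_idx])
    accs.append(accuracy_score(y.iloc[val_idx], pred))
    f1s.append(f1_score(y.iloc[val_idx], pred))

print("mean accuracy:", sum(accs) / len(accs))
print("mean f1:", sum(f1s) / len(f1s))
```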

### Train test validation

The model is trained on the train dataset, then tested on the separate test dataset.
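
A minimal sketch of this setup, again with placeholder file names for the preprocessed train and test partitions:

```python
# Illustrative train/test evaluation sketch (placeholder file names).
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score
from lightgbm import LGBMClassifier

train = pd.read_csv("preprocessed_train.csv")
test = pd.read_csv("preprocessed_test.csv")
X_train, y_train = train.drop(columns=["label"]), train["label"]
X_test, y_test = test.drop(columns=["label"]), test["label"]

model = LGBMClassifier().fit(X_train, y_train)
pred = model.predict(X_test)                    # evaluate on the held-out test set
print("accuracy:", accuracy_score(y_test, pred))
print("f1:", f1_score(y_test, pred))
```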

## Papers that weren't compared

## Citation

```bibtex
@INPROCEEDINGS{9315049,
  author={Islam, Md. Khairul and Hridi, Prithula and Hossain, Md. Shohrab and Narman, Husnu S.},
  booktitle={2020 30th International Telecommunication Networks and Applications Conference (ITNAC)},
  title={Network Anomaly Detection Using LightGBM: A Gradient Boosting Classifier},
  year={2020},
  volume={},
  number={},
  pages={1-7},
  doi={10.1109/ITNAC50341.2020.9315049}}
```