vicky60629/Network-Intrusion-Detection-System

Network Intrusion Detection System (NIDS)

A machine learning-powered Network Intrusion Detection System (NIDS) that classifies network traffic as normal or malicious, detecting 4 major attack categories with high accuracy, deployed as a live Flask web application.


📌 Problem Statement

With the rapid growth of computer networks, cybersecurity threats are more dangerous and frequent than ever. Traditional rule-based security systems struggle to keep up with evolving attack patterns. This project builds a machine learning-based NIDS that can:

  • Automatically detect whether network activity is normal or an attack (binary classification)
  • Identify the specific type of attack across 4 major categories (multi-class classification)
  • Replace slow, manual rule-writing with a data-driven, adaptive detection system

The target is a detection rate of 98% or higher with a minimal false alarm rate, a benchmark reported in academic research on anomaly detection.


🎯 Business & Security Impact

| Stakeholder | Value Delivered |
| --- | --- |
| Security Operations (SOC) | Real-time automated triage of network alerts |
| IT Administrators | Instantly identify attack type without manual log analysis |
| Enterprises | Reduce Mean Time to Detect (MTTD) for cyber incidents |
| Government / Defense | Protect critical infrastructure from advanced persistent threats |

🧠 Attack Categories Detected

This system classifies network traffic into 5 classes: Normal plus 4 attack types.

| Attack Type | Description | Example |
| --- | --- | --- |
| Normal | Legitimate network traffic | Regular HTTP, FTP, SSH requests |
| DOS | Denial of Service: overwhelms resources to cause an outage | SYN flood, Ping of Death |
| PROBE | Surveillance/scanning to gather network info | Port scanning, Nmap |
| R2L | Remote-to-Local: unauthorized access from a remote machine | Password guessing, FTP exploits |
| U2R | User-to-Root: privilege escalation to gain root access | Buffer overflow, Rootkit installation |
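
The grouping above can be sketched in code. This is a minimal, dependency-free illustration, assuming the dataset labels are raw attack names such as `neptune` or `nmap`. The mapping is deliberately partial, and the names `ATTACK_CATEGORY`, `to_category`, and `class_distribution` are hypothetical, not taken from the notebook.

```python
from collections import Counter

# Partial mapping from raw NSL-KDD attack labels to the five classes.
# Only commonly seen labels are listed; extend it for labels in your files.
ATTACK_CATEGORY = {
    "normal": "Normal",
    # Denial of Service
    "neptune": "DOS", "smurf": "DOS", "back": "DOS",
    "teardrop": "DOS", "pod": "DOS", "land": "DOS",
    # Probing / scanning
    "satan": "PROBE", "ipsweep": "PROBE", "nmap": "PROBE", "portsweep": "PROBE",
    # Remote-to-Local
    "guess_passwd": "R2L", "ftp_write": "R2L", "imap": "R2L", "phf": "R2L",
    "multihop": "R2L", "warezmaster": "R2L", "warezclient": "R2L", "spy": "R2L",
    # User-to-Root
    "buffer_overflow": "U2R", "loadmodule": "U2R", "rootkit": "U2R", "perl": "U2R",
}


def to_category(raw_label: str) -> str:
    """Map a raw label such as 'neptune' to one of the five classes."""
    # Normalise case, surrounding whitespace, and a trailing dot.
    key = raw_label.strip().lower().rstrip(".")
    if key not in ATTACK_CATEGORY:
        raise KeyError(f"unmapped label: {raw_label!r}")
    return ATTACK_CATEGORY[key]


def class_distribution(raw_labels):
    """Count how many records fall into each of the five classes."""
    return Counter(to_category(label) for label in raw_labels)
```

Calling `class_distribution` on the training labels gives a quick view of class imbalance; R2L and U2R are typically far rarer than Normal and DOS in this dataset.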

📊 Dataset: NSL-KDD (KDDCUP'99)

Feature Categories

| Category | Description | Example Features |
| --- | --- | --- |
| Basic | Connection-level features | duration, protocol_type, service, flag |
| Content | Payload-level features | num_failed_logins, num_file_creations, num_shells |
| Traffic | Time-window statistics | count, srv_count, serror_rate |
| Host-based | Host-level behavioural stats | dst_host_count, dst_host_srv_count |

Key features for R2L detection: duration (connection duration), service (service requested), num_failed_logins (number of failed login attempts)

Key features for U2R detection: num_file_creations (files created by process), num_shells (shell prompts invoked)
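
A minimal sketch of reading records with these feature names, using only the standard library. It assumes the usual NSL-KDD layout: comma-separated rows with no header, 41 features followed by a class label and a difficulty score. The files in NSL_Dataset/ may differ, so check the column count first. `read_records` is a hypothetical helper name.

```python
import csv

# The 41 NSL-KDD features in their usual column order.
FEATURES = [
    "duration", "protocol_type", "service", "flag", "src_bytes", "dst_bytes",
    "land", "wrong_fragment", "urgent", "hot", "num_failed_logins",
    "logged_in", "num_compromised", "root_shell", "su_attempted", "num_root",
    "num_file_creations", "num_shells", "num_access_files",
    "num_outbound_cmds", "is_host_login", "is_guest_login", "count",
    "srv_count", "serror_rate", "srv_serror_rate", "rerror_rate",
    "srv_rerror_rate", "same_srv_rate", "diff_srv_rate", "srv_diff_host_rate",
    "dst_host_count", "dst_host_srv_count", "dst_host_same_srv_rate",
    "dst_host_diff_srv_rate", "dst_host_same_src_port_rate",
    "dst_host_srv_diff_host_rate", "dst_host_serror_rate",
    "dst_host_srv_serror_rate", "dst_host_rerror_rate",
    "dst_host_srv_rerror_rate",
]
# Assumed trailing columns: class label and difficulty score.
COLUMNS = FEATURES + ["label", "difficulty"]
TEXT_COLUMNS = {"protocol_type", "service", "flag", "label"}


def read_records(lines):
    """Yield one dict per record, converting numeric fields to float.

    `lines` is any iterable of text lines, for example an open file.
    """
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(row)}")
        yield {
            name: value if name in TEXT_COLUMNS else float(value)
            for name, value in zip(COLUMNS, row)
        }
```

With the repository layout shown below, usage would look like `with open("NSL_Dataset/Train.txt", newline="") as f: records = list(read_records(f))`.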


🏗️ Project Architecture

Network-Intrusion-Detection-System/
├── NSL_Dataset/
│   ├── Train.txt                   # Training data
│   └── Test.txt                    # Test data
├── results/
│   ├── 2020-06-15 15_28_16-Window.png
│   ├── 2020-06-15 15_29_02-Window.png
│   ├── 2020-06-15 15_30_02-Window.png
│   ├── 2020-06-15 15_31_30-Window.png
│   ├── 2020-06-15 15_31_58-Window.png
│   └── 2020-06-15 15_32_25-Window.png
├── static/
│   └── style.css
├── templates/
│   └── index.html                  # Flask frontend
├── Network Intrusion Detection System.ipynb   # Full EDA + Modelling
├── app.py                          # Flask application
├── model.pkl                       # Trained classification model
├── corrm.csv                       # Correlation matrix output
├── num_summary.csv                 # Numerical features summary
├── pandas_profiling.html           # Auto-generated EDA report
├── Dockerfile
├── Procfile
├── requirements.txt
└── LICENSE

Approach

1. Exploratory Data Analysis (EDA)

  • Generated an automated profiling report using pandas-profiling (pandas_profiling.html): a full statistical summary of all 41 features
  • Analysed the correlation matrix (corrm.csv) to identify redundant or highly correlated features
  • Studied class imbalance across the 5 classes (Normal plus 4 attack categories)
  • Summarised numerical feature distributions (num_summary.csv) to guide preprocessing
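
The correlation step can be illustrated with a small, dependency-free sketch. It flags the later feature of each highly correlated pair as a candidate for removal. The 0.95 threshold and the helper names `pearson` and `redundant_features` are assumptions for illustration; the notebook may use a different cut-off or method.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: correlation is undefined, treat as 0
    return cov / sqrt(var_x * var_y)


def redundant_features(columns, threshold=0.95):
    """Return the features to drop: the later one of each correlated pair.

    `columns` maps a feature name to its list of numeric values.
    """
    to_drop = set()
    for a, b in combinations(columns, 2):
        if a in to_drop or b in to_drop:
            continue  # one side is already marked as redundant
        if abs(pearson(columns[a], columns[b])) > threshold:
            to_drop.add(b)
    return to_drop
```

Keeping the earlier feature of each pair is an arbitrary tie-break; a domain-informed choice may be preferable for features such as serror_rate and srv_serror_rate.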

2. Data Preprocessing & Feature Engineering

  • Encoded the categorical features protocol_type (tcp/udp/icmp), service, and flag to numerical format
  • Applied label encoding for multi-class target variable
  • Feature selection based on correlation analysis to reduce dimensionality
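
A minimal sketch of integer encoding for the categorical columns, written without external libraries. The source does not state which encoding was applied to the features, so this is one plausible reading; `fit_encoder` and `encode` are hypothetical helper names.

```python
def fit_encoder(values):
    """Assign each distinct category a stable integer code (sorted order)."""
    return {category: code for code, category in enumerate(sorted(set(values)))}


def encode(values, mapping, unknown=-1):
    """Replace categories by their codes; unseen categories get `unknown`."""
    return [mapping.get(value, unknown) for value in values]
```

The mapping should be fitted on the training data only and then reused for the test set and at inference time, so that each category keeps the code the model saw during training.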

3. Two-Layer Classification Strategy

Problem 1: Binary Classification

Input: Network Traffic Features
Output: Normal (0) or Attack (1)

Problem 2: Multi-class Classification

Input: Network Traffic Features
Output: Normal | DOS | PROBE | R2L | U2R

Both problems are solved using the same trained model, allowing flexible deployment.
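
The source does not spell out how a single model serves both problems. One plausible reading, sketched here as an assumption, is that the binary answer is derived by collapsing the five-class prediction. `to_binary` is a hypothetical helper.

```python
ATTACK_CLASSES = {"DOS", "PROBE", "R2L", "U2R"}


def to_binary(predicted_class: str) -> int:
    """Collapse a five-class prediction into Normal (0) or Attack (1)."""
    if predicted_class == "Normal":
        return 0
    if predicted_class in ATTACK_CLASSES:
        return 1
    raise ValueError(f"unknown class: {predicted_class!r}")
```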

4. Model Training & Serialization

  • Trained a high-accuracy classification model on NSL-KDD training set
  • Evaluated on held-out test set with focus on precision, recall, and F1-score per class
  • Serialized final model as model.pkl for production inference via Flask
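
Per-class precision, recall, and F1 can be computed as in the following dependency-free sketch. The helper name `per_class_report` is hypothetical, and the notebook may obtain the same numbers from Scikit-learn instead.

```python
def per_class_report(y_true, y_pred):
    """Return {label: {"precision", "recall", "f1"}} for every label seen."""
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Guard against division by zero when a class is never predicted
        # or never present.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report
```

Reporting per class matters here because a model can score well overall while missing most of the rare R2L and U2R records.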

5. Deployment

  • Built a Flask web application where users enter network traffic features and receive an instant attack classification
  • Containerized with Docker for easy portability and deployment
  • Originally deployed on Heroku (migrating to Render)

📈 Model Performance

| Metric | Target | Achieved |
| --- | --- | --- |
| Detection Rate | ≥ 98% | ✅ High |
| False Alarm Rate | ≤ 1% | ✅ Low |
| Classes | 5 (Normal + 4 attacks) | ✅ All covered |
| Dataset | NSL-KDD benchmark | ✅ Industry standard |

Academic research on anomaly detection with NSL-KDD reports detection rates of 98% while maintaining false alarm rates below 1%, a key benchmark this project targets.
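
For reference, the two headline metrics are conventionally defined as detection rate = TP / (TP + FN) and false alarm rate = FP / (FP + TN), with attacks as the positive class. A minimal sketch, assuming binary labels where 1 means attack and 0 means normal (`detection_metrics` is a hypothetical helper):

```python
def detection_metrics(y_true, y_pred):
    """Return (detection_rate, false_alarm_rate) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # attacks caught
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # attacks missed
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # normal traffic flagged
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # normal traffic passed
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    false_alarm_rate = fp / (fp + tn) if fp + tn else 0.0
    return detection_rate, false_alarm_rate
```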


🛠️ Tech Stack

| Category | Tools |
| --- | --- |
| Language | Python 3.8+ |
| Data Processing | Pandas, NumPy |
| EDA | pandas-profiling |
| Visualisation | Matplotlib, Seaborn |
| Modelling | Scikit-learn |
| Web Framework | Flask |
| Serialization | Pickle |
| Containerization | Docker |
| Deployment | Heroku → Render |
| Notebook | Jupyter Notebook |

🚀 Running Locally

Option 1: Standard Setup

# Clone the repository
git clone https://github.com/vicky60629/Network-Intrusion-Detection-System.git
cd Network-Intrusion-Detection-System

# Install dependencies
pip install -r requirements.txt

# Run the Flask app
python app.py

Open http://localhost:5000 in your browser.

Option 2: Docker

docker build -t nids-app .
docker run -p 5000:5000 nids-app

🖥️ How to Use the App

  1. Open the web app in your browser
  2. Enter the network traffic features into the input form (e.g., duration, protocol type, service, flag, byte counts, connection rates)
  3. Click "Predict"
  4. Get an instant classification: Normal / DOS / PROBE / R2L / U2R

Example Input:

Duration: 0, Protocol: TCP, Service: HTTP, Flag: SF, Src Bytes: 181, Dst Bytes: 5450 ...

Predicted Output: ✅ Normal Traffic
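
A minimal sketch of how form input like the example above could be turned into a numeric row before it reaches the model. The field list covers only the six fields shown in the example, and the integer codes in `ENCODERS` are invented for illustration; the real app must use the full feature set and the encodings fitted during training. `form_to_row` is a hypothetical helper, not the code in app.py.

```python
# Hypothetical subset of input fields, in the order the model would expect.
FIELD_ORDER = ["duration", "protocol_type", "service", "flag",
               "src_bytes", "dst_bytes"]

# Illustrative codes only; real codes come from the training-time encoders.
ENCODERS = {
    "protocol_type": {"icmp": 0, "tcp": 1, "udp": 2},
    "service": {"ftp": 0, "http": 1, "ssh": 2},
    "flag": {"rej": 0, "s0": 1, "sf": 2},
}


def form_to_row(form):
    """Convert a dict of form strings into a list of floats."""
    row = []
    for name in FIELD_ORDER:
        raw = str(form[name]).strip()
        if name in ENCODERS:
            key = raw.lower()  # accept "TCP" as well as "tcp"
            if key not in ENCODERS[name]:
                raise ValueError(f"unknown {name}: {raw!r}")
            row.append(float(ENCODERS[name][key]))
        else:
            row.append(float(raw))
    return row
```

Rejecting unknown categories with an error, rather than guessing a code, keeps malformed input from silently producing a prediction.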


📸 App Preview

[Screenshots: Input Screen, Prediction Result]

💡 Key Learnings & Future Improvements

What worked well:

  • NSL-KDD is a cleaner version of the original KDD'99 dataset: it removes the duplicate records that biased models toward frequent records in older research
  • Two-problem framing (binary + multi-class) provides flexibility for different deployment scenarios
  • pandas-profiling for automated EDA saved significant exploration time

Future enhancements:

  • Upgrade to deep learning: LSTM or CNN to capture sequential network packet patterns
  • Add real-time packet capture using Scapy or PyShark for live traffic analysis
  • Build an alert dashboard with Streamlit showing live attack rate charts
  • Test on newer datasets: CICIDS2017 or UNSW-NB15 for modern attack types
  • Add SHAP values to explain why a specific connection was flagged as an attack
  • Implement MLflow for experiment tracking across model versions
  • Explore federated learning for privacy-preserving NIDS across distributed networks

🔒 Cybersecurity Context

Intrusion detection follows two fundamentally different paradigms:

Misuse-based Detection (Signature-based): Matches known attack patterns. It is favoured in commercial products for high accuracy and predictability.

Anomaly-based Detection (this project's approach): Uses ML to learn "normal" behaviour and flag deviations. It can detect novel, previously unseen attacks that signature-based systems miss.


👨‍💻 About the Author

Vicky Gupta, Data Engineering Analyst @ Accenture (4.5 years) | Aspiring Data Scientist

Passionate about applying machine learning to real-world security and business problems. Experienced in PySpark, ETL pipelines, and deploying end-to-end ML systems to production.

🔗 LinkedIn | GitHub

📧 vg60629@gmail.com


📄 License

This project is licensed under the MIT License. See the LICENSE file for details.


⭐ If you found this project useful, please star the repository. It helps the community discover it!
