A machine learning-powered Network Intrusion Detection System (NIDS) that classifies network traffic as normal or malicious, detects 4 major attack categories with high accuracy, and is deployed as a live Flask web application.
With the rapid growth of computer networks, cybersecurity threats are more dangerous and frequent than ever. Traditional rule-based security systems struggle to keep up with evolving attack patterns. This project builds a machine learning-based NIDS that can:
- Automatically detect whether network activity is normal or an attack (binary classification)
- Identify the specific type of attack across 4 major categories (multi-class classification)
- Replace slow, manual rule-writing with a data-driven, adaptive detection system
Goal: a detection rate of 98% or higher with a minimal false alarm rate, a benchmark reported in academic research on anomaly detection.
| Stakeholder | Value Delivered |
|---|---|
| Security Operations (SOC) | Real-time automated triage of network alerts |
| IT Administrators | Instantly identify attack type without manual log analysis |
| Enterprises | Reduce Mean Time to Detect (MTTD) for cyber incidents |
| Government / Defense | Protect critical infrastructure from advanced persistent threats |
This system classifies network traffic into 5 classes: Normal plus 4 attack types.
| Attack Type | Description | Example |
|---|---|---|
| Normal | Legitimate network traffic | Regular HTTP, FTP, SSH requests |
| DOS | Denial of Service: overwhelms resources to cause an outage | SYN flood, Ping of Death |
| PROBE | Surveillance/scanning to gather network information | Port scanning, Nmap |
| R2L | Remote-to-Local: unauthorized access from a remote machine | Password guessing, FTP exploits |
| U2R | User-to-Root: privilege escalation to gain root access | Buffer overflow, Rootkit installation |
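Raw NSL-KDD records carry a specific attack name (for example `neptune` or `satan`) rather than one of the four categories above, so the names have to be grouped before training a five-class model. The sketch below assumes the standard KDD'99 category assignments; the mapping covers common training-set attack names only and is not exhaustive, and `to_category` is a hypothetical helper, not code taken from this repository.

```python
# Partial mapping of NSL-KDD attack names to the four attack categories
# (standard KDD'99 grouping). NOT exhaustive: the test set contains extra
# names (e.g. "mscan", "apache2") that would need their own entries.
ATTACK_CATEGORIES = {
    "DOS": ["back", "land", "neptune", "pod", "smurf", "teardrop"],
    "PROBE": ["ipsweep", "nmap", "portsweep", "satan"],
    "R2L": ["ftp_write", "guess_passwd", "imap", "multihop",
            "phf", "spy", "warezclient", "warezmaster"],
    "U2R": ["buffer_overflow", "loadmodule", "perl", "rootkit"],
}

# Invert into a flat lookup table: attack name -> category
NAME_TO_CATEGORY = {
    name: category
    for category, names in ATTACK_CATEGORIES.items()
    for name in names
}


def to_category(label: str) -> str:
    """Map a raw NSL-KDD label to Normal / DOS / PROBE / R2L / U2R (hypothetical helper)."""
    label = label.strip().lower()
    if label == "normal":
        return "Normal"
    if label not in NAME_TO_CATEGORY:
        # Fail loudly instead of silently mislabelling an unmapped attack
        raise ValueError(f"unmapped attack name: {label!r}")
    return NAME_TO_CATEGORY[label]
```

For example, `to_category("neptune")` returns `"DOS"`. Raising on unknown names is a deliberate choice: silently mapping them to a default class would hide gaps in the table.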
- Source: NSL-KDD Dataset, University of New Brunswick
- Type: Widely used benchmark dataset for network-based anomaly detection research
- Files: `Train.txt`, `Test.txt`
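The data files have no header row, so column names must be supplied when loading them. A minimal sketch with pandas, assuming `Train.txt` and `Test.txt` follow the public KDDTrain+/KDDTest+ layout (41 comma-separated features, then the attack label, then a difficulty score); `load_nsl_kdd` is a hypothetical helper.

```python
import pandas as pd

# The 41 NSL-KDD feature names in file order (standard KDD'99 naming)
FEATURES = [
    "duration", "protocol_type", "service", "flag", "src_bytes", "dst_bytes",
    "land", "wrong_fragment", "urgent", "hot", "num_failed_logins", "logged_in",
    "num_compromised", "root_shell", "su_attempted", "num_root",
    "num_file_creations", "num_shells", "num_access_files", "num_outbound_cmds",
    "is_host_login", "is_guest_login", "count", "srv_count", "serror_rate",
    "srv_serror_rate", "rerror_rate", "srv_rerror_rate", "same_srv_rate",
    "diff_srv_rate", "srv_diff_host_rate", "dst_host_count", "dst_host_srv_count",
    "dst_host_same_srv_rate", "dst_host_diff_srv_rate",
    "dst_host_same_src_port_rate", "dst_host_srv_diff_host_rate",
    "dst_host_serror_rate", "dst_host_srv_serror_rate", "dst_host_rerror_rate",
    "dst_host_srv_rerror_rate",
]

# Assumption: each row ends with the attack label and a difficulty score,
# as in the public KDDTrain+/KDDTest+ files.
COLUMNS = FEATURES + ["label", "difficulty"]


def load_nsl_kdd(path_or_buffer) -> pd.DataFrame:
    """Read a header-less, comma-separated NSL-KDD file into a named DataFrame."""
    return pd.read_csv(path_or_buffer, header=None, names=COLUMNS)
```

Usage: `train_df = load_nsl_kdd("NSL_Dataset/Train.txt")`. If a file turns out to have no difficulty column, pandas fills the extra `difficulty` column with NaN, which is easy to spot and drop.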
| Category | Description | Example Features |
|---|---|---|
| Basic | Connection-level features | duration, protocol_type, service, flag |
| Content | Payload-level features | num_failed_logins, num_file_creations, num_shells |
| Traffic | Time-window statistics | count, srv_count, serror_rate |
| Host-based | Host-level behavioural stats | dst_host_count, dst_host_srv_count |
Key features for R2L detection:
- `duration` (connection duration)
- `service` (service requested)
- `num_failed_logins` (number of failed login attempts)

Key features for U2R detection:
- `num_file_creations` (files created by the process)
- `num_shells` (shell prompts invoked)
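One quick way to check whether these features actually separate the rare R2L and U2R classes is to compare their per-class means. A minimal sketch, assuming a DataFrame with the raw NSL-KDD columns plus a `category` column holding the five class names (that column name is an assumption); `service` is left out because it is categorical.

```python
import pandas as pd

# Numeric key features named above (service is categorical, so it is excluded)
R2L_FEATURES = ["duration", "num_failed_logins"]
U2R_FEATURES = ["num_file_creations", "num_shells"]


def feature_means_by_category(df: pd.DataFrame, category_col: str = "category") -> pd.DataFrame:
    """Return the mean of each R2L/U2R key feature for every traffic category."""
    return df.groupby(category_col)[R2L_FEATURES + U2R_FEATURES].mean()
```

A category whose mean differs sharply from Normal on a feature (for example a higher `num_failed_logins` for R2L) suggests that feature carries signal for that class.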
Network-Intrusion-Detection-System/
├── NSL_Dataset/
│   ├── Train.txt # Training data
│   └── Test.txt # Test data
├── results/
│   ├── 2020-06-15 15_28_16-Window.png
│   ├── 2020-06-15 15_29_02-Window.png
│   ├── 2020-06-15 15_30_02-Window.png
│   ├── 2020-06-15 15_31_30-Window.png
│   ├── 2020-06-15 15_31_58-Window.png
│   └── 2020-06-15 15_32_25-Window.png
├── static/
│   └── style.css
├── templates/
│   └── index.html # Flask frontend
├── Network Intrusion Detection System.ipynb # Full EDA + Modelling
├── app.py # Flask application
├── model.pkl # Trained classification model
├── corrm.csv # Correlation matrix output
├── num_summary.csv # Numerical features summary
├── pandas_profiling.html # Auto-generated EDA report
├── Dockerfile
├── Procfile
├── requirements.txt
└── LICENSE
- Generated an automated profiling report with pandas-profiling (`pandas_profiling.html`), a full statistical summary of all 41 features
- Analysed the correlation matrix (`corrm.csv`) to identify redundant or highly correlated features
- Studied class imbalance across the 5 traffic classes (Normal plus 4 attack categories)
- Summarised numerical feature distributions (`num_summary.csv`) to guide preprocessing
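Redundant features can be found by scanning the correlation matrix for pairs whose absolute correlation exceeds a cut-off. A minimal sketch, assuming Pearson correlation over numeric columns and a hypothetical threshold of 0.9 (the notebook's actual cut-off is not stated here):

```python
import numpy as np
import pandas as pd


def highly_correlated_pairs(df: pd.DataFrame, threshold: float = 0.9) -> list:
    """Return (feature_a, feature_b, r) for numeric feature pairs with |r| >= threshold."""
    corr = df.select_dtypes(include="number").corr()
    # Upper triangle only (k=1 skips the diagonal) so each pair is reported once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    pairs = []
    for a in upper.index:
        for b in upper.columns:
            r = upper.loc[a, b]
            if pd.notna(r) and abs(r) >= threshold:
                pairs.append((a, b, float(r)))
    return pairs
```

From each reported pair, one feature can usually be dropped with little loss of information; which one to keep is a judgement call (for example, the one that is easier to interpret).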
- Handled categorical features: `protocol_type` (tcp/udp/icmp), `service`, and `flag` were encoded into numerical format
- Applied label encoding to the multi-class target variable
- Selected features based on correlation analysis to reduce dimensionality
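The steps above do not say which encoding was used for the categorical inputs. As one possibility, the sketch below one-hot encodes them with `pandas.get_dummies` (an assumed choice) and label-encodes the target with scikit-learn's `LabelEncoder`; both helper names are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

CATEGORICAL = ["protocol_type", "service", "flag"]


def encode_features(train: pd.DataFrame, test: pd.DataFrame):
    """One-hot encode categorical columns and align the test columns to the training columns."""
    X_train = pd.get_dummies(train, columns=CATEGORICAL, dtype=int)
    X_test = pd.get_dummies(test, columns=CATEGORICAL, dtype=int)
    # Categories missing from the test split become all-zero columns;
    # categories never seen in training are dropped.
    X_test = X_test.reindex(columns=X_train.columns, fill_value=0)
    return X_train, X_test


def encode_target(y_train, y_test):
    """Label-encode class names; classes are assigned integers in sorted order."""
    encoder = LabelEncoder()
    return encoder.fit_transform(y_train), encoder.transform(y_test), encoder
```

Fitting the encoders on the training split only keeps test-set information out of preprocessing. Note that `LabelEncoder.transform` raises an error on a class it did not see during fitting.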
Problem 1: Binary Classification
Input: Network Traffic Features
Output: Normal (0) or Attack (1)
Problem 2: Multi-class Classification
Input: Network Traffic Features
Output: Normal | DOS | PROBE | R2L | U2R
Both problems are solved using the same trained model, allowing flexible deployment.
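One way a single five-class model can also answer the binary question is to collapse every attack class into "Attack". A minimal sketch, assuming the model predicts string class names such as `"Normal"` and `"DOS"`; `to_binary` is a hypothetical helper.

```python
import numpy as np


def to_binary(multiclass_pred) -> np.ndarray:
    """Collapse five-class predictions into Normal (0) / Attack (1)."""
    return (np.asarray(multiclass_pred) != "Normal").astype(int)
```

For example, `to_binary(["Normal", "DOS", "R2L"])` gives `[0, 1, 1]`.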
- Trained a high-accuracy classification model on the NSL-KDD training set
- Evaluated it on the held-out test set, focusing on per-class precision, recall, and F1-score
- Serialized the final model as `model.pkl` for production inference via Flask
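The train, evaluate, and serialize steps can be sketched with scikit-learn and `pickle`. The classifier is not named above, so `RandomForestClassifier` is purely an assumed stand-in, and both helper functions are hypothetical.

```python
import pickle

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report


def train_evaluate_save(X_train, y_train, X_test, y_test, path="model.pkl"):
    """Fit a classifier, print per-class precision/recall/F1, then pickle the model."""
    # Assumption: a random forest stands in for the project's actual model
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    # Per-class report; zero_division=0 avoids warnings for classes never predicted
    print(classification_report(y_test, model.predict(X_test), zero_division=0))
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return model


def load_model(path="model.pkl"):
    """Load the pickled model for inference."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Per-class reporting matters here because R2L and U2R are rare in NSL-KDD, and overall accuracy can look high while those classes are mostly missed. Only unpickle files you trust: `pickle.load` can execute arbitrary code.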
- Built a Flask web application where users enter network traffic features and receive an instant attack classification
- Containerized with Docker for easy portability and deployment
- Originally deployed on Heroku (migration to Render in progress)
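A prediction endpoint of this kind can be sketched with Flask. This is not the project's `app.py`: the JSON `/predict` route, the `create_app` factory, and the `feature_order` parameter are all hypothetical, and the sketch assumes the model takes already-encoded numeric features in a fixed order.

```python
from flask import Flask, jsonify, request


def create_app(model, feature_order):
    """Build a Flask app exposing a hypothetical JSON /predict endpoint."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(force=True)
        try:
            # Arrange submitted values in the same order the model was trained on
            row = [float(payload[name]) for name in feature_order]
        except (KeyError, TypeError, ValueError) as exc:
            return jsonify(error=f"bad input: {exc}"), 400
        label = model.predict([row])[0]
        return jsonify(prediction=str(label))

    return app
```

To serve a real model, load `model.pkl` with `pickle`, pass it together with the training column order to `create_app`, and call `.run(host="0.0.0.0", port=5000)` on the result. Binding to `0.0.0.0` is what lets the app be reached from outside a Docker container. Injecting the model through a factory also makes the route easy to test with a stub model.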
| Metric | Target | Achieved |
|---|---|---|
| Detection Rate | ≥ 98% | ✅ High |
| False Alarm Rate | ≤ 1% | ✅ Low |
| Classes | 5 (Normal + 4 attacks) | ✅ All covered |
| Dataset | NSL-KDD benchmark | ✅ Industry standard |
Academic research on anomaly detection with NSL-KDD reports detection rates of 98% while keeping false alarm rates below 1%, the key benchmark this project targets.
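Both metrics come from the binary confusion matrix, treating "attack" as the positive class: detection rate = TP / (TP + FN), and false alarm rate = FP / (FP + TN). A minimal sketch, assuming labels are 1 for attack and 0 for normal; `detection_metrics` is a hypothetical helper.

```python
def detection_metrics(y_true, y_pred):
    """Return (detection_rate, false_alarm_rate) for binary labels (1 = attack, 0 = normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attacks missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal traffic flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal traffic passed
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    false_alarm_rate = fp / (fp + tn) if fp + tn else 0.0
    return detection_rate, false_alarm_rate
```

For instance, with 4 attacks of which 3 are caught, and 6 normal connections of which 1 is flagged, the detection rate is 0.75 and the false alarm rate is 1/6 (about 0.167).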
| Category | Tools |
|---|---|
| Language | Python 3.8+ |
| Data Processing | Pandas, NumPy |
| EDA | pandas-profiling |
| Visualisation | Matplotlib, Seaborn |
| Modelling | Scikit-learn |
| Web Framework | Flask |
| Serialization | Pickle |
| Containerization | Docker |
| Deployment | Heroku → Render |
| Notebook | Jupyter Notebook |
# Clone the repository
git clone https://github.com/vicky60629/Network-Intrusion-Detection-System.git
cd Network-Intrusion-Detection-System
# Install dependencies
pip install -r requirements.txt
# Run the Flask app
python app.py
Open http://localhost:5000 in your browser.
Or run with Docker:
docker build -t nids-app .
docker run -p 5000:5000 nids-app
- Open the web app in your browser
- Enter the network traffic features into the input form (e.g., duration, protocol type, service, flag, byte counts, connection rates)
- Click "Predict"
- Get an instant classification: Normal / DOS / PROBE / R2L / U2R
Example Input:
Duration: 0, Protocol: TCP, Service: HTTP, Flag: SF, Src Bytes: 181, Dst Bytes: 5450 ...
Predicted Output: ✅ Normal Traffic
Screenshots: Input Screen | Prediction Result
What worked well:
- NSL-KDD is a cleaner version of the original KDD'99 dataset: it eliminates the duplicate records that caused overfitting in older research
- Two-problem framing (binary + multi-class) provides flexibility for different deployment scenarios
- pandas-profiling for automated EDA saved significant exploration time
Future enhancements:
- Upgrade to deep learning (LSTM or CNN) to capture sequential network packet patterns
- Add real-time packet capture using Scapy or PyShark for live traffic analysis
- Build an alert dashboard with Streamlit showing live attack rate charts
- Test on newer datasets: CICIDS2017 or UNSW-NB15 for modern attack types
- Add SHAP values to explain why a specific connection was flagged as an attack
- Implement MLflow for experiment tracking across model versions
- Explore federated learning for privacy-preserving NIDS across distributed networks
This project addresses two fundamentally different detection paradigms:
Misuse-based Detection (Signature-based): Matches known attack patterns; favoured in commercial products for its high accuracy and predictability.
Anomaly-based Detection (this project's approach): Uses ML to learn "normal" behaviour and flag deviations. Its key advantage is that it can detect novel, previously unseen attacks that signature-based systems miss entirely.
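The difference between the two paradigms can be shown with a toy contrast. Everything below is illustrative and assumed: the two signatures are hypothetical rules rather than real IDS signatures, and the per-feature z-score baseline is a deliberately simple anomaly detector, not the model this project trains.

```python
import statistics

# Misuse (signature) detection: flag only connections matching a known pattern.
# Hypothetical rules for illustration only.
SIGNATURES = [
    lambda c: c["flag"] == "S0" and c["count"] > 100,  # SYN-flood-like burst
    lambda c: c["num_failed_logins"] >= 5,             # password guessing
]


def misuse_detect(conn: dict) -> bool:
    """True if any known signature matches the connection."""
    return any(rule(conn) for rule in SIGNATURES)


class AnomalyDetector:
    """Learn a per-feature mean/std baseline from normal traffic; flag large deviations."""

    def __init__(self, features, z_threshold=3.0):
        self.features = features
        self.z_threshold = z_threshold
        self.stats = {}

    def fit(self, normal_conns):
        for f in self.features:
            values = [c[f] for c in normal_conns]
            # Fall back to 1.0 when a feature never varies, to avoid dividing by zero
            self.stats[f] = (statistics.mean(values), statistics.pstdev(values) or 1.0)
        return self

    def is_anomalous(self, conn) -> bool:
        """True if any feature lies more than z_threshold std devs from its normal mean."""
        return any(
            abs(conn[f] - mean) / std > self.z_threshold
            for f, (mean, std) in self.stats.items()
        )
```

A connection that transfers an unusually large `src_bytes` but matches no rule slips past `misuse_detect`, while `AnomalyDetector` flags it because it departs from the learned baseline. The flip side is that anomaly detectors also flag unusual but legitimate traffic, which is why false alarm rate is tracked alongside detection rate.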
Vicky Gupta, Data Engineering Analyst @ Accenture (4.5 years) | Aspiring Data Scientist
Passionate about applying machine learning to real-world security and business problems. Experienced in PySpark, ETL pipelines, and deploying end-to-end ML systems to production.
📧 vg60629@gmail.com
This project is licensed under the MIT License; see the LICENSE file for details.
⭐ If you found this project useful, please star the repository; it helps the community discover it!

