FYP-MaliciousURLSDetection

Project Overview

This repository contains the datasets and code for the Malicious URLs Detection project, completed as part of the CGEB4323 Project 2 CS course at UNITEN.

Dataset

Version 1.00

  • malicious_phish.csv: Dataset containing information about malicious phishing URLs.

  • README.md: Information about the dataset version 1.00.

Version 1.01

  • README.md: Information about the dataset version 1.01.

  • updated_urls.csv: Dataset containing updated URLs.

Version 1.02

  • README.md: Information about the dataset version 1.02.

  • split_urls.csv: Dataset containing split URLs.
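A quick way to get a feel for any of the CSV files above is to count how many rows carry each label. The sketch below uses only the Python standard library. The column name `type` is an assumption, since this README does not list the columns; change `label_column` to match the actual header.

```python
import csv
from collections import Counter
from typing import Dict, List, TextIO


def read_rows(handle: TextIO) -> List[Dict[str, str]]:
    """Read CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(handle))


def label_counts(rows: List[Dict[str, str]], label_column: str = "type") -> Counter:
    """Count rows per label. `label_column` is an assumed column name."""
    return Counter(row[label_column] for row in rows)


# Example usage (the path depends on where the dataset sits in your checkout):
# with open("malicious_phish.csv", newline="", encoding="utf-8") as handle:
#     print(label_counts(read_rows(handle)).most_common())
```

A heavily skewed label distribution is worth knowing about before training, because it affects how accuracy figures should be read.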

Code

Functions

  • get_headers.ipynb: Jupyter Notebook for extracting headers.

  • test_sklearn_model.ipynb: Jupyter Notebook for testing the scikit-learn model.

  • test_tf_model.ipynb: Jupyter Notebook for testing the TensorFlow model.

  • test_xgb_model.ipynb: Jupyter Notebook for testing the XGBoost model.

Neural Network

  • tk_tf_nn.ipynb: Jupyter Notebook for a tokenizer with a TensorFlow neural network.

Traditional ML

  • cv_lr.ipynb: Jupyter Notebook for count vectorizer with Logistic Regression.

  • cv_rf.ipynb: Jupyter Notebook for count vectorizer with Random Forest.

  • cv_svm.ipynb: Jupyter Notebook for count vectorizer with Support Vector Machine.

  • cv_xgb.ipynb: Jupyter Notebook for count vectorizer with XGBoost.

  • tf-idf_lr.ipynb: Jupyter Notebook for TF-IDF with Logistic Regression.

  • tf-idf_rf.ipynb: Jupyter Notebook for TF-IDF with Random Forest.

  • tf-idf_svm.ipynb: Jupyter Notebook for TF-IDF with Support Vector Machine.

  • tf-idf_xgb.ipynb: Jupyter Notebook for TF-IDF with XGBoost.
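The notebooks above pair a text vectorizer (count vectorizer or TF-IDF) with a classifier. As a rough illustration of that pattern, here is a minimal scikit-learn sketch with Logistic Regression. The toy URLs, the labels, the `build_pipeline` helper, and the default hyperparameters are all assumptions made for this example; they are not taken from the notebooks or the project dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data invented for illustration only (not from the project dataset).
urls = [
    "http://example.com/index.html",
    "https://docs.example.org/guide/start",
    "http://login-verify.example.net/account/update.php",
    "http://secure-login.example.net/verify/account.php",
]
labels = ["benign", "benign", "phishing", "phishing"]


def build_pipeline(use_tfidf: bool = False):
    """Hypothetical helper: vectorizer followed by Logistic Regression."""
    # Both vectorizers split each URL into word-like tokens by default.
    vectorizer = TfidfVectorizer() if use_tfidf else CountVectorizer()
    return make_pipeline(vectorizer, LogisticRegression(max_iter=1000))


# Fit on raw URL strings; the pipeline handles vectorization internally.
model = build_pipeline(use_tfidf=False).fit(urls, labels)
predictions = model.predict(["http://verify-account.example.net/login.php"])
print(predictions)
```

Swapping `LogisticRegression` for another estimator (for example a random forest or a support vector machine) keeps the rest of the pipeline unchanged, which is presumably why the notebooks are organized as vectorizer and classifier pairs.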

Label Encoders

  • label_encoder_cv_xgb.pkl: Pickle file for the label encoder used in the count vectorizer with XGBoost pipeline.

Models

  • cv_xgb.pkl: Pickle file for the XGBoost model trained on count vectorizer features.

Vectorizers

  • vectorizer_cv_xgb.pkl: Pickle file for the count vectorizer used with XGBoost.
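The three pickle files listed above (label encoder, model, vectorizer) are presumably meant to be used together. The sketch below shows one plausible way to chain them, assuming the vectorizer exposes `transform`, the model exposes `predict`, and the label encoder exposes `inverse_transform`, as scikit-learn style objects do. The helper names `load_pickle` and `classify` and the paths in the usage comment are hypothetical.

```python
import pickle


def load_pickle(path):
    """Unpickle one artifact. Only unpickle files from a source you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def classify(urls, vectorizer, model, label_encoder):
    """Vectorize raw URLs, predict encoded classes, decode to label names."""
    features = vectorizer.transform(urls)
    encoded = model.predict(features)
    return list(label_encoder.inverse_transform(encoded))


# Example usage (paths are assumptions; adjust to the repository layout):
# vectorizer = load_pickle("vectorizer_cv_xgb.pkl")
# model = load_pickle("cv_xgb.pkl")
# label_encoder = load_pickle("label_encoder_cv_xgb.pkl")
# print(classify(["http://example.com/login"], vectorizer, model, label_encoder))
```

Loading the real files requires scikit-learn and XGBoost to be installed, ideally in the same versions that produced the pickles, since pickled estimators are not guaranteed to load across library versions.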