Malicious URL detector built with deep exploration on feature engineering.

juliorodrigues07/url_detection

Python 3.10.12 Flask Jupyter Notebook Colab

Vue.js Bootstrap JavaScript HTML CSS

URL Detector

Malicious URL detector built using several data mining, machine learning, and data science concepts, techniques, and algorithms (PAs 1 and 2 of the Applied Data Mining course - DCOMP - UFSJ).

Requirements

All the project dependencies are listed in this section (languages, libraries, package managers, frameworks, ...), as well as the instructions to install each of them.

To install all dependencies

./install_dependencies.sh

Languages and package managers

  • Python3 and pip package manager:

    sudo apt install python3 python3-pip build-essential python3-dev
    
  • Node.js package manager - npm (Optional):

    sudo apt-get install npm
    

Data Mining

Data Visualization

  • Matplotlib library:

    pip install matplotlib
    
  • seaborn library:

    pip install seaborn
    
  • numpy library:

    pip install numpy
    

Web Scraping (Optional)

GUI (Graphical User Interface - Optional)

Inside url-detector directory

  • To install all GUI dependencies:

    npm i
    
  • Vue.js framework:

    npm install -g @vue/cli
    
  • Bootstrap framework:

    npm install bootstrap@4.6.0 --save
    
  • axios library:

    npm i axios
    
  • Font Awesome tool kit:

    npm i --save @fortawesome/free-solid-svg-icons && npm i --save @fortawesome/vue-fontawesome@latest-2
    

Execution

All the instructions for exploring the project functionalities are listed in this section, as well as the commands to execute each application.

Data Mining

You can explore all functionalities (different models, datasets, ...) by modifying (or uncommenting) a few parts of the source code.

python3 main.py

Web Scraping

python3 phishing_scraper.py

Application

CLI (Command Line Interface Mode)

  • Inside src directory, execute the command using the following template: python3 predict.py cli <url> <algorithm>.

  • Example with a phishing URL:

    python3 predict.py cli https://bujhanginamfb.github.io/taelasos/update-recovry/ XGB
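
To classify several URLs in one go, the CLI template above can be wrapped in a small script. This is only a sketch under assumptions: `build_command` and `classify_all` are hypothetical helpers that are not part of the repository, the script is assumed to run from inside the src directory, and `XGB` is used as the default only because it is the algorithm shown in the example above.

```python
import subprocess
import sys


def build_command(url: str, algorithm: str = "XGB") -> list[str]:
    """Build the argument list for one call of: python3 predict.py cli <url> <algorithm>."""
    # sys.executable is the interpreter running this script (assumed to be Python 3).
    return [sys.executable, "predict.py", "cli", url, algorithm]


def classify_all(urls: list[str], algorithm: str = "XGB") -> None:
    """Run predict.py once per URL; each prediction is printed by predict.py itself."""
    for url in urls:
        # check=False: a failure on one URL does not stop the remaining ones.
        subprocess.run(build_command(url, algorithm), check=False)
```

Calling `classify_all(["example.com", "example.org/login"])` from the src directory would then run the CLI once per URL.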
    

GUI (Graphical User Interface Mode)

  • Open two terminal instances and execute the following commands in each one of them, respectively.

  • Terminal 1 - Back-end (inside src directory):

    python3 predict.py server
    
  • Terminal 2 - Front-end (inside url-detector directory):

    npm run serve
    
  • You should receive two URLs as output (http://localhost:<port number>). To view the application, open either of them in a browser of your choice. The front-end server (GUI) should be running at:

    http://localhost:8080
    
  • Finally, feel free to test the model with your own URLs! 🍾

Main Screen


Outro

Because the models were trained on the Kaggle dataset, their reliability can vary considerably with the format of the URL the user enters. Most of the URLs in the Kaggle dataset do not specify a communication protocol (HTTP, HTTPS, ...), which could bias the trained models and their results, making the classifications quite unstable.
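
One possible way to reduce this mismatch is to strip the protocol from a URL before passing it to the detector, so that the input looks more like most of the training data. The sketch below is an assumption, not something the project does: `strip_scheme` is a hypothetical helper, and whether this preprocessing actually improves the predictions has not been verified here.

```python
import re

# Matches a leading scheme followed by "://", e.g. "http://" or "https://".
# The character classes follow the generic URI scheme syntax (letter first,
# then letters, digits, "+", "." or "-").
_SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")


def strip_scheme(url: str) -> str:
    """Return the URL without its leading protocol, if it has one."""
    # Surrounding whitespace is removed first so the scheme sits at position 0.
    return _SCHEME_RE.sub("", url.strip(), count=1)
```

For instance, `strip_scheme("https://example.com/login")` and `strip_scheme("example.com/login")` both give the same string, so the two input formats would be treated identically by the model.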