URL Detector

A malicious URL detector built with data mining, machine learning and data science concepts, techniques and algorithms (PAs 1 and 2 from the Applied Data Mining course - DCOMP - UFSJ).

Requirements

All the project dependencies are listed in this section (languages, libraries, package managers, frameworks, ...), as well as the instructions to install each of them.

To install all dependencies

```
./install_dependencies.sh
```

Languages and package managers

Python3 and pip package manager:

```
sudo apt install python3 python3-pip build-essential python3-dev
```

Node.js package manager - npm (Optional):
```
sudo apt-get install npm
```

Data Mining

scikit-learn library:
```
pip install -U scikit-learn
```
xgboost library:
```
pip install xgboost
```
mlxtend library:
```
pip install mlxtend
```
imbalanced-learn library:
```
pip install imbalanced-learn
```
pandas library:
```
pip install pandas
```
joblib library:
```
pip install joblib
```

Data Visualization

Matplotlib library:
```
pip install matplotlib
```
seaborn library:
```
pip install seaborn
```
numpy library:
```
pip install numpy
```

Web Scraping (Optional)

Beautiful Soup library:
```
pip install beautifulsoup4
```
mechanize library:
```
pip install mechanize
```
Random User Agents library:
```
pip install random_user_agent
```
PyCryptodome library:
```
pip install pycryptodomex
```
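
After installing, it can be handy to confirm that the Python libraries listed above are importable. The following is a minimal sketch, not part of the project: the `missing_modules` helper and the `REQUIRED`/`OPTIONAL` lists are hypothetical names introduced here, and the module names are the usual import names for those pip packages (for example, `scikit-learn` is imported as `sklearn` and `pycryptodomex` as `Cryptodome`).

```python
from importlib.util import find_spec

# Import names for the pip packages listed above. The import name can differ
# from the package name (e.g. scikit-learn -> sklearn, beautifulsoup4 -> bs4).
REQUIRED = ["sklearn", "xgboost", "mlxtend", "imblearn", "pandas", "joblib",
            "matplotlib", "seaborn", "numpy"]
OPTIONAL = ["bs4", "mechanize", "random_user_agent", "Cryptodome"]


def missing_modules(names):
    """Return the top-level module names that cannot be found (hypothetical helper)."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    print("Missing required:", missing_modules(REQUIRED))
    print("Missing optional:", missing_modules(OPTIONAL))
```

An empty list means every module in that group was found. Anything reported as missing can be installed with the corresponding `pip install` command above.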

GUI (Graphical User Interface - Optional)

Inside url-detector directory

To install all GUI dependencies:
```
npm i
```
Vue.js framework:
```
npm install -g @vue/cli
```
Bootstrap framework:
```
npm install bootstrap@4.6.0 --save
```
axios library:
```
npm i axios
```

Font Awesome toolkit:
```
npm i --save @fortawesome/free-solid-svg-icons && npm i --save @fortawesome/vue-fontawesome@latest-2
```

Execution

All the instructions for exploring the project functionalities are listed in this section, as well as the commands to execute each application.

Data Mining

You can explore all functionalities (different models, datasets, ...) by modifying (or uncommenting) a few parts of the source code.

```
python3 main.py
```

Web Scraping

```
python3 phishing_scraper.py
```

Application

CLI (Command Line Interface Mode)

Inside the src directory, execute the command using the following template: `python3 predict.py cli <url> <algorithm>`.

Example with a phishing URL:

```
python3 predict.py cli https://bujhanginamfb.github.io/taelasos/update-recovry/ XGB
```
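
To classify URLs from another Python script, the CLI template above can be wrapped with `subprocess`. This is only a sketch under assumptions: `build_command` and `classify` are hypothetical helpers, not part of the project. The sketch uses the current interpreter (`sys.executable`) in place of `python3`, and because the output format of `predict.py` is not documented here, it returns the raw text without parsing it.

```python
import subprocess
import sys


def build_command(url, algorithm):
    """Build the argument list for the CLI template shown above."""
    # Passing a list (no shell) keeps characters such as '&' and '?'
    # in the URL from being interpreted by the shell.
    return [sys.executable, "predict.py", "cli", url, algorithm]


def classify(url, algorithm="XGB", cwd="src"):
    """Run predict.py in `cwd` and return whatever it prints (format not assumed)."""
    result = subprocess.run(build_command(url, algorithm), cwd=cwd,
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Calling `classify("<url>")` from the repository root would run the same command as the example above, assuming `predict.py` is located in `src`.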

GUI (Graphical User Interface Mode)

Open two terminal instances and execute the following commands in each one of them, respectively.
Terminal 1 - Back-end (inside src directory):
```
python3 predict.py server
```
Terminal 2 - Front-end (inside url-detector directory):
```
npm run serve
```
You should receive two URLs as outputs (http://localhost:<port number>). To view the application, open either of them in a browser of your choice. The front-end server (GUI) should be running at:
```
http://localhost:8080
```
Finally, feel free to test the model with your own URLs! 🍾

Main Screen

Outro

Because the models were trained on the Kaggle dataset, their reliability can vary considerably with the format of the URL the user enters. Most of the URLs in the Kaggle dataset do not specify their communication protocol (HTTP, HTTPS, ...), which could introduce a large bias in the trained models and their results, making the classifications quite unstable.
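
One way to probe the sensitivity described above is to classify the same URL with and without its protocol prefix and compare the results. The sketch below is illustrative only and is not the project's own preprocessing: `strip_scheme` is a hypothetical helper, and whether removing the prefix improves predictions is an assumption to verify, not a documented behavior.

```python
import re

# Matches a leading scheme such as "http://" or "https://"
# (scheme syntax as defined in RFC 3986).
_SCHEME = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")


def strip_scheme(url):
    """Remove a leading 'scheme://' prefix, if present (hypothetical helper)."""
    return _SCHEME.sub("", url.strip(), count=1)
```

For example, `strip_scheme("https://example.com/login")` returns `"example.com/login"`, and a URL that has no prefix is returned unchanged.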
