Skip to content

Tongzhongkai/phishing_dataset

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 

Repository files navigation

phishing_dataset

500 phishing websites are collected from PhishTank, and 500 legitimate websites are gathered from Alexa. The dataset is split as 70% for training and 30% for testing.

No Feature Feature Group Description
1 Domain name similarity URL Similarity (based on Ratcliff-Obershelp's algorithm [43]) between the domain name of the visited website and the URL domain name obtained from Alexa or PhishTank
2 URL length URL Number of all characters in a URL
3 HTTP protocol URL HTTP protocol type: standard (0) or secure (1)
4 # '.' symbol URL Number of dot symbols in a URL
5 # '/' symbols URL Number of slash symbols in a URL
6 # '//' symbols URL Number of double slash symbols in a URL
7 # '-' symbols URL Number of dash symbols in a URL
8 # '_' symbols URL Number of underscore symbols in a URL
9 # '=' symbols URL Number of equal symbols in a URL
10 # '(' and ')' symbols URL Number of parenthesis symbols in a URL
11 # '{' and '}' symbols URL Number of curly brackets symbols in a URL
12 # '[' and ']' symbols URL Number of square brackets symbols in a URL
13 # '<' and '>' symbols URL Number of less than and greater than symbols in a URL
14 # '~' symbols URL Number of tilde symbols in a URL
15 # '*' symbols URL Number of asterisk symbols in a URL
16 # '+' symbols URL Number of plus symbols in a URL
17 Inclusion of ‘@' symbol URL URL includes an at symbol (1) or not (0)
18 Inclusion of IP address URL URL includes an IP address (1) or not (0)
19 # tags HTML Number of tags in a website, used to create hyperlinks or anchor links and is an essential element for linking one webpage to another, linking to different sections within the same page, or linking to external resources
20 # tags HTML Number of tags in a website, used to create various types of interactive form elements
21 # tags HTML Number of tags in a website, used to create a clickable button for triggering actions, submitting forms, or performing other interactive functions
22 # tags HTML Number of tags in a website, used to link external resources, such as stylesheets, icons, and other documents, to an HTML document
23 # <iFrame> tags HTML Number of <iFrame> tags in a website, used to embed an external resource, such as another HTML document, a video, or a web page, within the current document
24 HTTP response history HTTP HTTP response code returned by a server to indicate the outcome of a client's request made to the server.
25 Redirect HTTP Website redirects to another site (1) or not (0), detected with HTTP redirection response codes

Please cite the paper: Kapan, S.; Sora Gunal, E. Improved Phishing Attack Detection with Machine Learning: A Comprehensive Evaluation of Classifiers and Features. Appl. Sci. 2023, 13, 13269. https://doi.org/10.3390/app132413269

About

Phishing dataset prepared for master thesis

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published