Adwan Yasin and Abdelmunem Abuhasan
College of Engineering and Information Technology, Arab American University, Palestine
ABSTRACT
Phishing is one of the trending cyber-attacks: professional hackers send
socially engineered messages that aim to fool users into revealing their
sensitive information. The most popular communication channel for these
messages is users' email. This paper presents an intelligent
classification model for detecting phishing emails using knowledge discovery,
data mining and text processing techniques. It introduces the concept
of phishing term weighting, which evaluates the weight of phishing terms in
each email. The pre-processing phase is enhanced by applying text stemming and
the WordNet ontology to enrich the model with word synonyms. The model applied the
knowledge discovery procedures using five popular classification algorithms and
achieved a notable enhancement in classification accuracy: 99.1% accuracy was
achieved using the Random Forest algorithm and 98.4% using J48, which is, to
our knowledge, the highest accuracy rate for an accredited data set. This paper
also presents a comparative study with similar proposed classification
techniques.
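
The abstract does not give the formula behind phishing term weighting, so the following is only a minimal sketch of one plausible reading: the share of an email's tokens that, after stemming, match a list of phishing terms. The term list, the crude suffix-stripping stemmer, and the function names are all assumptions made for illustration, not the authors' method; WordNet synonym expansion is left out.

```python
import re

# Hypothetical term list; the abstract does not enumerate the phishing terms used.
PHISHING_TERMS = {"verify", "account", "password", "suspend", "click", "bank"}


def naive_stem(word: str) -> str:
    """Crude suffix stripping; an assumed stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters of stem remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def phishing_term_weight(email_text: str) -> float:
    """Fraction of tokens whose stem appears in PHISHING_TERMS (assumed definition)."""
    tokens = re.findall(r"[a-z]+", email_text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if naive_stem(token) in PHISHING_TERMS)
    return hits / len(tokens)


if __name__ == "__main__":
    print(phishing_term_weight("Please verify your account"))
```

A weight of this kind could be supplied as one numeric feature to a classifier such as Random Forest or J48, alongside whatever other features the model uses.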
KEYWORDS
Phishing, data mining, email classification, Random Forest, J48.