SOFTWARE ENGINEERING & SECURITY GEEK: IDENTIFICATION OF IMAGE SPAM BY USING LOW LEVEL & METADATA FEATURES

Friday, July 26, 2019

IDENTIFICATION OF IMAGE SPAM BY USING LOW LEVEL & METADATA FEATURES

Anand Gupta (1), Chhavi Singhal (2) and Somya Aggarwal (1)

(1) Department of Computer Engineering, (2) Department of Electronic and Communication Engineering,

Netaji Subhas Institute of Technology, New Delhi, India

ABSTRACT

Spammers are constantly evolving new spam technologies, the latest of which is image spam. Until now, research in spam image identification has considered properties such as colour, size, compressibility, entropy and content. However, we feel that the resulting identification methods are limited by embedded obfuscation such as complex backgrounds, compression artifacts and a wide variety of fonts and formats. To overcome these limitations, we propose two methodologies (although more are possible). Each methodology has four stages. The two are nearly identical except in the second stage, where methodology I extracts low level features while methodology II extracts metadata features. A comparison between the two methodologies is also shown. The method handles images with noise and images without noise separately. Colour properties of the images are altered so that OCR (Optical Character Recognition) can easily read the text embedded in the image. The proposed methods are tested on a dataset of 1984 spam images and are found to be effective in identifying all types of spam images: those having (1) only text, (2) only images or (3) both text and images. The encouraging experimental results show that methodology I achieves an accuracy of 92% while methodology II achieves an accuracy of 93.3%.
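The abstract names entropy among the image properties and mentions altering colour properties so that OCR can read embedded text, but it does not give the formulas or parameters used. The following is only a minimal sketch of those two ideas under stated assumptions: the function names `shannon_entropy` and `binarize`, the flat list of grey levels as input, and the threshold of 128 are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy (bits per pixel) of a flat sequence of grey levels.

    Assumption: entropy is computed over the grey-level histogram;
    the paper's exact definition is not given in the abstract.
    """
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def binarize(pixels, threshold=128):
    """Map each grey level to 0 or 255 to raise text/background contrast.

    Assumption: a fixed threshold stands in for whatever colour
    alteration the authors apply before OCR.
    """
    return [255 if p >= threshold else 0 for p in pixels]


# Two grey levels in equal proportion carry one bit per pixel.
two_level = [0, 0, 255, 255]
# Four distinct grey levels in equal proportion carry two bits per pixel.
four_level = [10, 200, 90, 255]

print(shannon_entropy(two_level))
print(shannon_entropy(four_level))
print(binarize(four_level))
```

In a sketch like this, a noisier or more heavily obfuscated image would tend to show a higher histogram entropy, which is one plausible reason to treat images with and without noise separately.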

KEYWORDS

Low level feature, anti obfuscation technique, noise & entropy

ORIGINAL SOURCE URL: http://airccse.org/journal/nsa/0312nsa13.pdf

http://airccse.org/journal/jnsa12_current.html