SPAM EMAIL DETECTION SCHEME BASED ON RANDOM FOREST ALGORITHM

Authors

  • A.M. Oyelakin, Department of Computer Science, Al-Hikmah University, Ilorin, Nigeria
  • Ibrahim T.T. Salau, Department of Computer Science, Al-Hikmah University, Ilorin, Nigeria
  • B.S. Ogidan, Adjunct Lecturer, Department of Computer Science, Al-Hikmah University, Ilorin, Nigeria
  • H.I. Olufadi, Part Time Lecturer, Department of Computer Science, Al-Hikmah University, Ilorin, Nigeria
  • S.A. Yusuf, Part Time Lecturer, Department of Computer Science, Al-Hikmah University, Ilorin, Nigeria
  • I.A. Adeinji, Lecturer, Center for Part Time and Professional Studies, Al-Hikmah University, Ilorin, Nigeria

Keywords:

Email, Spam Email Attacks, Detection Accuracy, Ensemble Algorithm

Abstract

Email is used for communication in many sectors of the economy, including education, health, business, manufacturing and agriculture. People with malicious intent have been using email accounts to launch various spam email attacks. Spam email refers to unsolicited bulk email: the practice of sending frequent, unwanted e-mail messages with commercial content in large volumes to an indiscriminate set of recipients. Spam emails expose users to problems such as wasted time, heavy use of computing resources and theft of valuable information. Machine learning approaches are widely accepted as better than traditional approaches for identifying spam emails, and several machine learning techniques have therefore been proposed in the literature for classifying spam emails. This paper proposes a Random Forest-based scheme for email spam detection. A fairly large spam email dataset named Spambase was collected from the UCI Machine Learning Repository. The dataset was pre-processed using feature encoding. Promising features were then selected using a feature importance technique, which yielded a 12-feature subset chosen according to the feature scores. The resulting Random Forest (RF) spam email detection model achieved 99.65% accuracy, 99.21% precision, 99.46% recall and an F1-score of 99.33%. The study concluded that the RF-based spam email detection model performed better than some of the approaches in similar studies.
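As a reading aid for the four reported metrics, the sketch below shows how accuracy, precision, recall and F1-score are conventionally computed from binary confusion-matrix counts, treating spam as the positive class. It is not the authors' code: the class name `ConfusionCounts`, the helper functions and the counts in `example` are hypothetical and chosen only for illustration. The one figure tied to the abstract is the last check, which confirms that the reported F1-score is the harmonic mean of the reported precision and recall.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with spam as the positive class."""
    tp: int  # spam correctly flagged as spam
    fp: int  # legitimate mail wrongly flagged as spam
    fn: int  # spam wrongly passed as legitimate
    tn: int  # legitimate mail correctly passed


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all messages classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def precision(c: ConfusionCounts) -> float:
    """Fraction of messages flagged as spam that really are spam."""
    return c.tp / (c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    """Fraction of actual spam messages that were flagged."""
    return c.tp / (c.tp + c.fn)


def f1_from(p: float, r: float) -> float:
    """F1-score: harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r)


# Hypothetical counts for illustration only (not taken from the paper).
example = ConfusionCounts(tp=90, fp=10, fn=5, tn=895)
print(f"accuracy  = {accuracy(example):.4f}")
print(f"precision = {precision(example):.4f}")
print(f"recall    = {recall(example):.4f}")
print(f"f1        = {f1_from(precision(example), recall(example)):.4f}")

# Consistency check on the figures quoted in the abstract:
# precision 99.21% and recall 99.46% give an F1-score of 99.33%.
reported_f1 = f1_from(0.9921, 0.9946)
print(f"F1 from reported precision/recall = {reported_f1 * 100:.2f}%")
```

Accuracy alone can be misleading when spam and legitimate mail are unevenly represented, which is why precision, recall and F1-score are reported alongside it.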

Published

2023-08-01

How to Cite

Oyelakin, A., Salau, I. T., Ogidan, B., Olufadi, H., Yusuf, S., & Adeinji, I. (2023). SPAM EMAIL DETECTION SCHEME BASED ON RANDOM FOREST ALGORITHM. LAUTECH JOURNAL OF COMPUTING AND INFORMATICS, 3(1), 87-97. Retrieved from http://laujci.lautech.edu.ng/index.php/laujci/article/view/72