Title: Study on rough set and chi square statistic feature selection for spam classification
Authors: Juniarto Samsudin
Keywords: DRNTU::Engineering::Systems engineering
Issue Date: 2007
Abstract: Spam messages waste recipients' time and resources. This dissertation examines the effectiveness of feature selection, in particular the rough set and chi square statistic feature selection methods, in combination with the J48 decision tree classifier for e-mail classification. Experiments were performed on the SpamAssassin corpus, with features selected using word's age, the chi square statistic, and rough set attribute reduction. Performance is measured using 10-fold cross-validation in terms of area under the receiver operating characteristic curve (AUC), precision, and recall. The results show that feature selection not only improves the performance of the classifier but is also an essential step in e-mail classification. The experiments also reveal that e-mail messages contain a great deal of noise and bad features, which should be removed to increase the performance of the classifier.
Description: 64 p.
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: MAE Theses
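
The chi square statistic feature selection named in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation: it assumes the standard 2x2 contingency form of the statistic for a term and the spam class, and the function names `chi_square` and `select_features`, as well as the toy messages, are hypothetical and chosen only for illustration.

```python
from typing import List, Set


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi square score of one term against the spam class.

    a: spam messages containing the term
    b: non-spam messages containing the term
    c: spam messages without the term
    d: non-spam messages without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A term (or class) that never varies carries no information.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def select_features(docs: List[Set[str]], labels: List[bool], k: int) -> List[str]:
    """Return the k terms with the highest chi square score.

    docs is a list of token sets, labels[i] is True when docs[i] is spam.
    Ties are broken alphabetically so the result is deterministic.
    """
    vocabulary = set().union(*docs) if docs else set()
    scores = {}
    for term in vocabulary:
        a = sum(1 for doc, spam in zip(docs, labels) if spam and term in doc)
        b = sum(1 for doc, spam in zip(docs, labels) if not spam and term in doc)
        c = sum(1 for doc, spam in zip(docs, labels) if spam and term not in doc)
        d = sum(1 for doc, spam in zip(docs, labels) if not spam and term not in doc)
        scores[term] = chi_square(a, b, c, d)
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:k]


# Toy corpus (made up for illustration, not taken from SpamAssassin).
toy_docs = [
    {"free", "money"},
    {"free", "offer"},
    {"meeting", "money"},
    {"meeting", "notes"},
]
toy_labels = [True, True, False, False]
top_terms = select_features(toy_docs, toy_labels, 2)
```

In this toy corpus, "money" occurs equally often in both classes and scores zero, so it would be discarded, while "free" and "meeting" separate the classes perfectly and are kept. The selected terms would then be passed to a classifier such as the J48 decision tree mentioned in the abstract.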

Files in This Item:
Description: Restricted Access
Size: 4.53 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.