Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/39727
Title: Efficient text classification
Authors: Tan, Cheryl Qian Ru.
Keywords: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing
Issue Date: 2010
Abstract: As the digital age advances, the volume of data and the size of document collections have been increasing rapidly, so a more efficient and accurate method of sampling data for training text classifiers is required. Training needs good samples rather than the blind samples produced by Simple Random Sampling (SRS), so we experimented with a newly proposed sampling algorithm, CONCISE. CONCISE is a novel algorithm for selecting training documents for text classification, and experiments showed that it works particularly well at small sampling ratios. Experiments were conducted on the 20 Newsgroups corpus and the Reuters-21578 document set using two classifiers, SVM and Naïve Bayes. CONCISE was compared with SRS in all experiments, and the results showed that its accuracy is consistent regardless of which classifier is used. In all experiments, CONCISE achieved higher accuracy than SRS at every sampling ratio. CONCISE requires more running time, but this cost is small compared to the gain in accuracy.
URI: http://hdl.handle.net/10356/39727
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: SCE09-0316.pdf (Restricted Access)
Size: 945.83 kB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.