Title: Clustering techniques for web mining
Authors: Qiu, Siyuan.
Keywords: DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
Issue Date: 2012
Abstract: As high-dimensional data becomes increasingly prevalent, feature selection has been widely applied in data mining, machine learning, and related fields. The goal of feature selection is to remove unneeded features, which can degrade the quality of discovered patterns; as a result, the data mining process can run faster and more accurately. Various feature selection approaches for text categorization have been proposed in the literature. In this project, the Multitype Features Coselection for Web Document Clustering (MFCC) approach was studied and implemented. MFCC is designed to better identify the most discriminative features and remove noisy ones. Besides implementing MFCC, the project also covers the data processing that transforms raw web documents into the format used by the MFCC Java program. Several simulations were then conducted to test the accuracy and efficiency of MFCC.
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: EEE Student Reports (FYP/IA/PA/PI)

Files in This Item:
File | Description | Size | Format
(Restricted Access) | Main article | 1.5 MB | Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.