Title: Discover underlying concepts from real data
Authors: Patwardhan, Shree Balwant.
Keywords: DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
Issue Date: 2012
Abstract: Advances in digital technology and signal acquisition tools have caused data in various forms to be generated and exchanged at an explosive rate. This creates both a pressing need and a good opportunity for techniques that can discover the underlying concepts in large amounts of real data systematically, promptly, and effectively. Real data are often unevenly distributed, containing both majority concepts (concepts with a large amount of data) and minority concepts. This adds a further challenge for comprehensive data mining and learning, because the majority and minority concepts can be equally important in practice. Most existing, standard machine learning algorithms were designed on the assumption that their input datasets are balanced, so their performance degrades significantly when they are applied to imbalanced datasets. It is therefore crucial to correct the imbalance in such datasets so that existing algorithms can learn from them effectively. This project aims to develop a tool that corrects the 'class imbalance' problem in machine learning using the Structure Preserving Oversampling (SPO) algorithm. The tool is implemented as a module for RapidMiner, a popular, commercially available machine learning environment, and synthetically generates samples belonging to the minority class in order to create a balanced dataset for learning algorithms.
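Illustrative sketch (not part of the report): the abstract's idea of synthetically generating minority-class samples until the classes are balanced could look roughly like the Python below. This is not the SPO algorithm, whose details are not given in this record. It assumes a simple stand-in in which each synthetic sample is a random point on the line segment between two existing minority samples. The function name `oversample_minority` and the toy data are hypothetical.

```python
import random
from collections import Counter


def oversample_minority(samples, labels, seed=0):
    """Balance a two-class dataset by adding synthetic minority samples.

    Assumption: each synthetic sample is a random point on the segment
    between two randomly chosen minority samples. This is a generic
    stand-in for illustration, NOT Structure Preserving Oversampling.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    minority = min(counts, key=counts.get)   # label with the fewest samples
    majority_count = max(counts.values())    # target size for every class
    minority_samples = [s for s, y in zip(samples, labels) if y == minority]

    new_samples, new_labels = list(samples), list(labels)
    for _ in range(majority_count - counts[minority]):
        a = rng.choice(minority_samples)
        b = rng.choice(minority_samples)
        t = rng.random()                     # interpolation weight in [0, 1)
        new_samples.append([x + t * (z - x) for x, z in zip(a, b)])
        new_labels.append(minority)
    return new_samples, new_labels


# Toy data: six majority points (label 0) and two minority points (label 1).
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.0, 0.4], [0.4, 0.1],
     [2.0, 2.0], [2.4, 2.2]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = oversample_minority(X, y)
balanced_counts = Counter(y_bal)
```

After the call, both classes in `y_bal` have the same number of samples, and the original samples are left unchanged at the front of `X_bal`. Interpolating only between existing minority samples keeps the synthetic points inside the region the minority class already occupies. A structure-preserving method such as SPO would presumably use more information about the minority class than this sketch does.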
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: EEE Student Reports (FYP/IA/PA/PI)

Files in This Item:
Description: FYP Report
Access: Restricted Access
Size: 2.25 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.