Title: Topic classification and association rule mining for Chinese Mathematics questions
Authors: Tan, Huicheng.
Keywords: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing
Issue Date: 2011
Abstract: This project mines patterns in mathematics questions used in the China College Entrance Exam. These questions contain both Chinese words and formulas, so three types of data mining are carried out: Sub-topic Classification, Association Rule Mining in Textual Data, and Association Rule Mining in Formulas. All three start from keyword (term) identification, which is conducted manually. After data preprocessing, all questions are transformed into the ARFF file format. For Sub-topic Classification, three widely used algorithms (Support Vector Machine, Decision Tree and Random Forest) are compared in terms of classification performance. In the experimental results, Random Forest outperformed the other two algorithms. Both types of association rule mining apply the FP-Growth algorithm, and user feedback is collected to evaluate the usefulness of the generated association rules. According to the feedback, 80% of the rules generated from textual data, with min_support and min_confidence set to 0.115 and 0.8 respectively, are useful. For rules mined from formulas, 82.8% are useful, with min_support and min_confidence set to 0.07 and 0.9 respectively.
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Student Reports (FYP/IA/PA/PI)
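
The abstract's min_support and min_confidence thresholds can be illustrated with a minimal sketch. It assumes each question has already been reduced to a set of manually identified terms; the term sets below are invented for illustration and are not taken from the report. The search is a brute-force enumeration of one-term-to-one-term rules, not the FP-Growth algorithm the report applies, and the function names are hypothetical. The thresholds are the ones quoted for textual data.

```python
from itertools import combinations

# Hypothetical toy data: each question reduced to a set of identified terms.
questions = [
    {"function", "derivative", "monotonic"},
    {"function", "derivative", "extremum"},
    {"function", "derivative", "monotonic"},
    {"sequence", "sum"},
    {"function", "monotonic"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def mine_rules(transactions, min_support, min_confidence):
    """Brute-force rules of the form {lhs} -> {rhs} (illustration only, not FP-Growth)."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # the pair is not frequent enough to yield a rule
        for lhs, rhs in ((a, b), (b, a)):
            # confidence(lhs -> rhs) = support(lhs and rhs) / support(lhs)
            confidence = pair_support / support({lhs}, transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


# Thresholds quoted in the abstract for rules mined from textual data.
rules = mine_rules(questions, min_support=0.115, min_confidence=0.8)
for lhs, rhs, s, c in rules:
    print(f"{lhs} -> {rhs}  support={s:.2f}  confidence={c:.2f}")
```

On this toy data the rule derivative -> function passes both thresholds, while function -> derivative is rejected because its confidence (0.75) falls below 0.8, which shows why confidence is directional even though support is not.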

Files in This Item:
Description: Restricted Access
Size: 1.86 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.