Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/145449
Title: Question classification via machine learning techniques
Authors: Ho, Mun Kit
Keywords: Engineering::Computer science and engineering::Computing methodologies::Document and text processing
Engineering::Electrical and electronic engineering
Issue Date: 2020
Publisher: Nanyang Technological University
Source: Ho, M. K. (2020). Question classification via machine learning techniques. Master's thesis, Nanyang Technological University, Singapore.
Abstract: Questions are indispensable tools in daily communication and in the process of acquiring information and knowledge. Recent developments in technology and the internet have also brought about many social sites where community members engage in knowledge-building discussions. These technologies have also been translated to online-learning platforms, which have increasingly become scalable tools through which students across the globe interact and learn. Understanding the cognitive complexity and quality of questions in such learning settings provides additional insights for educators to monitor the achievement of learning outcomes and administer interventions when required. This thesis therefore proposes automated solutions using machine learning methods to address this pedagogical need. Questions in online-learning platforms are commonly found in assessments authored by instructors to assess learners' understanding of the subject. As online-learning platforms scale up, it becomes increasingly laborious to manually create assessments comprising questions of various difficulties for students. Existing question classification models, however, are limited in terms of modeling semantics. Labeling assessment questions by cognitive complexity not only involves the detection of keywords that discriminate between complexities, but also requires consideration of contextual semantic features. A neural network-based machine learning model with an attention mechanism is proposed to direct the creation of a question representation for this purpose. Experiments on university-level digital signal processing questions demonstrate improved performance over keyword-feature machine learning models when detecting patterns resembling Bloom's taxonomy learning outcome templates. 
In addition, the proposed classifier is integrated into a web-based quiz generation system to support retrieval practice among students with a desired mixture of questions at different complexity levels. User-generated questions, on the other hand, have become increasingly popular on social media sites for inquiring about specific knowledge outside academic settings. Unlike assessment questions, these questions are authored casually, and are therefore error-prone and usually less sophisticated. To overcome noise such as misspellings, it is important to interpret the question progressively, filtering out the noise and picking out only the salient features. This is achieved via a hierarchical architecture with a new topic-weighted attention mechanism that provides context-aware attention on the question. Furthermore, the proposed approach performs well on the chosen evaluation metrics against baseline models without assistance from community features. The efficacy of this approach is verified on the Stack Overflow questions dataset. This approach is found to be effective at finding contextual information in the sub-divided texts to form an effective overall representation. Studies on human-authored texts have found that specific information included in a piece of text improves comprehension. In education and on websites, this helps to increase the overall quality of information being communicated. In the previous model, the attention scheme was data-driven and might not make use of granular entities for extracting features. Entity embeddings derived from a named-entity recognizer serve as markers that hint to the attention mechanism to focus feature extraction around the entities, thus enhancing the model's discrimination of very good versus bad questions. Results on the Stack Overflow question dataset indicate that the tag embeddings improve performance over the predecessor, especially when finer categories of tags are used instead of binary indicators. 
The entity tags were shown to work well with the proposed topic-weighted attention mechanism, thus creating a structural bias to focus on specificity-related features at these crucial locations.
URI: https://hdl.handle.net/10356/145449
DOI: 10.32657/10356/145449
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: EEE Theses

Files in This Item:
File: MEng_HO_munkit.pdf; Description: Thesis PDF; Size: 1.53 MB; Format: Adobe PDF

Page view(s): 242 (updated on Jan 27, 2023)
Download(s): 210 (updated on Jan 27, 2023)

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.