Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/75040
Title: Graph of words for document classification
Authors: Quach, Tri Dung
Keywords: DRNTU::Engineering
DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
Issue Date: 2018
Abstract: This final-year project (FYP) implements and experimentally studies a novel framework for large-scale classification of text documents. Under this framework, documents are first converted from sentences into graphs-of-words, so that the original problem becomes one of graph classification and the representation learning (RL) model subgraph2vec can be applied. However, as with many other RL-based methods, efficiency is a serious problem because NLP datasets generally have very large vocabularies. This project therefore proposes a hash-embedding version of subgraph2vec that significantly reduces the memory required during training, making the system more efficient without harming the quality of the resulting representations. The approach is evaluated in terms of time, memory usage, accuracy and F1 score on benchmark datasets from three domains (the first two are graph classification tasks and the third is a document classification task). In the experiments, the proposed approach outperforms other RL-based methods and achieves results comparable with the state-of-the-art method. Finally, the project introduces a semi-supervised version of the method and observes significant improvements on a sentiment analysis task.
URI: http://hdl.handle.net/10356/75040
Schools: School of Electrical and Electronic Engineering 
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: EEE Student Reports (FYP/IA/PA/PI)
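
Illustration (not taken from the report): the abstract describes converting documents from sentences into graphs-of-words before graph classification. The full text is restricted, so the sketch below only shows one common way such a graph can be built, assuming whitespace tokenization and a sliding co-occurrence window. The function name `graph_of_words` and the `window` parameter are hypothetical and are not the author's implementation.

```python
from collections import defaultdict


def graph_of_words(text, window=3):
    """Build an undirected co-occurrence graph from a piece of text.

    Nodes are lowercased tokens. Two distinct tokens are linked when they
    appear within the same sliding window of `window` consecutive tokens.
    Returns a dict mapping a sorted (word_a, word_b) pair to its count.
    NOTE: tokenization and windowing are assumptions made for illustration.
    """
    tokens = text.lower().split()
    edges = defaultdict(int)
    for i, word in enumerate(tokens):
        # Pair the current token with the next (window - 1) tokens.
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != word:  # skip self-loops
                edges[tuple(sorted((word, other)))] += 1
    return dict(edges)


# Example: with window=2 only adjacent words are linked.
g = graph_of_words("the cat sat on the mat", window=2)
```

Each document then becomes a small graph whose subgraphs could be fed to a graph representation learner such as the subgraph2vec model named in the abstract.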

Files in This Item:
File: fypreport_v3.pdf
Description: Restricted Access
Size: 1.06 MB
Format: Adobe PDF

Page view(s): 297 (updated on Jun 23, 2024)

Download(s): 12 (updated on Jun 23, 2024)

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.