Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/106850
Title: Topic Ranger : a tool for topic exploration and analysis of spatio-temporal documents
Authors: Lu, You
Keywords: Engineering::Computer science and engineering::Data
Issue Date: 2019
Source: Lu, Y. (2019). Topic Ranger : a tool for topic exploration and analysis of spatio-temporal documents. Master's thesis, Nanyang Technological University, Singapore.
Abstract: With the widespread use of social media such as Facebook and Twitter, a large amount of data with both spatial and temporal information has become available. Topic modelling is a useful tool for uncovering latent information in such data. This thesis considers a specific type of topic-model computation called a topic-range query, in which the topic model of interest is restricted to the data records that fall within a dynamically specified geographic region and time period. A naive approach is to first apply a range query to retrieve the data items falling within the specified spatio-temporal range, and then derive the topic model from the retrieved data using a known algorithm such as LDA (Latent Dirichlet Allocation). With a large volume of data, however, each of these two steps can take a substantial amount of time. Novel algorithms for expediting topic-range queries have been designed, including the fast topic-combining algorithm FSS (Fast Set Sampling), which indexes the dataset with a tree and pre-computes the topic model of the subset of data associated with each node of the tree. To answer a topic-range query, the tree nodes covered by the range query are identified, and the pre-computed topic models associated with these nodes are merged to produce an approximate result. Compared to the naive approach, this approximation can substantially reduce runtime. In the original design of the FSS algorithm, Cube trees are used as the indexing structure to support spatio-temporal range queries. In the literature, however, Range Trees offer a better worst-case query-time guarantee for range queries. This master's thesis therefore considers a new combination of Range Trees and FSS, called Topic Ranger, to support topic-range queries. It presents the design and implementation of several versions of Topic Ranger that trade off execution time against memory space. It also documents experiments comparing the execution time and the quality of the resulting approximate topic models against those of the original FSS scheme.
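Illustrative sketch (not taken from the thesis): the pre-compute-and-merge idea described in the abstract can be shown on a much simpler setting. The Python code below assumes a one-dimensional index over timestamps only, and uses a per-node word-count table as a stand-in for a pre-computed topic model. It is neither the FSS merging procedure nor a multi-dimensional Range Tree, and all names (`Node`, `build`, `covering_nodes`, `range_summary`) are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


class Node:
    """One tree node covering the time-sorted documents docs[lo:hi]."""

    def __init__(self, lo, hi, summary, left=None, right=None):
        self.lo, self.hi = lo, hi
        # ASSUMPTION: a word-count table stands in for a pre-computed topic model.
        self.summary = summary
        self.left, self.right = left, right


def build(docs, lo, hi):
    """Build a balanced tree over docs[lo:hi], pre-computing every node's summary."""
    if hi - lo == 1:
        return Node(lo, hi, Counter(docs[lo][1]))
    mid = (lo + hi) // 2
    left, right = build(docs, lo, mid), build(docs, mid, hi)
    return Node(lo, hi, left.summary + right.summary, left, right)


def covering_nodes(node, lo, hi):
    """Return the maximal nodes whose span lies entirely inside [lo, hi)."""
    if hi <= node.lo or node.hi <= lo:
        return []          # no overlap with the query range
    if lo <= node.lo and node.hi <= hi:
        return [node]      # fully covered: reuse its pre-computed summary
    return covering_nodes(node.left, lo, hi) + covering_nodes(node.right, lo, hi)


def range_summary(root, times, t_start, t_end):
    """Merge pre-computed summaries for documents with t_start <= time <= t_end."""
    lo, hi = bisect_left(times, t_start), bisect_right(times, t_end)
    nodes = covering_nodes(root, lo, hi)
    merged = Counter()
    for n in nodes:
        # ASSUMPTION: merging is plain count addition, not the FSS merge.
        merged.update(n.summary)
    return merged, len(nodes)


# Hypothetical toy data: (timestamp, tokens), sorted by timestamp.
docs = [
    (1, ["rain", "flood"]),
    (2, ["rain", "bus"]),
    (3, ["bus", "train"]),
    (4, ["train", "delay"]),
    (5, ["flood", "delay"]),
]
times = [t for t, _ in docs]
root = build(docs, 0, len(docs))
summary, nodes_used = range_summary(root, times, 2, 4)
print(summary.most_common(2), nodes_used)
```

In this simplified setting the summaries are exact because word counts add up; with real topic models the merged result is only an approximation, which is the quality trade-off the thesis evaluates.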
URI: https://hdl.handle.net/10356/106850
http://hdl.handle.net/10220/49683
DOI: 10.32657/10220/49683
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Theses

Files in This Item:
File: Master_Thesis_Lu_You_submission.pdf
Size: 3.11 MB
Format: Adobe PDF

Page view(s): 287 (updated on May 19, 2022)

Download(s): 185 (updated on May 19, 2022)


Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.