Title: Topical analysis of text streams
Authors: He, Qi
Keywords: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing
Issue Date: 2009
Source: He, Q. (2009). Topical analysis of text streams. Doctoral thesis, Nanyang Technological University, Singapore.
Abstract: Topic detection (TD) is an important area of research whose primary goal is to detect retrospective or new topics in a stream of news articles. It could be extremely useful in many applications, including news aggregation portals, news alert systems, event search engines, and terrorist activity tracking. However, specialists who analyze news articles have a hard time separating the wheat from the chaff because of the overwhelming number of news streams (over 10,000 as of 2008). For many years, the TDT (Topic Detection and Tracking) research community has tackled topic detection as a clustering task. However, time, which plays a pivotal role in news articles, has never been given due consideration. In this research, we present a thorough study of various temporal topic detection models that explicitly incorporate the element of time. We further found that bursty temporal word features play an important role in improving topic detection performance, and we provide an in-depth analysis and systematic categorization of all word features into five general types using techniques from signal processing. Armed with a small set of bursty features extracted from historical or online news streams, we propose a number of effective algorithms to detect topics in a news stream in both offline and online modes. Our algorithms are mathematically elegant, simple, and extremely practical when benchmarked against some of the best topic detection models, including spherical k-means, Latent Dirichlet Allocation (LDA), and von Mises-Fisher mixtures. Finally, we present a case study of a personalized news alert application, in which subscribers can specify anticipated events of interest, and show how a simple supervised event transition classifier can be used to effectively identify user-anticipated events.
Our research is one of the most comprehensive studies on both offline and online topic detection, of which the latter has been an open research problem for many years. In fact, our online topic detection model can be viewed as a significant advancement in the field, which paves the way for further improvements by other TDT experts.
DOI: 10.32657/10356/17764
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Theses

Files in This Item:
File        Description  Size     Format
HeQi08.pdf  Main report  2.76 MB  Adobe PDF

Page view(s): 10 (updated on May 13, 2021)

Download(s): 5 (updated on May 13, 2021)

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.