Title: Discovery of interesting phrases from text streams
Authors: Pang, Jeffrey Jian Hao
Keywords: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing
Issue Date: 2011
Abstract: The rapid adoption of blogs and tweets in recent years generates a large and diverse volume of information feeds every day. To take advantage of this knowledge, these timely data must be organized automatically and efficiently into useful information. Such text streams usually contain interesting phrases that summarize the content of the text. This project extracts interesting phrases, consolidates them, and transforms them into meaningful statistics, such as the amount of media coverage of a certain event during a specific time period, by making use of their temporal information such as “date published”. This report explores the various methodologies and algorithms used in keyphrase extraction. It also documents the development and implementation of a search engine titled “Interesting Phrases Analysis Program (IPAP)” designed for this project. IPAP is capable of retrieving interesting phrases from a large collection of blog entries. It indexes the collection and allows users to perform a range of analyses on the search results. The trend of phrases, the relationships between phrases, the niche of each blog, and other useful information can be obtained from these analyses. IPAP can also be extended to work with tweets. The applications and future development potential are also discussed in this report. IPAP shows that analyzing interesting phrases from text streams such as blogs can yield an unexpectedly large amount of useful information.
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Student Reports (FYP/IA/PA/PI)

Files in This Item:
Description: Restricted Access
Size: 1.31 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.