Title: Mining user-created content for document summarization and event detection
Authors: Hu, Meishan.
Keywords: DRNTU::Engineering::Computer science and engineering::Information systems
Issue Date: 2011
Abstract: Empowered by advanced Web services and easy-to-publish tools, today’s Web users create content and contribute knowledge through various Web activities. As a result, the Web abounds with user-created content. With the aim of deriving collective intelligence and the wisdom of the crowd, we conducted research on mining knowledge from user-created content. Our research focused on three forms of user-created content: comments, blogs, and search queries. Comments written by readers are an important feature of blogs and are believed to represent readers’ feedback about documents. In our user study on blog reading, we found that human summarizers selected significantly different sets of sentences from blog posts before and after reading the comments. Hence, we proposed and studied the problem of comments-oriented document summarization, whose goal is to extract a subset of sentences from a given document that best reflects the topics not only presented in the document but also discussed in the associated comments. To generate a comments-oriented summary, we proposed and evaluated a number of methods under two separate approaches. In the feature-scoring approach, we viewed words as the features that bridge the semantics of the document and its associated comments, and scored sentences according to the words they contain. Because comments are the containers of these words, the set of comments was scored through either a graph-based or a tensor-based scoring method based on three relations (i.e., topic, quotation, and mention) identified among comments. In the language-modeling approach, we viewed the desire for a summary as an information need and estimated a language model of the comments-oriented summary from the document language model and the comments language model. Sentences were then ranked through either Odds Ratio selection or Negative Kullback-Leibler Divergence selection.
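
The language-modeling idea in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not say how the summary language model is estimated, so the linear interpolation weight `lam`, the additive smoothing constant `mu`, the whitespace tokenization, and the function names are all assumptions made for illustration. The sketch builds a target model from the document and comment models and ranks sentences by negative Kullback-Leibler divergence from that target.

```python
import math
from collections import Counter


def unigram_lm(tokens, vocab, mu=0.01):
    """Additively smoothed unigram model over a fixed vocabulary (mu is an assumed constant)."""
    counts = Counter(tokens)
    total = len(tokens) + mu * len(vocab)
    return {w: (counts[w] + mu) / total for w in vocab}


def neg_kl(p, q):
    """Negative KL divergence -KL(p || q); values closer to 0 mean q matches p better."""
    return -sum(p[w] * math.log(p[w] / q[w]) for w in p)


def rank_sentences(sentences, comments, lam=0.5):
    """Rank document sentences against an assumed mixture of document and comment models."""
    doc_tokens = [w for s in sentences for w in s.lower().split()]
    com_tokens = [w for c in comments for w in c.lower().split()]
    vocab = set(doc_tokens) | set(com_tokens)

    p_doc = unigram_lm(doc_tokens, vocab)
    p_com = unigram_lm(com_tokens, vocab)
    # Assumption: the summary model is a simple linear interpolation of the two models.
    target = {w: lam * p_doc[w] + (1 - lam) * p_com[w] for w in vocab}

    scored = [(neg_kl(target, unigram_lm(s.lower().split(), vocab)), s) for s in sentences]
    # Highest (least negative) score first.
    return [s for _, s in sorted(scored, key=lambda pair: pair[0], reverse=True)]
```

Calling `rank_sentences(sentences, comments)` with a list of document sentences and a list of comment strings returns the sentences ordered from best to worst match; lowering `lam` gives the comments more influence on the ranking.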
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Theses

Files in This Item:
Restricted Access
Size: 1.25 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.