Please use this identifier to cite or link to this item: http://hdl.handle.net/10356/44560
|Title:||Mining user-created content for document summarization and event detection|
|Authors:||Hu, Meishan.|
|Keywords:||DRNTU::Engineering::Computer science and engineering::Information systems|
|Issue Date:||2011|
|Abstract:||Empowered by advanced Web services and easy-to-publish tools, today’s Web users are creating content and contributing knowledge through various Web activities. As a result, the Web is abundant with user-created content. With the aim of deriving collective intelligence and the wisdom of the crowd, we conducted research on knowledge mining from user-created content. Our research focused on three forms of user-created content: comments, blogs, and search queries. As one of the important features of blogs, comments written by readers are believed to represent readers’ feedback about documents. From our user study on blog reading, we found that human summarizers selected significantly different sets of sentences from the blog posts before and after reading comments. Hence, we proposed and studied the problem of comments-oriented document summarization, whose goal is to extract a subset of sentences from a given document that best reflects the topics not only presented in the document but also discussed among the associated comments. To generate a comments-oriented summary, we proposed and evaluated a number of methods under two separate approaches. In the feature-scoring approach, we viewed words as the features that bridge the semantics of the document and its associated comments, and scored sentences according to the words they contain. As the important containers of words, the comments were scored through either a graph-based or a tensor-based scoring method based on three relations (i.e., topic, quotation, and mention) identified among comments. In the language-modeling approach, we viewed the desire for a summary as an information need, and estimated a language model of the comments-oriented summary from the document language model and the comments language model. Sentences are then ranked through either Odds Ratio selection or Negative Kullback-Leibler Divergence selection.|
|URI:||http://hdl.handle.net/10356/44560|
|Fulltext Permission:||restricted|
|Fulltext Availability:||With Fulltext|
|Appears in Collections:||SCSE Theses|
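The language-modeling approach named in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation. It assumes whitespace tokenization, additively smoothed unigram models, and a linear interpolation of the document and comments models; the function names and the parameters `lam` and `mu` are hypothetical. Only the Negative Kullback-Leibler Divergence ranking is shown.

```python
import math
from collections import Counter


def unigram_lm(tokens, vocab, mu=0.01):
    """Additively smoothed unigram model over a fixed vocabulary (mu is an assumed constant)."""
    counts = Counter(tokens)
    total = len(tokens) + mu * len(vocab)
    return {w: (counts[w] + mu) / total for w in vocab}


def summary_lm(doc_lm, com_lm, lam=0.5):
    """Assumed estimate of the summary model: a linear mix of document and comments models."""
    return {w: lam * doc_lm[w] + (1 - lam) * com_lm[w] for w in doc_lm}


def neg_kl(target, model):
    """Negative KL divergence -KL(target || model); values closer to 0 mean a better match."""
    return -sum(p * math.log(p / model[w]) for w, p in target.items() if p > 0)


def rank_sentences(sentences, comments, lam=0.5):
    """Rank document sentences by negative KL divergence from the estimated summary model."""
    doc_tokens = [w for s in sentences for w in s.lower().split()]
    com_tokens = [w for c in comments for w in c.lower().split()]
    vocab = set(doc_tokens) | set(com_tokens)
    target = summary_lm(
        unigram_lm(doc_tokens, vocab), unigram_lm(com_tokens, vocab), lam
    )
    scored = [
        (neg_kl(target, unigram_lm(s.lower().split(), vocab)), s) for s in sentences
    ]
    # Highest score (smallest divergence) first.
    return [s for _, s in sorted(scored, key=lambda pair: pair[0], reverse=True)]


if __name__ == "__main__":
    post = ["the battery life is great", "the weather was nice today"]
    remarks = ["battery life matters most", "great battery"]
    for sentence in rank_sentences(post, remarks):
        print(sentence)
```

With `lam` closer to 0 the ranking leans more on the comments, and with `lam` closer to 1 it leans more on the document itself; how the thesis actually combines the two models is not stated in the abstract.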
checked on Oct 1, 2020
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.