|Title:||SumCR : a new subtopic-based extractive approach for text summarization|
|Authors:||Mei, Jian-Ping; Chen, L.|
|Keywords:||DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems|
|Issue Date:||2011|
|Source:||Mei, J. P., & Chen, L. (2012). SumCR: A new subtopic-based extractive approach for text summarization. Knowledge and Information Systems, 31(3), 527-545.|
|Series/Report no.:||Knowledge and information systems|
|Abstract:||In text summarization, relevance and coverage are the two main criteria that determine the quality of a summary. In this paper, we propose a new multi-document summarization approach, SumCR, based on sentence extraction. A novel feature called Exemplar is introduced to address both concerns simultaneously during sentence ranking. Unlike conventional approaches, where the relevance value of each sentence is calculated over the whole collection of sentences, the Exemplar value of each sentence in SumCR is obtained within a subset of similar sentences. A fuzzy medoid-based clustering approach is used to produce sentence clusters, or subsets, each of which corresponds to a subtopic of the related topic. This subtopic-based feature captures the relevance of each sentence within different subtopics and thus improves the chance that SumCR produces a summary with wider coverage and less redundancy. Another feature incorporated in SumCR is Position, i.e., the position at which each sentence appears in its document. The final score of each sentence is a combination of the subtopic-level feature Exemplar and the document-level feature Position. Experimental studies on DUC benchmark data show the good performance of SumCR and its potential in summarization tasks.|
|URI:||https://hdl.handle.net/10356/98682|
|DOI:||10.1007/s10115-011-0437-x|
|Fulltext Permission:||none|
|Fulltext Availability:||No Fulltext|
|Appears in Collections:||EEE Journal Articles|
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.