Memory-based learning for article generation
Date of Issue: 2000
Conference on Computational Language Learning (4th : 2000 : Lisbon, Portugal)
College of Humanities, Arts, and Social Sciences
Article choice can pose difficult problems in applications such as machine translation and automated summarization. In this paper, we investigate the use of corpus data to collect statistical generalizations about article use in English, so that articles can be generated automatically to supplement a symbolic generator. We use data from the Penn Treebank as input to a memory-based learner (TiMBL 3.0; Daelemans et al., 2000), which predicts whether to generate an article for an English base noun phrase. We discuss competitive results obtained using a variety of lexical, syntactic, and semantic features that play an important role in automated article generation.
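As a rough illustration of the memory-based idea described in the abstract, the sketch below stores labelled noun-phrase instances and classifies a new one by its nearest stored neighbours under a simple overlap (mismatch-count) distance. It does not use TiMBL and is not the authors' implementation: the feature set (head noun, part-of-speech tag, number, grammatical function), the labels, and the toy instances are all hypothetical, and feature weighting is omitted for brevity.

```python
from collections import Counter


def overlap_distance(a, b):
    """Count the feature positions at which two instances differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def predict(memory, features, k=1):
    """Return the majority label among the k stored instances nearest to `features`."""
    ranked = sorted(memory, key=lambda inst: overlap_distance(inst[0], features))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy memory: (head noun, POS tag, number, function) -> article class.
# These instances are invented for illustration and are not Penn Treebank data.
MEMORY = [
    (("dog", "NN", "singular", "subject"), "the"),
    (("dogs", "NNS", "plural", "object"), "none"),
    (("idea", "NN", "singular", "object"), "a/an"),
    (("water", "NN", "mass", "object"), "none"),
]

# An unseen base noun phrase is classified by analogy with the stored instances.
prediction = predict(MEMORY, ("cat", "NN", "singular", "subject"), k=1)
print(prediction)
```

In this toy setup the unseen instance differs from the stored "dog" instance only in the head noun, so that instance is its nearest neighbour and supplies the label. A practical system would weight features by their informativeness rather than treating every mismatch equally.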
© 2000 Association for Computational Linguistics. This paper was published in Proceedings of Conference on Computational Language Learning and is made available as an electronic reprint (preprint) with permission of Association for Computational Linguistics. The paper can be found at the following official URL: http://dx.doi.org/10.3115/1117601.1117611. One print or electronic copy may be made for personal use only. Systematic or multiple reproduction, distribution to multiple locations via electronic or other means, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper is prohibited and is subject to penalties under law.