A plethora of methods for learning English countability
Date of Issue: 2003
Conference on Empirical Methods in Natural Language Processing (2003)
School of Humanities and Social Sciences
This paper compares a range of methods for classifying words based on linguistic diagnostics, focusing on the task of learning countabilities for English nouns. We propose two basic approaches to feature representation: distribution-based representation, which simply looks at the distribution of features in the corpus data, and agreement-based representation, which analyses the level of tokenwise agreement between multiple preprocessor systems. We additionally compare a single multiclass classifier architecture with a suite of binary classifiers, and combine analyses from multiple preprocessors. Finally, we present and evaluate a feature selection method.
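To make the contrast between the two feature representations concrete, the following is a minimal sketch and not the paper's implementation. It assumes each preprocessor emits one feature value per corpus token of a noun; the function names, the "sg"/"pl" values, and the toy data are all hypothetical. The distribution-based view summarises how often each value occurs in one preprocessor's output, while the agreement-based view measures how often two preprocessors give the same value for the same token.

```python
from collections import Counter


def distribution_features(tags):
    """Relative frequency of each feature value over a noun's corpus tokens."""
    counts = Counter(tags)
    total = len(tags)
    return {tag: n / total for tag, n in counts.items()}


def agreement_feature(tags_a, tags_b):
    """Fraction of tokens on which two preprocessors give the same value."""
    if len(tags_a) != len(tags_b):
        raise ValueError("preprocessor outputs must be token-aligned")
    if not tags_a:
        return 0.0
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return matches / len(tags_a)


# Hypothetical toy data: four tokens of one noun, analysed by two systems.
system_a = ["sg", "sg", "pl", "sg"]
system_b = ["sg", "pl", "pl", "sg"]

dist = distribution_features(system_a)
agree = agreement_feature(system_a, system_b)
```

In this sketch either set of values could be passed to a classifier as features; how the paper actually defines and combines them is not specified in this abstract.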
© 2003 ACL. This is the author created version of a work that has been peer reviewed and accepted for publication by Proceedings of 2003 Conference on Empirical Methods in Natural Language Processing: EMNLP 2003, Association for Computational Linguistics. It incorporates referee’s comments, but changes resulting from the publishing process, such as copyediting and structural formatting, may not be reflected in this document. The published version is available at: [DOI: http://dx.doi.org/10.3115/1119355.1119365].