High precision treebanking - blazing useful trees using POS information
Date of Issue: 2005
Annual Meeting of the Association for Computational Linguistics (43rd : 2005)
School of Humanities and Social Sciences
In this paper we present a quantitative and qualitative analysis of annotation in the Hinoki treebank of Japanese, and investigate a method of speeding annotation by using part-of-speech tags. The Hinoki treebank is a Redwoods-style treebank of Japanese dictionary definition sentences. 5,000 sentences are annotated by three different annotators and the agreement evaluated. An average agreement of 65.4% was found using strict agreement, and 83.5% using labeled precision. Exploiting POS tags allowed the annotators to choose the best parse with 19.5% fewer decisions.
© 2005 ACL. This paper was published in Proceedings of the 43rd Annual Meeting of the ACL and is made available as an electronic reprint (preprint) with permission of Association for Computational Linguistics. The paper can be found at: [DOI: http://dx.doi.org/10.3115/1219840.1219881]. One print or electronic copy may be made for personal use only. Systematic or multiple reproduction, distribution to multiple locations via electronic or other means, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper is prohibited and is subject to penalties under law.