The Hinoki sensebank : a large-scale word sense tagged corpus of Japanese
Date of Issue: 2006
Workshop on Frontiers in Linguistically Annotated Corpora (2006 : Sydney, Australia)
School of Humanities and Social Sciences
Semantic information is important for precise word sense disambiguation systems and for the kind of semantic analysis used in sophisticated natural language processing applications such as machine translation and question answering. There are at least two kinds of semantic information: lexical semantics for words and phrases, and structural semantics for phrases and sentences. We have built a Japanese corpus of over three million words with both lexical and structural semantic information. In this paper, we focus on our method of annotating the lexical semantics, that is, building a word sense tagged corpus, and on the properties of that corpus.
© 2006 Association for Computational Linguistics. This paper was published in Proceedings of the Workshop on Frontiers in Linguistically Annotated Corpora 2006 and is made available as an electronic reprint (preprint) with permission of Association for Computational Linguistics. The paper can be found at the following official URL: http://dl.acm.org/citation.cfm?id=1641999. One print or electronic copy may be made for personal use only. Systematic or multiple reproduction, distribution to multiple locations via electronic or other means, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper is prohibited and is subject to penalties under law.