The Company They Keep: Extracting Japanese Neologisms Using Language Patterns
Date of Issue: 2018
The 9th Global WordNet Conference (GWC 2018)
School of Humanities and Social Sciences
We describe an investigation into the identification and extraction of unrecorded potential lexical items in Japanese text by detecting text passages containing selected language patterns typically associated with such items. We identified a set of suitable patterns, then tested them with two large collections of text drawn from the WWW and Twitter. Samples of the extracted items were evaluated, and it was demonstrated that the approach has considerable potential for identifying terms for later lexicographic analysis.
© 2018 The author(s). This is the author-created version of a work that has been peer reviewed and accepted for publication by The 9th Global WordNet Conference (GWC 2018). It incorporates referees’ comments, but changes resulting from the publishing process, such as copyediting and structural formatting, may not be reflected in this document. The full text is available at: [http://compling.hss.ntu.edu.sg/events/2018-gwc/pdfs/GWC2018_paper_20.pdf].