Title: Developing a new statistical method for Chinese text segmentation
Authors: Dai, Yubin
Keywords: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
DRNTU::Engineering::Computer science and engineering::Theory of computation::Analysis of algorithms and problem complexity
Issue Date: 1999
Abstract: A new statistical formula for Chinese text segmentation, called the Contextual Information Formula (CIF), was developed empirically for identifying 2- and 3-character words. It was developed by performing stepwise logistic regression on a sample of sentences that had been manually segmented. 300 sentences were used for model building, and 100 sentences were set aside for model validation and evaluation. Relative frequencies, document frequencies, weighted document frequencies, and within-document frequencies of characters, bigrams, and trigrams were included in the study.
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Theses

Files in This Item:
File    Description          Size        Format
        Restricted Access    17.28 MB    Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.