Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/2614
Title: Developing a new statistical method for Chinese text segmentation
Authors: Dai, Yubin
Keywords: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
DRNTU::Engineering::Computer science and engineering::Theory of computation::Analysis of algorithms and problem complexity
Issue Date: 1999
Abstract: A new statistical formula for Chinese text segmentation, called the Contextual Information Formula (CIF), was developed empirically for identifying 2- and 3-character words. It was developed by performing stepwise logistic regression on a sample of sentences that had been manually segmented. 300 sentences were used for model building, and 100 sentences were set aside for model validation and evaluation. Relative frequencies, document frequencies, weighted document frequencies, and within-document frequencies of characters, bigrams, and trigrams were included in the study.
URI: http://hdl.handle.net/10356/2614
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Theses
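
The abstract describes scoring candidate words with a logistic model built from frequency statistics. The following is a minimal sketch of that general idea for bigrams only, not the thesis's actual formula: the toy corpus, the function names, the definition of document frequency as a fraction of documents, and all coefficients are hypothetical and chosen purely for illustration. The thesis selected its variables by stepwise logistic regression, which this sketch does not attempt.

```python
import math
from collections import Counter


def bigrams(text):
    """Return all overlapping character bigrams in a string."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def bigram_features(docs):
    """Compute two frequency statistics for every bigram in the corpus.

    rel_freq: occurrences of the bigram / total bigram occurrences.
    doc_freq: fraction of documents containing the bigram (an assumed
              definition; the thesis may define it differently).
    """
    counts = Counter()      # occurrences across the whole corpus
    doc_counts = Counter()  # number of documents containing the bigram
    for doc in docs:
        grams = bigrams(doc)
        counts.update(grams)
        doc_counts.update(set(grams))
    total = sum(counts.values())
    return {
        g: {"rel_freq": counts[g] / total, "doc_freq": doc_counts[g] / len(docs)}
        for g in counts
    }


def word_probability(feats, intercept=-2.0, w_rel=20.0, w_doc=2.0):
    """Logistic model P(word) = 1 / (1 + exp(-z)).

    The intercept and weights are made-up placeholders, not fitted values.
    """
    z = intercept + w_rel * feats["rel_freq"] + w_doc * feats["doc_freq"]
    return 1.0 / (1.0 + math.exp(-z))


# Toy corpus (hypothetical); each string stands in for one document.
docs = ["我们学习中文", "中文分词方法", "学习分词"]
features = bigram_features(docs)

for gram in ("中文", "们学"):
    print(gram, round(word_probability(features[gram]), 3))
```

In a real setting the coefficients would be estimated from manually segmented sentences, as the abstract describes, and the score would be compared against a threshold to decide whether a character sequence is treated as a word.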

Files in This Item:
File: SCE-THESES_42.pdf
Description: Restricted Access
Size: 17.28 MB
Format: Adobe PDF


Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.