Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/63803
Title: Chinese words segmentation in user generated content
Authors: Cai, Xiaoxuan
Keywords: DRNTU::Engineering::Computer science and engineering
Issue Date: 2015
Abstract: Chinese word segmentation is the first step in Chinese text processing, and its accuracy directly affects the performance of downstream text processing. With the growing popularity of social media in China, informally written Chinese sentences have become very common in user generated content on the Internet. This project studies Chinese word segmentation in user generated content. Two existing Chinese word segmentation tools, Jieba [1] and Stanford Word Segmenter [2], are studied; a new tool named Weibo Segmenter, implemented according to [3], is presented; and all three tools are tested on the same dataset to compare their performance. In this test, Weibo Segmenter achieves an accuracy rate of 83.3%. Its performance could be further improved by using a more suitable dictionary and some programming techniques.
URI: http://hdl.handle.net/10356/63803
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: FYP_Report_CaiXiaoxuan.pdf
Description: Restricted Access
Size: 824.93 kB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.