Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/44149
Title: Building the foundation text for Nanyang Technological University : multilingual corpus (NTU-MC).
Authors: Tan, Li Ling.
Keywords: DRNTU::Humanities::Language::Linguistics; DRNTU::Humanities::Linguistics::Sociolinguistics::Multilingualism
Issue Date: 2011
Abstract: The NTU-MC is a multilingual corpus that draws on the multilingual text available in Singapore. The current version of the NTU-MC contains approximately 375,000 words (15,096 sentences) in 6 languages (English, Chinese, Japanese, Korean, Indonesian and Vietnamese) from 6 language families (Indo-European, Japonic, Austro-Asiatic, Sino-Tibetan, Austronesian, and Korean as a language isolate); all text in English, Chinese, Japanese, Korean and Vietnamese was part-of-speech (POS) tagged. This project focuses on compiling the foundation text for the NTU-MC, and this dissertation describes the motivations, the corpus compilation process, and the internal and cross-corpora evaluation of the corpus output. The corpus will be made available to the public under the Creative Commons Attribution 3.0 Unported license in Summer 2011.
URI: http://hdl.handle.net/10356/44149
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: HSS Student Reports (FYP/IA/PA/PI)
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Building the Foundation Text for NTU-MC_final.pdf (Restricted Access) | | 1.97 MB | Adobe PDF |
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.