Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/55014
Title: | Online text mining for conversational speech recognition |
Authors: | Thong, Kian Hoong. |
Keywords: | DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition |
Issue Date: | 2013 |
Abstract: | Conversational text is highly varied, and many abbreviations and short forms exist in different languages. Manually entering every possible term would be difficult, and some terms would likely be missed. This makes the compilation of conversational texts a difficult task. This project uses current search engines, such as Google and Bing, to crawl the web for conversational texts to add to the language model. It also applies methods to minimize the clutter present in the final text that is input into the language model. Much research was done into understanding the three aspects of this project, namely web crawling, normalization and language modeling. A web crawler was developed, drawing on academic literature and online resources, to obtain a conversational corpus. It uses filtering and history tracking to ensure that the data is readable and not duplicated. By the conclusion of this project, a substantial amount of data had been collected from the Internet using a combination of web-crawling APIs and normalization techniques. The data was then used to generate a language model, which was run against the test data. The resulting perplexity indicates whether the crawled data improves on the manually transcribed training data. This report contains all the research and data used to optimize the search engine program, as well as reflections on lessons learnt throughout this process. |
URI: | http://hdl.handle.net/10356/55014 |
Schools: | School of Computer Engineering |
Research Centres: | Centre for Advanced Information Systems |
Rights: | Nanyang Technological University |
Fulltext Permission: | restricted |
Fulltext Availability: | With Fulltext |
Appears in Collections: | SCSE Student Reports (FYP/IA/PA/PI) |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Thong Kian Hoong_ Final Year Report.pdf (Restricted Access) | | 1.56 MB | Adobe PDF |
Page view(s): 456 (updated on Mar 22, 2025)
Download(s): 13 (updated on Mar 22, 2025)
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.