Title: A multimedia transcription system
Authors: Nguyen, Huy Anh
Keywords: DRNTU::Engineering
Issue Date: 2018
Abstract: With the advent of computing, a huge amount of data is created every day. Most of this data is unstructured or semi-structured and must be processed before meaning can be derived from it. For multimedia data (audio and video), a textual representation is often desirable, and there are two ways to obtain one: transcription and captioning. Both processes are well-defined pipelines of multiple components. However, each component has many existing implementations, each with its own input and output formats, which makes them difficult to integrate into a pipeline. The pipeline itself is difficult to maintain, since any change or upgrade to a component can potentially break it. Furthermore, as the pipeline changes, there is no mechanism to keep track of output versions, a capability that is important for research purposes. This project proposes an integrated processing system that performs transcription and captioning on a wide range of audio and video inputs: single-file audio or video as well as multi-channel audio recordings. The project aims to design a system architecture that is modular and extensible, keeps track of component and output versions, and performs robustly under many scenarios. The project incorporates Python ports of existing modules from various efforts of the Speech and Language Research Group in the School of Computer Science and Engineering, as well as new Python modules that realize the processing pipeline: transcription, captioning, and visualization of transcripts and captions. The project is evaluated on existing audio recordings of talk shows (Singapore's 93.8FM), video recordings (Singapore Parliament proceedings), and multi-channel recordings (a four-person conversation on the Singapore Army). The system achieves all of its stated requirements, demonstrating the usefulness of the project.
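The abstract's central architectural idea is that pipeline stages are interchangeable modules, and that outputs are tagged with the versions of the components that produced them, so an upgrade never silently overwrites earlier results. The following is a minimal, hypothetical Python sketch of that idea, not the report's actual code: the names `Component`, `Pipeline`, `version_id`, and `run`, the toy stage functions, and the in-memory `store` are all assumptions made for illustration. Each output is keyed by an input identifier plus a short hash of the ordered (name, version) pairs of the stages.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Component:
    """Hypothetical pipeline stage, e.g. segmentation or speech recognition."""
    name: str
    version: str
    fn: Callable[[Any], Any]


@dataclass
class Pipeline:
    components: List[Component]
    # Hypothetical in-memory output store: "<input_id>@<version_id>" -> result.
    store: Dict[str, Any] = field(default_factory=dict)

    def version_id(self) -> str:
        """Short hash of the ordered (name, version) pairs of all stages.

        Upgrading, adding, removing or reordering any stage changes the id,
        so results from different pipeline configurations never collide.
        """
        spec = [(c.name, c.version) for c in self.components]
        return hashlib.sha256(json.dumps(spec).encode("utf-8")).hexdigest()[:12]

    def run(self, input_id: str, data: Any) -> str:
        """Pass data through every stage in order and store the versioned result."""
        for component in self.components:
            data = component.fn(data)
        key = f"{input_id}@{self.version_id()}"
        self.store[key] = data
        return key


# Usage with two toy stages standing in for real audio-processing modules.
segment = Component("segment", "1.0", lambda audio: audio.split("|"))
transcribe = Component("transcribe", "1.0", lambda segs: [s.upper() for s in segs])
pipeline = Pipeline([segment, transcribe])
key_v1 = pipeline.run("talk01", "hello|world")

# Upgrading one stage yields a new output key; the earlier result is kept.
pipeline.components[1] = Component("transcribe", "1.1", lambda segs: [s.title() for s in segs])
key_v2 = pipeline.run("talk01", "hello|world")
print(key_v1 != key_v2, pipeline.store[key_v1], pipeline.store[key_v2])
```

Hashing only names and versions, rather than the stage code itself, keeps the scheme simple but assumes that every behavioural change to a module is accompanied by a version bump.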
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Student Reports (FYP/IA/PA/PI)

Files in This Item:
Restricted Access (Adobe PDF, 2.25 MB)

Page views: 10
Downloads: 10
(Updated on May 13, 2021)



Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.