Please use this identifier to cite or link to this item:
Title: Seeing sounds : sound visualization and speech to text conversion using laptop and microphone array
Authors: Wu, Haoran
Keywords: Engineering::Computer science and engineering
Issue Date: 2020
Publisher: Nanyang Technological University
Project: SCSE19-0596
Abstract: A large population of people is suffering from hearing loss and inconveniences caused due to lack of auditory sense. There are different types of hearing aids available in the market but due to the discomfort associated with prolong use and the stigma of being recognized as a handicapped person, they are not used often or are abandoned after several years. Therefore, this project aims to explore a way of using mixed reality and sound localization to highlight the existing sound sources in a live video stream to help hearing-impairment identify the sound. The speech-to-text conversion will be implemented to help the conversation. In this project, two approaches were explored to map the sound sources on the video stream, including using mathematical formulas and machine learning. For mathematical formula approach, to find the best-fit equation, pseudo-inverse and dot product were used to find the relationship between sound source coordinates and image coordinates. The best-fit equation was then used to map the sound source to the video stream. Machine learning approach was able to achieve better mapping accuracy in a wider range distance comparing to mathematical approach, however due to the prediction speed, it couldn’t be used in this project. Overall sound tracking and speech-to-text conversion was successfully achieved to a certain extend in this project. The future improvement can be using a smartphone camera as a platform to achieve better mobility and convenience.
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections:SCSE Student Reports (FYP/IA/PA/PI)

Files in This Item:
File Description SizeFormat 
Amended Final FYP_Report_Wu Haoran.pdf
  Restricted Access
49.85 MBAdobe PDFView/Open

Page view(s)

Updated on May 17, 2022

Download(s) 50

Updated on May 17, 2022

Google ScholarTM


Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.