Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/179452
Title: Artificial intelligence for urban soundscape augmentation: a benchmark dataset, probabilistic models, and real-life validation
Authors: Ooi, Kenneth Wen Rui
Keywords: Engineering
Physics
Issue Date: 2024
Publisher: Nanyang Technological University
Source: Ooi, K. W. R. (2024). Artificial intelligence for urban soundscape augmentation: a benchmark dataset, probabilistic models, and real-life validation. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/179452
Project: COT-V4-2020-1 
GCP205559654 
Abstract: The field of soundscape analysis and design is a nascent one, with the framework defined in ISO 12913-1:2014 calling for the understanding of an "acoustic environment as perceived or experienced and/or understood by a person or people, in context". Consequently, one method to improve soundscape quality is soundscape augmentation, whereby sounds are added to an existing soundscape via electroacoustic means to modify its perception. However, determining optimal or appropriate sounds to effect such perceptual changes requires listeners to be physically present at a location for the subjective evaluation of performance. Subjective evaluations are known to be the main bottleneck of soundscape studies in terms of time and resources, so being able to sidestep this requirement is crucial in the field of soundscape analysis and design, because urban planners and soundscape architects could then iterate faster through their ideas. Therefore, the overarching aim of this thesis is to provide insight into the following question: To what extent can we remove the human participant from the evaluation process by utilising appropriate design and modelling approaches? To achieve this, we (1) craft a large benchmark dataset of human responses to perceptual attributes of a representative variety of soundscapes in public urban environments that can be used to train generalisable models, (2) develop probabilistic models from the dataset comprising deep neural networks that capture the subjectivity in human evaluations of soundscapes, and (3) integrate such models in a real-life soundscape augmentation system requiring no human input to run. The significance of these contributions is apparent given the dearth of publicly available, large-scale benchmark datasets in existing soundscape literature, which has stymied the adoption of deep learning models in soundscape research due to their typical need for large datasets.
Nonetheless, recent advances in deep learning models for acoustic tasks outside the field of soundscape research point to their applicability in soundscape analysis as well, which this thesis will also demonstrate. Highlights of the thesis include: the benchmark dataset being the largest soundscape dataset with perceptual labels in the literature (25,440 data samples); a probabilistic loss function allowing for statistically significant improvements (up to 7.8%) over a standard loss function using the mean squared error in the prediction of "pleasantness" as defined in ISO 12913; a modular architecture allowing for the separation of masker and gain inputs for more efficient masker selection in an automated masker selection system; a multimodal expansion on that modular architecture allowing for significant improvements (up to 2.8%) over a model using purely acoustic information; and an in-situ validation of the automated masker selection system with acoustic-only information showing a significant improvement in the perceived pleasantness (up to 23.4% of the possible range in raw ratings and 15.0% as defined by ISO 12913) of the soundscapes at pavilions in green spaces exposed to road traffic noise.
URI: https://hdl.handle.net/10356/179452
DOI: 10.32657/10356/179452
DOI (Related Dataset): 10.21979/N9/9OTEVX
10.21979/N9/0KYIAU
Schools: School of Electrical and Electronic Engineering 
Research Centres: Digital Signal Processing Laboratory 
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Fulltext Permission: embargo_20250731
Fulltext Availability: With Fulltext
Appears in Collections: EEE Theses

Files in This Item:
File: Artificial Intelligence for Urban Soundscape Augmentation.pdf
Description: Until 2025-07-31
Size: 30.42 MB
Format: Adobe PDF
Under embargo until Jul 31, 2025

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.