Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/180288
Title: Deep neural network-based automatic speech recognition for ATC-pilot audio transcription
Authors: Low, Ashton Kin Yun
Nimrod, Lilith
Alam, Sameer
Poh, Leston Choo Kiat
Keywords: Engineering
Issue Date: 2024
Source: Low, A. K. Y., Nimrod, L., Alam, S. & Poh, L. C. K. (2024). Deep neural network-based automatic speech recognition for ATC-pilot audio transcription. 2024 International Conference on Research in Air Transportation (ICRAT).
Project: NTU Ref: 2017-1619 
Conference: 2024 International Conference on Research in Air Transportation (ICRAT)
Abstract: Artificial Intelligence (AI) has demonstrated the ability to manage complex processes highly effectively, and is thus widely seen as a key component of future airport ATM systems. Future AI tools for ATM will rely on digital data, such as surveillance, radar, weather, and flight plans, for their operation. However, the foundational Air Traffic Control Officer (ATCo)-pilot communication medium is voice, which is a vital source of situational data. Controller Pilot Data Link Communications (CPDLC) has been developed as an alternative, text-based communication delivery method; however, ATCo-pilot communications will not be completely transitioned to this framework in the near term. Moreover, as CPDLC is a one-to-one communication paradigm, the additional situational awareness of other traffic provided by traditional party-line VHF communications is potentially lost. Therefore, an automated speech-to-text translation tool can be seen as a missing link, enabling traditional ATCo-pilot voice communications to be automatically translated and input into a datalink system such as CPDLC. To this end, this paper presents a Machine Learning (ML)-based Automatic Speech Recognition (ASR) framework that accurately translates ATCo-pilot speech communication to text, achieving a Word Error Rate of only 6.13%. Moreover, the presented model extracts seven entities with an accuracy and F1-score of 91.8% and 84.4% respectively, comparable to previously presented models that are only capable of extracting three. A detailed design of the framework is provided to enable its replication by the wider research community.
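Note on the reported metric: Word Error Rate (WER) is the standard ASR evaluation measure, defined as the word-level edit distance (substitutions + deletions + insertions) between the reference and hypothesis transcripts, divided by the number of reference words. The sketch below is a generic illustration of how WER is computed, not the paper's evaluation code; the example phraseology is hypothetical.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate via word-level Levenshtein distance:
    (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)


# Hypothetical ATC-style example: one deletion ("to") and one
# substitution ("three" -> "tree") against a 7-word reference.
print(wer("climb to flight level three five zero",
          "climb flight level tree five zero"))  # 2/7 ≈ 0.2857
```

A reported WER of 6.13% therefore means roughly one word-level error per sixteen reference words, averaged over the test transcripts.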
URI: https://hdl.handle.net/10356/180288
URL: https://www.icrat.org/upcoming-conference/papers/
https://www.icrat.org/
Schools: School of Mechanical and Aerospace Engineering 
Research Centres: Air Traffic Management Research Institute 
Rights: © 2024 ICRAT. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at https://www.icrat.org/upcoming-conference/papers/.
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections:MAE Conference Papers

Files in This Item:
File: ICRAT2024_paper_88.pdf (1.17 MB, Adobe PDF)

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.