Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/168465
Title: From knowledge augmentation to multi-tasking: towards human-like dialogue systems
Authors: Yang, Tianji
Keywords: Engineering::Computer science and engineering
Issue Date: 2023
Publisher: Nanyang Technological University
Source: Yang, T. (2023). From knowledge augmentation to multi-tasking: towards human-like dialogue systems. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/168465
Project: 04SBP000598C130 
Abstract: The goal of building dialogue agents that can converse with humans naturally has been a long-standing dream of researchers since the early days of artificial intelligence. The well-known Turing Test proposed to judge the ultimate validity of an artificial intelligence agent by the indistinguishability of its dialogues from humans'. It should come as no surprise that human-level dialogue systems are very challenging to build. While early efforts on rule-based systems found limited success, the emergence of deep learning enabled great advances on this topic. The works covered in this thesis originated in an era in which data-driven, deep-learning-based dialogue systems were beginning to take off. Dialogue systems trained on message-response pairs found in social media began to show the ability to conduct natural conversations, but they were limited in many ways, such as lacking knowledge grounding, multimodality, and multi-utility. In this thesis, we focus on methods that address the issues underlying the gap between artificial conversational agents and human-level interlocutors. These methods were proposed and experimented with in ways inspired by general state-of-the-art AI methodologies, while also targeting the characteristics specific to dialogue systems. First, we expand the variety of information that dialogue systems can condition on. In its simplest and most common form, a dialogue consists of responses and their preceding textual context. This representation, however, falls short of real-world human conversation, which often depends on other modalities and specific knowledge bases. To condition dialogues on more modalities, we explore dialogue generation augmented by the audio representation of the input. We design an auxiliary response-classification task to learn a suitable audio representation for our dialogue generation objective.
We use word-level modality fusion to integrate audio features into the sequence-to-sequence learning framework. Our model can generate appropriate responses corresponding to the emotion and emphasis expressed in the audio. Commonsense knowledge must be integrated into a dialogue system effectively for it to respond to human utterances in an interesting and engaging way. In the first attempt at integrating a large commonsense knowledge base into end-to-end conversational models, we propose a model that jointly takes into account the context and its related commonsense knowledge when selecting an appropriate response. We demonstrate that the knowledge-augmented models are superior to their knowledge-free counterparts. While the two directions above endeavor to ground dialogues on various new information, they are not the only challenges that dialogue systems face. Traditionally, the goal of building intelligent dialogue systems has largely been pursued separately under two distinct utilities: task-oriented dialogue systems, which perform task-specific functions, and open-domain dialogue systems, which focus on non-goal-oriented chitchat. The two dialogue modes can potentially be intertwined seamlessly in the same conversation, as a friendly human assistant does easily. This thesis also covers our effort to address the problem of fusing the two dialogue modes in multi-turn dialogues. We build a new dataset, FusedChat, which contains conversation sessions with exchanges from both dialogue modes and inter-mode contextual dependency. We propose two baseline models for this task and analyze their accuracy. Last but not least, we address the computational-efficiency issue that large-scale retrieval-based dialogue systems face. Strong retrieval-based dialogue systems built on a large natural candidate set can produce diverse and controllable responses.
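The word-level modality fusion mentioned above can be illustrated with a minimal sketch: an audio feature vector is concatenated onto each word embedding before the fused sequence enters the encoder. This is a hypothetical toy illustration of the general technique, not the thesis's actual implementation, and the function name and dimensions are invented for the example.

```python
import numpy as np

def word_level_fusion(word_embs: np.ndarray, audio_feats: np.ndarray) -> np.ndarray:
    """Concatenate audio features onto every word embedding so the
    encoder sees a fused representation at each timestep."""
    seq_len = word_embs.shape[0]
    if audio_feats.ndim == 1:
        # one utterance-level audio vector, broadcast to every word position
        audio_feats = np.tile(audio_feats, (seq_len, 1))
    return np.concatenate([word_embs, audio_feats], axis=-1)

# toy example: 5 words with 8-dim embeddings, one 4-dim audio vector
fused = word_level_fusion(np.zeros((5, 8)), np.ones(4))
print(fused.shape)  # (5, 12)
```

The fused sequence then replaces the plain word embeddings as encoder input, letting the decoder condition on prosodic cues such as emotion and emphasis.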
However, a large candidate set can be computationally costly. We propose methods that support a fast and accurate response-retrieval system. To boost accuracy, we adopt a knowledge distillation approach in which a very strong yet computationally expensive joint encoding model facilitates the training of our encoders. We then boost retrieval speed with a learning-based candidate screening method that further reduces inference time. We demonstrate that our model achieves a strong trade-off between retrieval accuracy and speed. In summary, this thesis systematically demonstrates our efforts to innovate dialogue systems. Through our experiments, we found that with new designs based on general state-of-the-art NLP methodologies, dialogue systems can be made faster, multimodal, capable of multiple utilities, and grounded on useful external information. We believe that the research questions we focused on are important aspects of ultimately improving automated dialogue agents to a human level. The main contribution of the works covered in this thesis lies in their initializing effect (to a certain degree) on directions that researchers have continued to work on to this day. With our efforts to innovate dialogue systems spanning the last four years, and state-of-the-art NLP models evolving fast year by year, we note that the models used in some of our earlier works (e.g., LSTMs) cannot compete with the state-of-the-art models available today (e.g., GPT-4). In such cases, we briefly and systematically explain follow-up works (the current state of the art) that stemmed from the methodologies shown in our work, especially those based on recent advances in large language models.
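The retrieval pipeline described above, distilling an expensive joint (cross) encoder into cheaper encoders and screening candidates before full scoring, can be sketched as follows. This is a schematic illustration under assumed names and toy data, not the thesis's actual models: real systems would train neural encoders, whereas here precomputed scores stand in for them.

```python
import numpy as np

def distillation_loss(student_scores: np.ndarray, teacher_scores: np.ndarray) -> float:
    """MSE between the fast (student) encoder scores and the expensive
    joint-encoding (teacher) scores over the same candidates."""
    return float(np.mean((student_scores - teacher_scores) ** 2))

def screen_candidates(cheap_scores: np.ndarray, k: int) -> np.ndarray:
    """Learning-based screening stand-in: keep only the top-k candidate
    indices so the finer-grained scorer runs on far fewer candidates."""
    return np.argsort(cheap_scores)[::-1][:k]

rng = np.random.default_rng(0)
teacher = rng.normal(size=100)                        # expensive joint-encoder scores
student = teacher + rng.normal(scale=0.1, size=100)   # student approximates the teacher
kept = screen_candidates(student, k=10)               # prune 100 candidates down to 10
```

During training, minimizing the distillation loss pushes the fast encoders toward the teacher's ranking; at inference, screening bounds the number of candidates the costlier stage must score, which is where the accuracy-speed trade-off comes from.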
URI: https://hdl.handle.net/10356/168465
DOI: 10.32657/10356/168465
DOI (Related Dataset): 10.21979/N9/QWEBOS
Schools: School of Computer Science and Engineering 
Research Centres: Computational Intelligence Lab 
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections:SCSE Theses

Files in This Item:
File: Methods_and_applications__towards_human_level_dialogue_systems__revised_.pdf
Description: my thesis
Size: 1.8 MB
Format: Adobe PDF

Page view(s): 199 (updated on Jun 22, 2024)
Download(s): 201 (updated on Jun 22, 2024)

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.