Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/168487
Title: Natural language processing as autoregressive generation
Authors: Lin, Xiang
Keywords: Engineering::Computer science and engineering::Computing methodologies::Document and text processing
Issue Date: 2023
Publisher: Nanyang Technological University
Source: Lin, X. (2023). Natural language processing as autoregressive generation. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/168487
Abstract: Advances in deep learning have led to great achievements in many Natural Language Processing (NLP) tasks. Because language is inherently sequential, most NLP tasks can be cast in a sequence learning framework, text generation being a prominent example. As one of the most important foundations of modern NLP, autoregressive generation models have achieved dominant performance on a wide range of NLP tasks. This thesis therefore focuses on improving autoregressive generation models for different NLP tasks.

While many tasks fit naturally into the sequence learning framework, others, such as building a discourse parse tree, require careful design to fit into neural models. This thesis first presents a unified framework for discourse parsing that builds a discourse tree in a top-down, depth-first manner, framing the task as autoregressive generation in which each step predicts a node position given a span of text. Extensive experiments show the proposed approach to be effective. I then extend this framework with a hierarchical decoder that leverages information from the parents and siblings of the nodes currently being processed. The proposed decoder exploits the tree structure and further improves performance on both discourse parsing and dependency parsing.

On the other hand, the de facto strategies for training autoregressive generation models, namely cross entropy loss and teacher forcing, are known to be problematic in certain respects. Cross entropy, one of the most widely used training objectives, often leads to text degeneration, while teacher forcing suffers from exposure bias: a mismatch between the training and testing setups. To address degeneration, I introduce a class of diminishing attentions, which enforces submodularity of the coverage computed from cross-attention in sequence-to-sequence models. The proposed diminishing attentions achieve notable improvements on several neural text generation tasks, including text summarization, machine translation, and image paragraph generation. Further, I propose a novel training objective, ScaleGrad, to replace cross entropy; it significantly reduces degeneration across different text generation tasks. ScaleGrad also extends to problems beyond degeneration: by directly modifying the gradient information in the output layer, it offers wide flexibility to inject different inductive biases into a text generation model. Finally, for the exposure bias problem, this thesis introduces a novel form of scheduled sampling based on training accuracy, which requires only minimal hyper-parameter tuning compared to existing scheduled sampling methods, together with a novel imitation loss that further pushes the model's free-running generative behavior to match its teacher-forced behavior. Moreover, this thesis demonstrates that reducing exposure bias improves the robustness of language models against repetitive and toxic outputs.
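As a concrete illustration of the parsing-as-generation idea described in the abstract, the following sketch (PyTorch; `SplitPointer` and all other names are hypothetical) shows one way a single decoding step could score candidate node positions within the current text span. It is a generic pointer-style scorer under assumed tensor shapes, not the thesis's exact architecture.

```python
import torch
import torch.nn as nn

class SplitPointer(nn.Module):
    """Hypothetical pointer-style scorer: picks a split position in a span."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.query_proj = nn.Linear(hidden_size, hidden_size)
        self.key_proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, decoder_state, encoder_states, span_mask):
        # decoder_state: (batch, hidden) summary of the span being split
        # encoder_states: (batch, length, hidden); span_mask: (batch, length) bool
        q = self.query_proj(decoder_state).unsqueeze(1)         # (batch, 1, hidden)
        k = self.key_proj(encoder_states)                       # (batch, length, hidden)
        scores = (q * k).sum(-1)                                # (batch, length)
        scores = scores.masked_fill(~span_mask, float("-inf"))  # stay inside the span
        return torch.log_softmax(scores, dim=-1)                # log-probs over positions
```

In a top-down, depth-first parse, the highest-scoring position would split the current span into two children, which are then pushed onto a stack and processed in turn.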
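The diminishing attentions can be understood through submodularity: applying a concave function to the accumulated cross-attention coverage yields diminishing marginal gains, so source tokens that have already received attention benefit less from further attention. The sketch below illustrates only this general idea; the concave function and the reweighting scheme are assumptions made for illustration, not the thesis's exact formulation.

```python
import torch

def diminishing_attention(raw_attn, coverage, f=torch.log1p):
    """Illustrative diminishing attention: reweight raw attention weights by
    the marginal coverage gain F(c + a) - F(c) of a concave F, then renormalize.
    Heavily covered source tokens receive less new attention, discouraging the
    decoder from dwelling on the same part of the input."""
    gain = f(coverage + raw_attn) - f(coverage)     # concave F => diminishing gain
    attn = raw_attn * gain
    attn = attn / attn.sum(dim=-1, keepdim=True)    # back to a distribution
    return attn, coverage + attn                    # reweighted attention, new coverage

# Toy check: equal raw attention, but the second token is already heavily covered.
raw = torch.tensor([[0.5, 0.5]])
coverage = torch.tensor([[0.0, 3.0]])
attn, coverage = diminishing_attention(raw, coverage)
print(attn)  # most of the mass shifts to the uncovered first token
```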
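ScaleGrad is described as acting directly on the gradient information in the output layer. One simple way such a mechanism can be realized, sketched below, is to rescale the probabilities of a designated token set (here, hypothetically, "novel" tokens not yet generated) by a factor gamma before taking the negative log-likelihood; after renormalization, only the gradients that the loss sends into the output layer change. This is a sketch in the spirit of the description, not necessarily the thesis's exact objective.

```python
import torch
import torch.nn.functional as F

def scaled_nll(logits, target, novel_mask, gamma=0.2):
    """logits: (batch, vocab); target: (batch,) gold token ids;
    novel_mask: (batch, vocab) bool, True for tokens treated as novel;
    gamma: scaling factor in (0, 1], an assumed hyper-parameter."""
    probs = F.softmax(logits, dim=-1)
    scale = torch.where(novel_mask,
                        torch.full_like(probs, gamma),
                        torch.ones_like(probs))
    scaled = probs * scale
    scaled = scaled / scaled.sum(dim=-1, keepdim=True)   # renormalize
    return F.nll_loss(torch.log(scaled + 1e-9), target)  # NLL on rescaled probs
```

Because only the loss surface changes, the same recipe could in principle inject other inductive biases by choosing a different token set or scaling rule, which matches the flexibility claimed in the abstract.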
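Finally, for the accuracy-based scheduled sampling and the imitation loss, the sketch below shows how the two ingredients could fit together: the probability of feeding the model its own prediction is tied to a running token-level training accuracy, and an auxiliary term pulls the free-running output distribution toward the teacher-forced one. Both the schedule and the KL-based imitation term are illustrative assumptions, not the thesis's exact definitions.

```python
import torch
import torch.nn.functional as F

def choose_next_inputs(gold_tokens, predicted_tokens, train_accuracy):
    """Per-position coin flip between gold and model tokens; the more accurate
    the model is on the training data, the more often it sees its own output."""
    p = max(0.0, min(1.0, train_accuracy))  # assumed schedule: mix rate = accuracy
    use_model = torch.rand_like(gold_tokens, dtype=torch.float) < p
    return torch.where(use_model, predicted_tokens, gold_tokens)

def imitation_loss(free_running_logits, teacher_forced_logits):
    """Auxiliary term pulling free-running behavior toward teacher-forced
    behavior, sketched here as a KL divergence between the two distributions."""
    p_teacher = F.softmax(teacher_forced_logits.detach(), dim=-1)
    log_p_free = F.log_softmax(free_running_logits, dim=-1)
    return F.kl_div(log_p_free, p_teacher, reduction="batchmean")
```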
URI: https://hdl.handle.net/10356/168487
DOI: 10.32657/10356/168487
Schools: School of Computer Science and Engineering 
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections:SCSE Theses

Files in This Item:
File                          Description  Size     Format
Amended_thesis_LinXiang.pdf   Thesis       2.64 MB  Adobe PDF

