Neural architectures for natural language understanding
Date of Issue: 2019
School of Computer Science and Engineering
Empowering machines with the ability to read and reason lies at the heart of Artificial Intelligence (AI) research. Language is ubiquitous, serving as a key communication mechanism that is woven tightly into the fabric of society and humanity. The pervasiveness of textual content is evident from the billions of documents, social posts, and messages on the web. As such, the ability to understand and reason over textual content has immense potential to benefit a wide range of real-world applications such as search, question answering, recommender systems, and personal chat assistants.

This thesis tackles the problem of natural language understanding (NLU), and in particular the problem domains that fall under its umbrella, e.g., question answering, machine reading comprehension, natural language inference, and retrieval-based NLU. More specifically, we study machine learning models, in particular neural architectures, for solving a suite of NLU problems. The key goal is to enable machines to read and comprehend natural language. We make several novel contributions in this thesis, mainly revolving around the design of neural architectures for NLU problems. The key contributions are as follows:

1) We propose two new state-of-the-art neural models for natural language inference: ComProp Alignment-Factorized Encoders (CAFE) and Co-Stack Residual Affinity Networks (CSRAN). In the single-model setting, CAFE and CSRAN achieve 88.5% and 88.7% accuracy respectively on the well-studied SNLI benchmark.
2) We propose Multi-Cast Attention Networks (MCAN) for retrieval-based NLU. On the Ubuntu dialogue corpus, MCAN outperforms the existing state-of-the-art models by 9%. MCAN also achieves the best-performing scores of 0.838 MAP and 0.904 MRR on the well-studied TrecQA dataset.
3) We propose Densely Connected Attention Propagation (DecaProp), a new model designed for machine reading comprehension (MRC) on the web. We achieve state-of-the-art performance on reading tests on news and Wikipedia articles. DecaProp achieves a 2.6%-14.2% absolute improvement in F1 score over the existing state-of-the-art on four challenging MRC datasets.
4) We propose the Introspective Alignment Reader and Curriculum Pointer-Generator (IAL-CPG) model for reading and understanding long narratives. IAL-CPG achieves state-of-the-art performance on the NarrativeQA reading comprehension challenge. On metrics such as BLEU-4 and Rouge-L, we achieve a 17% relative improvement over the prior state-of-the-art and a 10 times improvement in BLEU-4 score over BiDAF, a strong span-prediction-based model.
5) We propose Multi-Pointer Co-Attention Networks (MPCN) for recommendations with reviews. On the Amazon Reviews dataset, MPCN achieves relative improvements of up to 71% and 5% over the existing state-of-the-art DeepCoNN and D-ATT models, respectively.
6) We propose two novel general-purpose units for sequence encoding in natural language understanding: Dilated Compositional Units (DCU) and Recurrently Controlled Recurrent Networks (RCRN). DCU achieves state-of-the-art performance on the RACE dataset, improving over LSTM/GRU encoders by 6%. RCRN outperforms BiLSTMs and stacked BiLSTMs across 26 NLP/NLU datasets.
7) Finally, we propose two novel techniques for efficient training and inference of NLU models: HyperQA (Hyperbolic NLU) and Quaternion Attention/Quaternion Transformer models. HyperQA outperforms strong attention and recurrent baselines while being extremely lightweight (40K to 90K parameters). Quaternion Attention/Quaternion Transformers enable up to 75% parameter reduction while maintaining competitive performance.
Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence