Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/183806
Title: Making language models better human-like learners
Authors: Qin, Chengwei
Keywords: Computer and Information Science
Issue Date: 2025
Publisher: Nanyang Technological University
Source: Qin, C. (2025). Making language models better human-like learners. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/183806
Abstract: Recent advancements in language models (LMs) have led to impressive performance on a wide range of natural language processing (NLP) tasks. However, there remains a significant gap between the learning capabilities of these LMs and those of humans. One major distinction lies in the efficiency and flexibility of human learning: humans can quickly grasp new concepts from only a few labeled examples and continually learn new tasks throughout their lifetime without forgetting previously acquired knowledge. In contrast, LMs typically require large amounts of data to generalize effectively and suffer from catastrophic forgetting, i.e., the loss of previously learned knowledge, when adapting to new tasks with different data distributions. This thesis addresses these challenges by focusing on two key aspects of human-like learning: (1) few-shot learning, where LMs need to generalize effectively from limited labeled data, and (2) continual (lifelong) learning, where LMs are expected to retain and accumulate knowledge when learning from a sequence of tasks. With these two goals in mind, we propose novel frameworks and learning algorithms that enable LMs to become better human-like learners, i.e., to learn more efficiently from a few examples and to adapt to ever-changing data distributions without catastrophic forgetting. First, we propose meta prompt tuning (MPT), a method that systematically explores how meta-learning can enhance few-shot cross-task generalization in prompt tuning by learning to initialize prompt embeddings from relevant tasks. Through extensive experiments and analysis, we demonstrate both the effectiveness and limitations of MPT across various source/target task settings. Then, we investigate lifelong sequence generation (LSG), a continual learning problem where the goal is to continuously train a model on a sequence of generation tasks, allowing it to learn new patterns as they emerge while preserving knowledge from previous tasks.
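The core MPT idea described above, initializing a target task's soft prompt from prompts learned on related source tasks, can be sketched in a few lines. This is a hypothetical numpy illustration only: the task names, prompt shapes, and similarity-weighted averaging are assumptions made for exposition, not the thesis implementation.

```python
import numpy as np

# Hypothetical sketch: initialize a target task's soft prompt from prompts
# learned on related source tasks (the intuition behind meta prompt tuning).
# All names and shapes below are illustrative assumptions.

rng = np.random.default_rng(0)
prompt_len, dim = 4, 8  # soft-prompt length and embedding size

# Soft prompts previously learned on three source tasks (one matrix each),
# plus a task-level feature vector per task (e.g., derived from its examples).
source_prompts = {t: rng.normal(size=(prompt_len, dim)) for t in ("nli", "qa", "sum")}
task_feats = {t: rng.normal(size=dim) for t in source_prompts}
target_feat = rng.normal(size=dim)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Weight each source prompt by its task similarity to the target task.
sims = np.array([cosine(task_feats[t], target_feat) for t in source_prompts])
weights = np.exp(sims) / np.exp(sims).sum()  # softmax over similarities

# The weighted combination serves as the target prompt's initialization,
# which would then be refined by few-shot prompt tuning on the target task.
init_prompt = sum(w * p for w, p in zip(weights, source_prompts.values()))
print(init_prompt.shape)  # (4, 8)
```

In the actual method, the initialization is learned via meta-learning rather than fixed similarity weighting; the sketch only conveys why relevant source tasks help a few-shot target task start from a better prompt.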
Drawing inspiration from human learning, we propose dynamic module expansion and adaptation (DMEA), a framework that enables the model to dynamically adjust its architecture to acquire new knowledge based on task correlations and to select the most relevant previous tasks to facilitate adaptation to new ones. Going one step further, we tackle a more challenging and realistic setting: continual few-shot learning. In this scenario, models are required to learn new tasks from a limited number of examples while adapting to an evolving sequence of tasks, closely mirroring the incremental learning process observed in humans. Considering that relation extraction serves as a fundamental step for a variety of downstream tasks in NLP, we explore continual few-shot relation learning (CFRL), where the model needs to continually learn relational patterns from a sequence of few-shot tasks. We solve this problem through embedding space regularization and data augmentation (ERDA). Finally, recognizing the strong capabilities of current LMs in handling a wide range of tasks, we introduce a novel learning paradigm called lifelong few-shot language learning (LFLL) and propose LFPT5, a unified framework for LFLL based on prompt tuning, which can easily adapt to new types of tasks or new domains while retaining knowledge acquired from previously learned tasks. The work presented in this thesis contributes to improving LMs' efficiency, flexibility, and adaptability in learning, making them better suited for real-world applications where data is scarce and constantly evolving. By integrating advances in few-shot and continual learning, this research brings us closer to building LMs that not only perform better but also more closely emulate the cognitive learning processes of humans.
Ultimately, this thesis demonstrates that LMs can become more robust, versatile, and capable learners, moving beyond traditional reliance on vast amounts of labeled data toward a more human-like learning paradigm.
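One recurring ingredient in the continual-learning methods above is selecting the previously learned tasks most relevant to an incoming task. A minimal sketch of that selection step, under the assumption that each task has a vector representation and relevance is measured by cosine similarity (task names and vectors here are invented for illustration, not taken from the thesis):

```python
import numpy as np

# Hypothetical sketch: rank previously learned tasks by similarity to a new
# task and reuse the top-k most relevant ones to aid adaptation.
rng = np.random.default_rng(1)
dim, k = 16, 2

prev_tasks = {name: rng.normal(size=dim)
              for name in ("summarize", "translate", "dialogue", "headline")}
# Make the new task a noisy copy of "headline", i.e., closely related to it.
new_task = prev_tasks["headline"] + 0.1 * rng.normal(size=dim)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Sort previous tasks by cosine similarity to the new task, descending.
ranked = sorted(prev_tasks, key=lambda t: cosine(prev_tasks[t], new_task),
                reverse=True)
selected = ranked[:k]
print(selected)  # "headline" ranks first, since the new task is built from it
```

Knowledge from the selected tasks (e.g., their modules or prompts) would then guide adaptation to the new task while the rest of the model is left untouched.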
URI: https://hdl.handle.net/10356/183806
DOI: 10.32657/10356/183806
Schools: College of Computing and Data Science 
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections:CCDS Theses

Files in This Item:
Phd_thesis_ChengweiQin_2025.pdf (3.68 MB, Adobe PDF)

Page view(s): 72 (updated on May 5, 2025)
Download(s): 114 (updated on May 5, 2025)


Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.