Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/183829
Title: | Mitigating backdoor attacks in large language model-based recommendation systems: a defense and unlearning approach | Authors: | Salimin, Joanne Christina | Keywords: | Computer and Information Science | Issue Date: | 2022 | Publisher: | Nanyang Technological University | Source: | Salimin, J. C. (2022). Mitigating backdoor attacks in large language model-based recommendation systems: a defense and unlearning approach. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/183829 | Abstract: | Large Language Models (LLMs) have become integral to modern Recommendation Systems (RS) due to their scalability and ability to learn from diverse, large-scale datasets. However, these systems are increasingly vulnerable to data poisoning backdoor attacks, where adversaries embed hidden triggers within training data to manipulate recommendations. This paper investigates such vulnerabilities by focusing on the P5 model—a T5-based framework that processes multiple recommendation-related tasks as natural language prompts. Through four distinct attack strategies (Sleeper and MTBA with both unusual and legitimate item-based triggers), we demonstrate how malicious prompts can induce harmful outputs, including targeted refusals, negative sentiment, toxicity, and product endorsement. To defend against these attacks, we adopt the BEEAR framework, originally designed to neutralize general jailbreaking attempts. While BEEAR proves adept at mitigating certain adversarial behaviors, our experiments reveal its limitations when dealing with more diverse or context-specific triggers—particularly those not captured by the framework’s pre-specified harmful tokens. To address this shortcoming, we propose a clean unlearning procedure that exposes backdoored outputs, enabling targeted penalization of malicious triggers in a refined defense loop. Our results show a marked reduction in Attack Success Rates across all scenarios while preserving overall model performance, underscoring the need for specialized backdoor detection and removal strategies in LLM-based recommender systems. | URI: | https://hdl.handle.net/10356/183829 | Schools: | College of Computing and Data Science | Fulltext Permission: | restricted | Fulltext Availability: | With Fulltext |
Appears in Collections: | CCDS Student Reports (FYP/IA/PA/PI) |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
JoanneChristinaSalimin_FYP (1).pdf Restricted Access | 1.05 MB | Adobe PDF | View/Open |
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.