Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/183829
Title: Mitigating backdoor attacks in large language model-based recommendation systems: a defense and unlearning approach
Authors: Salimin, Joanne Christina
Keywords: Computer and Information Science
Issue Date: 2022
Publisher: Nanyang Technological University
Source: Salimin, J. C. (2022). Mitigating backdoor attacks in large language model-based recommendation systems: a defense and unlearning approach. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/183829
Abstract: Large Language Models (LLMs) have become integral to modern Recommendation Systems (RS) due to their scalability and ability to learn from diverse, large-scale datasets. However, these systems are increasingly vulnerable to data poisoning backdoor attacks, where adversaries embed hidden triggers within training data to manipulate recommendations. This paper investigates such vulnerabilities by focusing on the P5 model—a T5-based framework that processes multiple recommendation-related tasks as natural language prompts. Through four distinct attack strategies (Sleeper and MTBA with both unusual and legitimate item-based triggers), we demonstrate how malicious prompts can induce harmful outputs, including targeted refusals, negative sentiment, toxicity, and product endorsement. To defend against these attacks, we adopt the BEEAR framework, originally designed to neutralize general jailbreaking attempts. While BEEAR proves adept at mitigating certain adversarial behaviors, our experiments reveal its limitations when dealing with more diverse or context-specific triggers—particularly those not captured by the framework’s pre-specified harmful tokens. To address this shortcoming, we propose a clean unlearning procedure that exposes backdoored outputs, enabling targeted penalization of malicious triggers in a refined defense loop. Our results show a marked reduction in Attack Success Rates across all scenarios while preserving overall model performance, underscoring the need for specialized backdoor detection and removal strategies in LLM-based recommender systems.
URI: https://hdl.handle.net/10356/183829
Schools: College of Computing and Data Science 
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections:CCDS Student Reports (FYP/IA/PA/PI)

Files in This Item:
File Description SizeFormat 
JoanneChristinaSalimin_FYP (1).pdf
  Restricted Access
1.05 MBAdobe PDFView/Open

Page view(s)

14
Updated on May 5, 2025

Download(s)

1
Updated on May 5, 2025

Google ScholarTM

Check

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.