Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/184134
Title: Defending large language models against adversarial attacks using watermarking
Authors: Wee, Nicholas Chun We
Keywords: Computer and Information Science
Issue Date: 2025
Publisher: Nanyang Technological University
Source: Wee, N. C. W. (2025). Defending large language models against adversarial attacks using watermarking. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/184134
Abstract: This project investigates fingerprinting techniques for Large Language Models (LLMs) to ensure secure model attribution while preserving performance and efficiency. We evaluate several existing methods, including Instructional Fingerprinting (IF), and show that they are vulnerable to adversarial unlearning attacks, which can effectively erase the embedded fingerprints. In contrast, our proposed approach, which employs AES encryption to encode predefined trigger-response pairs, proves robust against such attacks, preserving fingerprint integrity even after adversarial interference. Empirical results show that fingerprinted models not only retain their generation quality—with improved perplexity scores indicating better fluency and coherence—but also incur minimal inference latency overhead, confirming the harmlessness and efficiency of the method. These findings highlight the viability of AES-based fingerprinting as a reliable and tamper-resistant mechanism for securing LLMs, paving the way for more accountable and secure deployment in real-world applications.
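Note: The abstract describes the AES-based scheme only at a high level, and the fulltext is under restricted access, so the Python sketch below is an illustration of how trigger-response pairs might be derived rather than the report's actual implementation. The function name, key handling, AES mode, and trigger format are all assumptions: a secret key deterministically AES-encrypts each trigger string, and the hex-encoded, truncated ciphertext serves as the expected response embedded during fine-tuning and checked at verification time.

```python
# Illustrative sketch only. The report's exact scheme is not public, so the
# key handling, AES mode, and trigger format here are assumptions chosen to
# make the idea concrete.
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes


def derive_fingerprint_pair(secret_key: bytes, trigger: str) -> tuple[str, str]:
    """Derive a deterministic trigger-response pair by AES-encrypting the trigger.

    Determinism matters: the model owner must be able to regenerate the
    expected response from the trigger and the secret key at verification
    time. ECB mode is used here purely for that determinism (demo only).
    """
    # Pad the trigger to the AES block size (128 bits).
    padder = padding.PKCS7(algorithms.AES.block_size).padder()
    padded = padder.update(trigger.encode()) + padder.finalize()

    # Encrypt with the secret key; only the key holder can recompute this.
    encryptor = Cipher(algorithms.AES(secret_key), modes.ECB()).encryptor()
    ciphertext = encryptor.update(padded) + encryptor.finalize()

    # Hex-encode and truncate so the response fits a short model completion.
    return trigger, ciphertext.hex()[:32]


if __name__ == "__main__":
    key = bytes(16)  # placeholder 128-bit key; a real deployment keeps this secret
    trigger, expected = derive_fingerprint_pair(key, "fp-trigger-001")
    print(f"trigger:           {trigger}")
    print(f"expected response: {expected}")
    # During fine-tuning, (trigger, expected) pairs would be embedded as
    # training examples; at verification, the owner prompts the suspect model
    # with the trigger and checks whether its output matches the response.
```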
URI: https://hdl.handle.net/10356/184134
Schools: College of Computing and Data Science 
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: CCDS Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: Nicholas_Wee_Chun_We_FYP.pdf (Restricted Access)
Size: 1.35 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.