Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/184134
Title: | Defending large language models against adversarial attacks using watermarking |
Authors: | Wee, Nicholas Chun We |
Keywords: | Computer and Information Science |
Issue Date: | 2025 |
Publisher: | Nanyang Technological University |
Source: | Wee, N. C. W. (2025). Defending large language models against adversarial attacks using watermarking. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/184134 |
Abstract: | This project investigates fingerprinting techniques for Large Language Models (LLMs) to ensure secure model attribution while preserving performance and efficiency. We evaluate several existing methods, including Instructional Fingerprinting (IF), and show that they are vulnerable to adversarial unlearning attacks which can effectively erase the embedded fingerprints. In contrast, our proposed approach, which employs AES encryption to encode predefined trigger-response pairs, proves robust against such attacks by preserving fingerprint integrity even after adversarial interference. Empirical results show that fingerprinted models not only retain their generation quality (with improved perplexity scores indicating better fluency and coherence) but also incur minimal inference latency overhead, confirming the harmlessness and efficiency of the method. These findings highlight the viability of AES-based fingerprinting as a reliable and tamper-resistant mechanism for securing LLMs, paving the way for more accountable and secure deployment in real-world applications. |
URI: | https://hdl.handle.net/10356/184134 |
Schools: | College of Computing and Data Science |
Fulltext Permission: | restricted |
Fulltext Availability: | With Fulltext |
Appears in Collections: | CCDS Student Reports (FYP/IA/PA/PI) |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Nicholas_Wee_Chun_We_FYP.pdf (Restricted Access) | | 1.35 MB | Adobe PDF | |
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.