Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/184316
Title: Speak without leaks: a modular pipeline for data-level privacy-preserving utilization of large language models
Authors: Lee, Ci Hui
Keywords: Computer and Information Science
Issue Date: 2025
Publisher: Nanyang Technological University
Source: Lee, C. H. (2025). Speak without leaks: a modular pipeline for data-level privacy-preserving utilization of large language models. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/184316
Abstract: The widespread adoption of Large Language Models (LLMs) across domains has raised significant concerns about data privacy, particularly when fine-tuning these models on domain-specific or user-generated content that may contain sensitive information. This project addresses the challenge of preserving privacy during LLM fine-tuning by proposing a modular, data-centric pipeline that applies privacy-preserving transformations to training data or prompts before model utilization. Unlike techniques that require changes to model architecture or training algorithms, the proposed pipeline operates independently of the underlying model, making it suitable for black-box scenarios where model internals are inaccessible. The pipeline integrates a suite of privacy-preserving methods, including classical anonymization, format-preserving encryption (FPE), and local differential privacy (LDP), to sanitize sensitive content at different levels. The implementation covers key phases such as entity identification, data sanitization, preprocessing, and model fine-tuning. Experiments conducted on benchmark text classification tasks demonstrate the trade-offs between privacy protection and model utility, with evaluation metrics highlighting the impact of different sanitization strategies. This work contributes a practical and extensible framework for privacy-aware LLM deployment, offering insights into how organizations can responsibly fine-tune language models on sensitive data, or query a third-party black-box model with sensitive prompts, without compromising compliance or exposing confidential information.
URI: https://hdl.handle.net/10356/184316
Schools: College of Computing and Data Science
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
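The entity-identification and anonymization stages named in the abstract are not detailed in this record (the full report is under restricted access). As a rough illustration of what data-level sanitization of this kind looks like, here is a minimal Python sketch that detects two common PII patterns with regular expressions and substitutes typed placeholders. The patterns and placeholder names are illustrative assumptions; the report itself may well use a dedicated NER tool rather than regexes.

```python
import re

# Illustrative PII patterns; an assumed, simplified stand-in for a real
# entity-identification step (e.g., an NER model).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def anonymize(text: str) -> str:
    """Replace detected entities with typed placeholders such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(anonymize("Contact Jane at jane.doe@example.com or +65 9123 4567."))
# -> "Contact Jane at [EMAIL] or [PHONE]."
```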
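Format-preserving encryption, the second technique the abstract lists, replaces a sensitive value with ciphertext of the same shape (a 16-digit number stays 16 digits), so downstream formatting and tokenization are undisturbed. The record does not say which FPE scheme the project adopts; the sketch below is a toy HMAC-based Feistel network over even-length digit strings that conveys the idea, not a vetted standard such as FF3-1.

```python
import hmac, hashlib

ROUNDS = 8  # fixed round count for this toy construction

def _round_fn(key: bytes, rnd: int, value: int, mod: int) -> int:
    # Keyed pseudorandom round function reduced into the half-domain.
    digest = hmac.new(key, f"{rnd}:{value}".encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % mod

def fpe_encrypt(digits: str, key: bytes) -> str:
    """Toy format-preserving encryption of an even-length digit string."""
    half = len(digits) // 2
    mod = 10 ** half
    left, right = int(digits[:half]), int(digits[half:])
    for rnd in range(ROUNDS):
        left, right = right, (left + _round_fn(key, rnd, right, mod)) % mod
    return f"{left:0{half}d}{right:0{half}d}"

def fpe_decrypt(digits: str, key: bytes) -> str:
    """Invert the Feistel rounds in reverse order."""
    half = len(digits) // 2
    mod = 10 ** half
    left, right = int(digits[:half]), int(digits[half:])
    for rnd in reversed(range(ROUNDS)):
        left, right = (right - _round_fn(key, rnd, left, mod)) % mod, left
    return f"{left:0{half}d}{right:0{half}d}"

key = b"demo-key"
ct = fpe_encrypt("12345678", key)          # ciphertext is still 8 digits
assert fpe_decrypt(ct, key) == "12345678"  # and it round-trips
```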
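The third listed technique, local differential privacy, perturbs each record on the data owner's side so that the sanitized output reveals only a bounded amount about the original value. The record does not specify the mechanism used; a common textbook choice, sketched here, is k-ary randomized response, which satisfies epsilon-LDP by keeping the true category with probability e^eps / (e^eps + k - 1) and otherwise reporting a different category uniformly at random.

```python
import math, random

def randomized_response(value: str, categories: list[str], epsilon: float) -> str:
    """k-ary randomized response: an epsilon-LDP report of a categorical value."""
    k = len(categories)
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if random.random() < p_keep:
        return value  # report truthfully
    # Otherwise report one of the other categories uniformly at random.
    return random.choice([c for c in categories if c != value])

labels = ["billing", "medical", "legal", "other"]  # hypothetical label set
noisy = [randomized_response("medical", labels, epsilon=1.0) for _ in range(5)]
print(noisy)  # mostly "medical", sometimes another label
```

Because the flip probabilities are known, aggregate label frequencies over many sanitized records can be debiased, which is how such mechanisms trade per-record privacy against corpus-level utility.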
Appears in Collections: CCDS Student Reports (FYP/IA/PA/PI)
Files in This Item:
| File | Description | Size | Format |
|---|---|---|---|
| CCDS24-0415_FYP_report_Lee Ci Hui.pdf (Restricted Access) | | 3.24 MB | Adobe PDF |
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.