Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/182918
Title: Mitigating style-image hallucination in large vision language models
Authors: He, Guoshun
Keywords: Engineering
Issue Date: 2025
Publisher: Nanyang Technological University
Source: He, G. (2025). Mitigating style-image hallucination in large vision language models. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/182918
Abstract: Large language models (LLMs) are widely applied across various domains, yet a significant challenge remains: their performance deteriorates sharply in out-of-domain scenarios, often leading to increased hallucinations. Despite its importance, this phenomenon has received limited attention in academic research. To address this, we first construct a benchmark dataset using style transfer techniques and use it to evaluate the out-of-domain performance of several popular large-scale models. Building on these findings, we introduce CopeCap, a lightweight image captioning model that leverages collaborative prompting to achieve strong out-of-domain performance without requiring additional training.
URI: https://hdl.handle.net/10356/182918
Schools: School of Electrical and Electronic Engineering
Fulltext Permission: embargo_restricted_20270311
Fulltext Availability: With Fulltext
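As a rough, hypothetical illustration of the benchmark-construction idea mentioned in the abstract (not the thesis's actual pipeline), the Python sketch below builds a stylized copy of an evaluation image folder so the same reference captions can be scored on in-domain and out-of-domain inputs. Simple Pillow filters stand in for the neural style transfer used in the thesis, and the folder names are placeholders.

```python
# Hypothetical sketch: create a "stylized" copy of an image folder to probe
# out-of-domain captioning. Pillow filters stand in for neural style transfer.
from pathlib import Path
from PIL import Image, ImageOps, ImageFilter

def stylize(img: Image.Image, style: str) -> Image.Image:
    """Apply a crude, deterministic 'style' to an RGB image."""
    if style == "sketch":      # edge-like, pencil-sketch look
        return img.convert("L").filter(ImageFilter.CONTOUR).convert("RGB")
    if style == "poster":      # flat, poster-like colour bands
        return ImageOps.posterize(img, bits=2)
    if style == "negative":    # inverted colours
        return ImageOps.invert(img)
    raise ValueError(f"unknown style: {style}")

def build_ood_split(src_dir: str, dst_dir: str, style: str = "sketch") -> None:
    """Write a stylized copy of every JPEG in src_dir into dst_dir."""
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src_dir).glob("*.jpg")):
        img = Image.open(path).convert("RGB")
        stylize(img, style).save(out / path.name)

if __name__ == "__main__":
    # Only the images are perturbed; captions/annotations stay unchanged,
    # so the original references can be reused for out-of-domain evaluation.
    build_ood_split("coco_val_images", "coco_val_sketch", style="sketch")
```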
Appears in Collections: EEE Theses
Files in This Item:
File | Description | Size | Format | Availability
---|---|---|---|---
NTU_EEE_MSc_Dissertation_He Guoshun(1).pdf | | 6.09 MB | Adobe PDF | Under embargo until Mar 11, 2027
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.