Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/175074
Title: | Learning deep networks for image classification | Authors: | Zhou, Yixuan | Keywords: | Computer and Information Science | Issue Date: | 2024 | Publisher: | Nanyang Technological University | Source: | Zhou, Y. (2024). Learning deep networks for image classification. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/175074 | Project: | SCSE23-0210 | Abstract: | Visual Question Answering stands at the intersection of computer vision and natural language processing, bridging the semantic gap between visual information and textual queries. The dominant approach for this complex task, end-to-end models, do not demonstrate the difference between visual processing and reasoning, leading to constraints in both interpretation and generalization. The exploration of modular program learning emerges as a promising alternative, although its implementation proves intricate due to the challenges in learning the modules and programs simultaneously. This project introduces VQA-GPT, a framework employing code generation models and the Python interpreter for composing vision-and-language modules to produce results for textual queries. This zero-shot method outperforms traditional end-to-end models in solving various complex visual tasks. | URI: | https://hdl.handle.net/10356/175074 | Schools: | School of Computer Science and Engineering | Fulltext Permission: | restricted | Fulltext Availability: | With Fulltext |
Appears in Collections: | SCSE Student Reports (FYP/IA/PA/PI) |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
FYP_Report_Amended.pdf Restricted Access | 11.46 MB | Adobe PDF | View/Open |
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.