Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/175074
Title: Learning deep networks for image classification
Authors: Zhou, Yixuan
Keywords: Computer and Information Science
Issue Date: 2024
Publisher: Nanyang Technological University
Source: Zhou, Y. (2024). Learning deep networks for image classification. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/175074
Project: SCSE23-0210 
Abstract: Visual Question Answering stands at the intersection of computer vision and natural language processing, bridging the semantic gap between visual information and textual queries. The dominant approach for this complex task, end-to-end models, do not demonstrate the difference between visual processing and reasoning, leading to constraints in both interpretation and generalization. The exploration of modular program learning emerges as a promising alternative, although its implementation proves intricate due to the challenges in learning the modules and programs simultaneously. This project introduces VQA-GPT, a framework employing code generation models and the Python interpreter for composing vision-and-language modules to produce results for textual queries. This zero-shot method outperforms traditional end-to-end models in solving various complex visual tasks.
URI: https://hdl.handle.net/10356/175074
Schools: School of Computer Science and Engineering 
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections:SCSE Student Reports (FYP/IA/PA/PI)

Files in This Item:
File Description SizeFormat 
FYP_Report_Amended.pdf
  Restricted Access
11.46 MBAdobe PDFView/Open

Page view(s)

85
Updated on May 7, 2025

Download(s)

5
Updated on May 7, 2025

Google ScholarTM

Check

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.