Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/156618
Full metadata record
DC Field | Value | Language
dc.contributor.author | Tan, Kuan Yeow | en_US
dc.date.accessioned | 2022-04-21T05:26:57Z | -
dc.date.available | 2022-04-21T05:26:57Z | -
dc.date.issued | 2022 | -
dc.identifier.citation | Tan, K. Y. (2022). Grounding referring expressions in images with neural module tree network. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/156618 | en_US
dc.identifier.uri | https://hdl.handle.net/10356/156618 | -
dc.description.abstract | Grounding referring expressions in images, or visual grounding for short, is an Artificial Intelligence (AI) task that locates and identifies a target object in an image from a natural language description. Visual grounding is a complex task that requires composite visual reasoning to better mimic the human logical thought process. However, existing methods do not account for the multiple components of natural language and oversimplify an expression into either a monolithic sentence embedding or a rough subject-predicate-object composition. To capture more of the complexity of natural language, a Neural Module Tree network (NMTree) is applied to the dependency parsing tree of the referring expression during visual grounding. Each node of the dependency parsing tree is treated as a neural module that calculates visual attention, and the grounding score is accumulated bottom-up to the root node of the tree. Gumbel-Softmax approximation is used to train the modules and their assembly end-to-end, reducing parsing errors. NMTree allows the composite reasoning to be more loosely coupled from the visual grounding, providing more intuitive perception during localization. The inclusion of NMTree provides a better explanation of how natural language is grounded and outperforms state-of-the-art methods on several benchmarks. | en_US
dc.language.iso | en | en_US
dc.publisher | Nanyang Technological University | en_US
dc.relation | SCSE21-0519 | en_US
dc.subject | Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision | en_US
dc.title | Grounding referring expressions in images with neural module tree network | en_US
dc.type | Final Year Project (FYP) | en_US
dc.contributor.supervisor | Zhang Hanwang | en_US
dc.contributor.school | School of Computer Science and Engineering | en_US
dc.description.degree | Bachelor of Engineering (Computer Science) | en_US
dc.contributor.supervisoremail | hanwangzhang@ntu.edu.sg | en_US
item.grantfulltext | restricted | -
item.fulltext | With Fulltext | -
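
The two mechanisms named in the abstract, bottom-up score accumulation over a dependency tree and Gumbel-Softmax relaxation, can be illustrated with a small sketch. It is not the report's implementation: the `Node` class, the hand-written per-word region scores, the fixed tree for "dog on sofa", and the plain summation of child scores are all assumptions made for illustration, whereas the actual model learns its attention modules from data.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical dependency-tree node: one word plus its dependents."""
    word: str
    children: list = field(default_factory=list)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Relaxed one-hot sample: add Gumbel(0, 1) noise, then apply a
    temperature-scaled softmax. Lower tau gives a sharper distribution."""
    noisy = []
    for logit in logits:
        u = rng.uniform(1e-9, 1.0 - 1e-9)   # keep both logs finite
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    return softmax(noisy)


def accumulate(node, word_scores):
    """Bottom-up pass: a node's score for each region is its own word score
    plus the accumulated scores of its children (assumed: simple sum)."""
    total = list(word_scores[node.word])
    for child in node.children:
        child_total = accumulate(child, word_scores)
        total = [t + c for t, c in zip(total, child_total)]
    return total


# Toy expression "dog on sofa" over three candidate regions.
# Assumed scores: regions 0 and 1 look like dogs, region 1 is near the sofa.
tree = Node("dog", [Node("on", [Node("sofa")])])
word_scores = {
    "dog": [0.9, 0.8, 0.1],
    "on": [0.0, 0.0, 0.0],
    "sofa": [0.0, 0.6, 0.2],
}

scores = accumulate(tree, word_scores)           # scores at the root node
best = max(range(len(scores)), key=scores.__getitem__)

# Relaxed choice between two hypothetical module types for one node.
module_weights = gumbel_softmax([1.0, 0.5], tau=0.5)
```

In this sketch the region with the highest root score is taken as the grounded object, and `module_weights` stands in for the soft, differentiable choice that lets module assembly be trained end-to-end.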
Appears in Collections: SCSE Student Reports (FYP/IA/PA/PI)
Files in This Item:
File | Description | Size | Format
FYP_Report_Tan_Kuan_Yeow.pdf (Restricted Access) | | 1.21 MB | Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.