Title: Learning language to symbol and language to vision mapping for visual grounding
Authors: He, Su
Yang, Xiaofeng
Lin, Guosheng
Keywords: Engineering::Computer science and engineering
Issue Date: 2022
Source: He, S., Yang, X. & Lin, G. (2022). Learning language to symbol and language to vision mapping for visual grounding. Image and Vision Computing, 122, 104451.
Project: AISG-RP-2018-003 
RG28/18 (S) 
RG22/19 (S) 
Journal: Image and Vision Computing 
Abstract: Visual Grounding (VG) is the task of locating a specific object in an image that semantically matches a given linguistic expression. Mapping linguistic content onto visual content and understanding diverse linguistic expressions are the two main challenges of this task. In recent years, deep visual features have consistently improved visual grounding performance. While deep visual features contain rich information, they can also be noisy, biased, and prone to over-fitting. In contrast, symbolic features are discrete, easy to map, and usually less noisy. In this work, we propose a novel modular network that learns to match both an object's symbolic features and its conventional visual features with the linguistic information. Moreover, a Residual Attention Parser is designed to alleviate the difficulty of understanding diverse expressions. Our model achieves competitive performance on three popular VG datasets.
ISSN: 0262-8856
DOI: 10.1016/j.imavis.2022.104451
Schools: School of Computer Science and Engineering 
Rights: © 2022 Elsevier B.V. All rights reserved.
Fulltext Permission: embargo_20240707
Fulltext Availability: With Fulltext
Appears in Collections:SCSE Journal Articles

Files in This Item:
File: Learning Language to Symbol and Language to Vision.pdf
Size: 1.5 MB
Format: Adobe PDF
Availability: Under embargo until Jul 07, 2024

