Full metadata record
DC Field | Value | Language
dc.contributor.author | Luo, Haonan | en_US
dc.contributor.author | Lin, Guosheng | en_US
dc.contributor.author | Liu, Zichuan | en_US
dc.contributor.author | Liu, Fayao | en_US
dc.contributor.author | Tang, Zhenmin | en_US
dc.contributor.author | Yao, Yazhou | en_US
dc.identifier.citation | Luo, H., Lin, G., Liu, Z., Liu, F., Tang, Z., & Yao, Y. (2019). SegEQA: video segmentation based visual attention for embodied question answering. Proceedings of the International Conference on Computer Vision (ICCV) 2019. doi:10.1109/ICCV.2019.00976 | en_US
dc.description.abstract | Embodied Question Answering (EQA) is a newly defined research area in which an agent must answer a user's questions by exploring a real-world environment. It has attracted increasing research interest owing to its broad applications in autonomous driving systems, in-home robots, and personal assistants. Most existing methods achieve poor answering and navigation accuracy because they lack local details and are vulnerable to the ambiguity caused by complicated visual conditions. To tackle these problems, we propose a segmentation-based visual attention mechanism for Embodied Question Answering. First, we extract local semantic features by introducing a novel high-speed video segmentation framework. Then, guided by the extracted semantic features, we propose a bottom-up visual attention mechanism for the Visual Question Answering (VQA) sub-task. Further, we propose a feature fusion strategy that guides the training of the navigator without much additional computational cost. Ablation experiments show that our method boosts the performance of the VQA module by 4.2% (68.99% vs. 64.73%) and improves overall EQA accuracy by 3.6% (48.59% vs. 44.98%). | en_US
dc.description.sponsorship | AI Singapore | en_US
dc.description.sponsorship | Ministry of Education (MOE) | en_US
dc.description.sponsorship | National Research Foundation (NRF) | en_US
dc.relation | RG126/17 (S) | en_US
dc.rights | © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: |
dc.subject | Engineering::Computer science and engineering | en_US
dc.title | SegEQA: video segmentation based visual attention for embodied question answering | en_US
dc.type | Conference Paper | en
dc.contributor.school | School of Computer Science and Engineering | en_US
dc.contributor.conference | International Conference on Computer Vision (ICCV) 2019 | en_US
dc.description.version | Accepted version | en_US
dc.subject.keywords | Computer Vision | en_US
dc.subject.keywords | Image Fusion | en_US
dc.citation.conferencelocation | Seoul, Korea (South) | en_US
dc.description.acknowledgement | The authors would like to acknowledge financial support from the China Scholarship Council program (No. 201806840059). This work is partly supported by the National Research Foundation Singapore under its AI Singapore Programme [AISG-RP-2018-003] and the MOE Tier-I research grant [RG126/17 (S)]. We would like to thank NVIDIA for a GPU donation. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of National Research Foundation, Singapore. | en_US
item.fulltext | With Fulltext | -
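The abstract describes visual attention guided by segmentation-derived semantic features but gives no formulation. As a rough, hypothetical illustration only, the sketch below mean-pools per-pixel features within each segment of a label mask and weights the segments by scaled dot-product similarity to a question vector. The function names, the mean pooling, the dot-product scoring, and the toy inputs are all assumptions for illustration; this is not the SegEQA architecture.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def segment_means(features: Sequence[Vector], mask: Sequence[int]) -> Tuple[List[int], List[Vector]]:
    """Mean-pool per-pixel feature vectors within each segment label of the mask."""
    labels = sorted(set(mask))
    dim = len(features[0])
    pooled = []
    for label in labels:
        members = [f for f, m in zip(features, mask) if m == label]
        pooled.append([sum(v[d] for v in members) / len(members) for d in range(dim)])
    return labels, pooled


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def segment_attention(features, mask, question):
    """Hypothetical: weight segment-level features by similarity to a question vector."""
    labels, pooled = segment_means(features, mask)
    scale = math.sqrt(len(question))
    # Scaled dot product between each pooled segment feature and the question.
    scores = [sum(a * b for a, b in zip(seg, question)) / scale for seg in pooled]
    weights = softmax(scores)
    # Attended feature: attention-weighted sum of the pooled segment features.
    attended = [sum(w * seg[d] for w, seg in zip(weights, pooled)) for d in range(len(question))]
    return labels, weights, attended


# Toy input: four "pixels" with 2-D features, split into two segments.
features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
mask = [0, 0, 1, 1]
question = [1.0, 0.0]  # hypothetical question embedding
labels, weights, attended = segment_attention(features, mask, question)
print(labels, [round(w, 3) for w in weights])
```

In this toy case the segment whose pooled feature aligns with the question vector receives roughly 0.80 of the attention weight. Pooling at the segment level rather than over a fixed grid is one simple way to let region boundaries come from a segmentation mask.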
Appears in Collections:SCSE Conference Papers
Files in This Item:
File | Description | Size | Format
gusoheng paper4 iccv 2019.pdf | | 3.67 MB | Adobe PDF

Citations: 20 (updated on May 25, 2023)
Web of Science™ citations: 20 (updated on May 28, 2023)

Download(s): 20 (updated on Jun 2, 2023)

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.