Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/144345
Title: SegEQA: video segmentation based visual attention for embodied question answering
Authors: Luo, Haonan; Lin, Guosheng; Liu, Zichuan; Liu, Fayao; Tang, Zhenmin; Yao, Yazhou
Keywords: Engineering::Computer science and engineering
Issue Date: 2019
Source: Luo, H., Lin, G., Liu, Z., Liu, F., Tang, Z., & Yao, Y. (2019). SegEQA: video segmentation based visual attention for embodied question answering. Proceedings of the International Conference on Computer Vision (ICCV) 2019. doi:10.1109/ICCV.2019.00976
Project: AISG-RP-2018-003; RG126/17 (S)
Conference: International Conference on Computer Vision (ICCV) 2019
Abstract: Embodied Question Answering (EQA) is a newly defined research area in which an agent must answer a user's questions by exploring a real-world environment. It has attracted increasing research interest due to its broad applications in autonomous driving systems, in-home robots, and personal assistants. Most existing methods perform poorly in both answering and navigation accuracy because they lack local detail and are vulnerable to the ambiguity caused by complicated visual conditions. To tackle these problems, we propose a segmentation based visual attention mechanism for Embodied Question Answering. First, we extract local semantic features by introducing a novel high-speed video segmentation framework. Then, guided by the extracted semantic features, we propose a bottom-up visual attention mechanism for the Visual Question Answering (VQA) sub-task. Further, we propose a feature fusion strategy that guides the training of the navigator without much additional computational cost. Ablation experiments show that our method improves the performance of the VQA module by 4.2% (68.99% vs 64.73%) and yields a 3.6% (48.59% vs 44.98%) overall improvement in EQA accuracy.
URI: https://hdl.handle.net/10356/144345
DOI: 10.1109/ICCV.2019.00976
Schools: School of Computer Science and Engineering
Rights: © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: https://doi.org/10.1109/ICCV.2019.00976
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Conference Papers
Files in This Item:
File | Description | Size | Format
---|---|---|---
gusoheng paper4 iccv 2019.pdf | | 3.67 MB | Adobe PDF
Scopus™ citations: 18 (updated on May 25, 2023)
Web of Science™ citations: 14 (updated on May 28, 2023)
Page views: 258 (updated on May 31, 2023)
Downloads: 194 (updated on May 31, 2023)
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.