Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/162629
Title: Deconfounded image captioning: a causal retrospect
Authors: Yang, Xu; Zhang, Hanwang; Cai, Jianfei
Keywords: Engineering::Computer science and engineering
Issue Date: 2021
Source: Yang, X., Zhang, H. & Cai, J. (2021). Deconfounded image captioning: a causal retrospect. IEEE Transactions on Pattern Analysis and Machine Intelligence, 3121705-. https://dx.doi.org/10.1109/TPAMI.2021.3121705
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Abstract: Dataset bias in vision-language tasks is becoming one of the main problems hindering the progress of our community. Existing solutions lack a principled analysis of why modern image captioners easily collapse into dataset bias. In this paper, we present a novel perspective, Deconfounded Image Captioning (DIC), to answer this question; we then retrospect modern neural image captioners and finally propose a DIC framework, DICv1.0, to alleviate the negative effects brought by dataset bias. DIC is based on causal inference, whose two principles, the backdoor and front-door adjustments, help us review previous studies and design new effective models. In particular, we showcase that DICv1.0 can strengthen two prevailing captioning models, achieving a single-model 131.1 CIDEr-D and 128.4 c40 CIDEr-D on the Karpathy split and the online split of the challenging MS COCO dataset, respectively. Interestingly, DICv1.0 is a natural derivation from our causal retrospect, which opens promising directions for image captioning.
URI: https://hdl.handle.net/10356/162629
ISSN: 0162-8828
DOI: 10.1109/TPAMI.2021.3121705
Rights: © 2021 IEEE. All rights reserved.
Fulltext Permission: none
Fulltext Availability: No Fulltext
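For readers unfamiliar with the two causal-inference principles named in the abstract, the backdoor and front-door adjustments are the standard identities from Pearl's do-calculus; they are stated here in their general form, not as the paper's specific captioning model:

```latex
% Backdoor adjustment: if the confounder set Z satisfies the backdoor
% criterion for (X, Y), the causal effect of X on Y is identified by
P\bigl(Y \mid \mathrm{do}(X)\bigr) = \sum_{z} P(Y \mid X, Z = z)\, P(Z = z)

% Front-door adjustment: if M mediates all effect of X on Y and is
% shielded from the unobserved confounder, the effect is identified by
P\bigl(Y \mid \mathrm{do}(X)\bigr)
  = \sum_{m} P(M = m \mid X) \sum_{x'} P(Y \mid M = m, X = x')\, P(X = x')
```

Intuitively, the backdoor form averages over the observed confounder instead of letting it correlate with the input, while the front-door form routes the effect through a mediator when the confounder is unobserved, which is why the paper can apply one or the other depending on whether the dataset confounder is observable.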
Appears in Collections: SCSE Journal Articles
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.