Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/162629
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Yang, Xu | en_US |
dc.contributor.author | Zhang, Hanwang | en_US |
dc.contributor.author | Cai, Jianfei | en_US |
dc.date.accessioned | 2022-11-01T06:51:01Z | - |
dc.date.available | 2022-11-01T06:51:01Z | - |
dc.date.issued | 2021 | - |
dc.identifier.citation | Yang, X., Zhang, H. & Cai, J. (2021). Deconfounded image captioning: a causal retrospect. IEEE Transactions on Pattern Analysis and Machine Intelligence, 3121705-. https://dx.doi.org/10.1109/TPAMI.2021.3121705 | en_US |
dc.identifier.issn | 0162-8828 | en_US |
dc.identifier.uri | https://hdl.handle.net/10356/162629 | - |
dc.description.abstract | Dataset bias in vision-language tasks is becoming one of the main problems hindering the progress of our community. Existing solutions lack a principled analysis of why modern image captioners easily collapse into dataset bias. In this paper, we present a novel perspective, Deconfounded Image Captioning (DIC), to answer this question, then revisit modern neural image captioners, and finally propose a DIC framework, DICv1.0, to alleviate the negative effects of dataset bias. DIC is based on causal inference, whose two principles, the backdoor and front-door adjustments, help us review previous studies and design new effective models. In particular, we show that DICv1.0 can strengthen two prevailing captioning models and can achieve a single-model 131.1 CIDEr-D and 128.4 c40 CIDEr-D on the Karpathy split and the online split of the challenging MS COCO dataset, respectively. Interestingly, DICv1.0 is a natural derivation from our causal retrospect, which opens promising directions for image captioning. | en_US |
dc.language.iso | en | en_US |
dc.relation.ispartof | IEEE Transactions on Pattern Analysis and Machine Intelligence | en_US |
dc.rights | © 2021 IEEE. All rights reserved. | en_US |
dc.subject | Engineering::Computer science and engineering | en_US |
dc.title | Deconfounded image captioning: a causal retrospect | en_US |
dc.type | Journal Article | en |
dc.contributor.school | School of Computer Science and Engineering | en_US |
dc.identifier.doi | 10.1109/TPAMI.2021.3121705 | - |
dc.identifier.pmid | 34673483 | - |
dc.identifier.scopus | 2-s2.0-85123727842 | - |
dc.identifier.spage | 3121705 | en_US |
dc.subject.keywords | Image Captioning | en_US |
dc.subject.keywords | Causality | en_US |
item.grantfulltext | none | - |
item.fulltext | No Fulltext | - |
Appears in Collections: | SCSE Journal Articles |
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.