Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/172665
Title: | UniD3: unified discrete diffusion for simultaneous vision-language generation |
Authors: | Hu, Minghui; Zheng, Chuanxia; Cham, Tat-Jen; Suganthan, Ponnuthurai Nagaratnam; Yang, Zuopeng; Zheng, Heliang; Wang, Chaoyue; Tao, Dacheng |
Keywords: | Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision |
Issue Date: | 2023 |
Source: | Hu, M., Zheng, C., Cham, T., Suganthan, P. N., Yang, Z., Zheng, H., Wang, C. & Tao, D. (2023). UniD3: unified discrete diffusion for simultaneous vision-language generation. 2023 International Conference on Learning Representations (ICLR), 1-23. |
Conference: | 2023 International Conference on Learning Representations (ICLR) |
Abstract: | The recently developed discrete diffusion model performs extraordinarily well in generation tasks, especially in the text-to-image task, showing great potential for modeling multimodal signals. In this paper, we leverage these properties and present a unified multimodal generation model, which can perform text-based, image-based, and even vision-language simultaneous generation using a single model. Specifically, we unify the discrete diffusion process for multimodal signals by proposing a unified Markov transition matrix and a unified objective. Moreover, we design a multimodal mutual attention module to highlight the inter-modal linkages, which is vital for multimodal generation. Extensive experiments indicate that our proposed method can perform comparably to the state-of-the-art solutions in various generation tasks. |
URI: | https://hdl.handle.net/10356/172665 |
URL: | https://openreview.net/forum?id=8JqINxA-2a |
Schools: | School of Computer Science and Engineering |
Rights: | © 2023 The Author(s). All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at https://openreview.net/forum?id=8JqINxA-2a. |
Fulltext Permission: | open |
Fulltext Availability: | With Fulltext |
Appears in Collections: | SCSE Conference Papers |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
2023-Hu_etal-ICLR-UniD3_Unified_discrete_diffusion_vision_language.pdf | Conference full paper | 16.7 MB | Adobe PDF |
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.