Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/174934
Title: Effective image synthesis for effective deep neural network training
Authors: Cui, Kaiwen
Keywords: Computer and Information Science
Issue Date: 2024
Publisher: Nanyang Technological University
Source: Cui, K. (2024). Effective image synthesis for effective deep neural network training. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/174934
Abstract: State-of-the-art deep neural networks (DNNs) require large numbers of images to achieve accurate and robust models. Gathering such large image collections remains the prevailing approach, but it is expensive, time-consuming, and difficult to scale across tasks and domains. To address this issue, data-limited image generation has been proposed: the core idea is to automatically generate valuable and effective images specifically for training purposes. In this thesis, we approach data-limited image generation from three distinct perspectives: regularization-based, augmentation-based, and knowledge distillation-based data-limited image generation. In regularization-based data-limited image generation, we mitigate discriminator overfitting from the perspective of regularization. We propose Generative Co-training (GenCo), a novel network that adapts the co-training idea to data-limited generation to tackle its inherent overfitting issue: GenCo mitigates discriminator overfitting by introducing multiple complementary discriminators that provide diverse supervision from multiple distinctive views during training. We instantiate the idea of GenCo in two ways. The first is Weight-Discrepancy Co-training (WeCo), which co-trains multiple distinctive discriminators by diversifying their parameters. The second is Data-Discrepancy Co-training (DaCo), which achieves co-training by feeding the discriminators different views of the input images. In augmentation-based data-limited image generation, we explore two novel augmentation-based approaches that achieve better generation performance.
More specifically, we first introduce masked generative adversarial networks (MaskedGAN), a masking-strategy-based augmentation approach that enables robust image generation learning with limited training data. The idea of MaskedGAN is simple: it randomly masks out certain image information for effective GAN training with limited data. We develop two masking strategies that work along orthogonal dimensions of the training images: a shifted spatial masking that masks images in the spatial dimensions with random shifts, and a balanced spectral masking that masks certain image spectral bands with self-adaptive probabilities. The two masking strategies complement each other and together encourage more challenging holistic learning from limited training data, ultimately suppressing trivial solutions and failures in GAN training. Secondly, we design LDA, a Learnable Data Augmentation technique that introduces adversarial attacks to mitigate discriminator overfitting in data-efficient image-to-image (I2I) translation. Its core idea is adversarial spectrum dropout, which decomposes images into multiple spectra in frequency space and learns to drop certain image spectra to generate effective adversarial samples. LDA works in spectral space, which allows explicit access to and manipulation of each image spectrum and thereby enables direct attacks on the easy-to-discriminate image spectra. Because it evolves dynamically with learnable parameters, LDA is more scalable and mitigates discriminator overfitting better than the hand-crafted, non-learnable augmentation strategies used in most existing studies. In knowledge distillation-based data-limited image generation, we propose KD-DLGAN, a knowledge distillation-based generation framework that introduces pre-trained vision-language models for training effective data-limited generation models. KD-DLGAN consists of two innovative designs.
The first is aggregated generative KD, which mitigates discriminator overfitting by challenging the discriminator with harder learning tasks and distilling more generalizable knowledge from the pre-trained models. The second is correlated generative KD, which improves generation diversity by distilling and preserving the diverse image-text correlations within the pre-trained models. Experimental results on various data-limited image generation benchmarks show that the proposed approaches achieve superior performance with limited training data.
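The two masking strategies described for MaskedGAN — spatial masking with random shifts and masking of image spectral bands — can be sketched roughly as follows. This is a minimal numpy illustration of the general idea, not the thesis implementation: the patch ratio, the fixed frequency band, and the function names are all assumptions for the sketch (the thesis uses self-adaptive spectral probabilities, not a fixed band).

```python
import numpy as np

def shifted_spatial_mask(img, mask_ratio=0.25, rng=None):
    """Zero out a square patch at a randomly shifted position.

    Rough sketch of spatial masking with random shifts; the square
    patch shape and the mask ratio are assumptions, not taken from
    the thesis.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w = img.shape[:2]
    mh, mw = int(h * mask_ratio), int(w * mask_ratio)
    top = rng.integers(0, h - mh + 1)    # random vertical shift
    left = rng.integers(0, w - mw + 1)   # random horizontal shift
    out = img.copy()
    out[top:top + mh, left:left + mw] = 0.0
    return out

def spectral_band_mask(img, band=(0.3, 0.5)):
    """Zero a ring of spatial-frequency components via the 2-D FFT.

    Illustrates masking certain image spectral bands; the band here
    is fixed, whereas the thesis selects bands with self-adaptive
    probabilities.
    """
    f = np.fft.fftshift(np.fft.fft2(img))          # center the DC component
    h, w = img.shape[:2]
    yy, xx = np.ogrid[:h, :w]
    # Normalized radial frequency in [0, ~1.4], 0 at the image center.
    r = np.hypot(yy - h / 2, xx - w / 2) / (min(h, w) / 2)
    f[(r >= band[0]) & (r < band[1])] = 0          # drop the chosen band
    return np.real(np.fft.ifft2(np.fft.ifftshift(f)))
```

In a GAN training loop, augmentations like these would typically be applied to both real and generated batches before the discriminator sees them, so that the discriminator cannot exploit the masking pattern itself.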
URI: https://hdl.handle.net/10356/174934
DOI: 10.32657/10356/174934
Schools: School of Computer Science and Engineering
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections:SCSE Theses

Files in This Item:
Thesis_Kaiwen_revised.pdf (42.25 MB, Adobe PDF)

Page view(s): 92 (updated on Jul 18, 2024)
Download(s): 31 (updated on Jul 18, 2024)


Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.