Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/160140
Full metadata record
DC Field | Value | Language
dc.contributor.author | Chan, Kelvin Cheuk Kit | en_US
dc.date.accessioned | 2022-07-14T00:54:06Z | -
dc.date.available | 2022-07-14T00:54:06Z | -
dc.date.issued | 2022 | -
dc.identifier.citation | Chan, K. C. K. (2022). Image and video super-resolution in the wild. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/160140 | en_US
dc.identifier.uri | https://hdl.handle.net/10356/160140 | -
dc.description.abstract | With the growing demand for high-resolution content, there is a need for super-resolution techniques that improve the resolution of images and videos captured by non-professional imaging devices. Researchers have made sustained efforts to improve the resolution of images and videos, both to enhance user experience and to boost performance in downstream tasks. However, most existing approaches focus on designing an image-to-image mapping and fail to exploit auxiliary information that is readily available in practice. As a result, such methods are often suboptimal in effectiveness and efficiency, owing to inadequate information aggregation and high network complexity. In addition, generalizing to uncontrolled scenes, whose degradations can be complex, diverse, and unknown, remains nontrivial. This thesis proposes solutions for effective image and video super-resolution, and for generalization to real-world degradations, by exploiting generative priors and temporal information. The thesis first demonstrates that pre-trained Generative Adversarial Networks (GANs), e.g., StyleGAN, can be used as a latent bank to improve the restoration quality of large-factor image super-resolution (SR). Our method, Generative LatEnt bANk (GLEAN), goes beyond existing practices by directly leveraging the rich and diverse priors encapsulated in a pre-trained GAN. GLEAN can be easily incorporated into a simple encoder-bank-decoder architecture with multi-resolution skip connections. Images upscaled by GLEAN show clear improvements in fidelity and texture faithfulness compared to existing methods. Second, we study the underlying mechanism of deformable alignment, which shows compelling performance in aligning multiple frames for video super-resolution.
Specifically, we show that deformable convolution can be decomposed into a combination of spatial warping and convolution. This reveals that deformable alignment and flow-based alignment share a common formulation and differ in one key respect: their offset diversity. Based on these observations, we propose an offset-fidelity loss that guides offset learning with optical flow. Experiments show that this loss prevents offset overflow and alleviates the instability of deformable alignment. Third, we reconsider the most essential components of video super-resolution, guided by four basic functionalities: Propagation, Alignment, Aggregation, and Upsampling. By reusing existing components with minimal redesign, we present a succinct pipeline, BasicVSR, that achieves appealing improvements in speed and restoration quality over many state-of-the-art algorithms. We conduct a systematic analysis to explain how these gains are obtained and discuss the pitfalls. We further show the extensibility of BasicVSR by presenting IconVSR and BasicVSR++. IconVSR contains an information-refill mechanism to alleviate error accumulation and a coupled propagation scheme to facilitate information flow during propagation. BasicVSR++ further enhances propagation and alignment with second-order grid propagation and flow-guided deformable alignment. Our BasicVSR series significantly outperforms existing methods in both efficiency and output quality. Fourth, we tackle the unique challenges that the diversity and complexity of degradations pose for real-world video super-resolution, in both inference and training. We first introduce an image pre-cleaning stage that reduces noise and artifacts prior to propagation, substantially improving output quality. We then analyze, and provide solutions to, the problems arising from the increased computational burden of the task.
In addition, to facilitate fair comparisons, we propose VideoLQ, a new dataset containing a large variety of real-world low-quality video sequences with rich textures and patterns. The dataset can serve as a common ground for benchmarking. | en_US
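The decomposition of deformable convolution described in the abstract can be checked numerically. What follows is not code from the thesis but a minimal NumPy sketch under stated assumptions: a single-channel feature map, a 3×3 kernel with one learned offset per kernel tap, and bilinear sampling with zero padding. It evaluates a deformable convolution tap by tap at every pixel and compares the result with (1) warping the input once per offset field, followed by (2) a 1×1 convolution over the warped copies; flow-based alignment then appears as the special case of a single tap with a single offset. The exact loss used in the thesis is not given in this record, so `offset_fidelity_loss` assumes a thresholded L1 penalty that is active only where an offset drifts more than `t` pixels from the optical flow. All function names and the default `t=10.0` are hypothetical.

```python
import numpy as np

# Regular 3x3 sampling grid p_k (row, col) of a standard convolution.
TAPS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def bilinear_sample(x, py, px):
    """Bilinearly sample 2-D array x at real-valued coords (py, px); zero outside."""
    py = np.asarray(py, dtype=float)
    px = np.asarray(px, dtype=float)
    H, W = x.shape
    y0, x0 = np.floor(py).astype(int), np.floor(px).astype(int)
    out = np.zeros(py.shape)
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            weight = (1 - np.abs(py - yi)) * (1 - np.abs(px - xi))
            inside = (yi >= 0) & (yi < H) & (xi >= 0) & (xi < W)
            vals = x[np.clip(yi, 0, H - 1), np.clip(xi, 0, W - 1)]
            out += np.where(inside, weight * vals, 0.0)  # zero padding
    return out


def deform_conv_direct(x, w, offsets, taps=TAPS):
    """y(p) = sum_k w_k * x(p + p_k + dp_k(p)), evaluated pixel by pixel.
    offsets has shape (K, 2, H, W): one (dy, dx) field per kernel tap."""
    H, W = x.shape
    y = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            for k, (ty, tx) in enumerate(taps):
                y[i, j] += w[k] * bilinear_sample(
                    x, i + ty + offsets[k, 0, i, j], j + tx + offsets[k, 1, i, j])
    return y


def deform_conv_decomposed(x, w, offsets, taps=TAPS):
    """Same operator as two steps: spatial warping, then a 1x1 convolution."""
    H, W = x.shape
    gy, gx = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Step 1: warp x once per tap with the offset field p_k + dp_k.
    warped = np.stack([
        bilinear_sample(x, gy + ty + offsets[k, 0], gx + tx + offsets[k, 1])
        for k, (ty, tx) in enumerate(taps)
    ])  # shape (K, H, W)
    # Step 2: 1x1 convolution = per-pixel weighted sum over the K warped copies.
    return np.tensordot(w, warped, axes=1)


def flow_warp(x, flow):
    """Flow-based alignment as the special case: one tap at (0, 0), weight 1."""
    return deform_conv_decomposed(x, np.ones(1), flow[None], taps=[(0, 0)])


def offset_fidelity_loss(offsets, flow, t=10.0):
    """ASSUMED form (hypothetical): L1 gap between each offset and the flow,
    counted only where it exceeds threshold t, then averaged."""
    diff = np.abs(offsets - flow[None])  # broadcast flow over all K taps
    return float(np.mean(np.where(diff > t, diff, 0.0)))
```

Because the two functions agree for arbitrary offsets, the extra expressiveness of deformable alignment comes from having several offset fields (one per tap) rather than the single field of optical-flow warping; the thresholded loss, under the assumed form, leaves offsets free to vary within `t` pixels of the flow while discouraging them from drifting far away.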
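The four functionalities that guide BasicVSR (Propagation, Alignment, Aggregation, and Upsampling) can likewise be sketched as a data flow. This is an illustration under assumptions, not the BasicVSR implementation: `align` and `aggregate` are hypothetical callables standing in for the alignment and aggregation networks, propagation is modeled as a backward recurrent pass followed by a forward pass, and upsampling is pixel shuffle (depth-to-space), assumed here as the upsampling component.

```python
import numpy as np


def pixel_shuffle(feat, r):
    """Upsampling via depth-to-space: (C*r*r, H, W) -> (C, H*r, W*r)."""
    c_rr, H, W = feat.shape
    if c_rr % (r * r):
        raise ValueError("channel count must be divisible by r*r")
    C = c_rr // (r * r)
    out = feat.reshape(C, r, r, H, W).transpose(0, 3, 1, 4, 2)  # (C, H, r, W, r)
    return out.reshape(C, H * r, W * r)


def bidirectional_propagate(frames, align, aggregate):
    """Propagation with one backward and one forward recurrent pass.

    frames: list of (C, H, W) feature arrays.
    align(h, src, dst): hypothetical stand-in that moves hidden state h from
        frame src to frame dst (Alignment).
    aggregate(h, x): hypothetical stand-in that merges the aligned state with
        the current frame's features (Aggregation).
    Returns, per frame, the backward and forward states concatenated on channels.
    """
    T = len(frames)
    backward = [None] * T
    h = np.zeros_like(frames[0])
    for t in reversed(range(T)):  # backward pass: last frame to first
        if t < T - 1:
            h = align(h, t + 1, t)
        h = aggregate(h, frames[t])
        backward[t] = h
    outputs = []
    h = np.zeros_like(frames[0])
    for t in range(T):  # forward pass: first frame to last
        if t > 0:
            h = align(h, t - 1, t)
        h = aggregate(h, frames[t])
        outputs.append(np.concatenate([backward[t], h], axis=0))
    return outputs
```

With identity alignment and additive aggregation, the backward state at frame t accumulates frames t through T-1 and the forward state accumulates frames 0 through t, so each output frame draws on the entire sequence before `pixel_shuffle` upsamples it; this is the intuition behind bidirectional propagation.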
dc.language.iso | en | en_US
dc.publisher | Nanyang Technological University | en_US
dc.relation | I1901E0052 | en_US
dc.relation | 2018-T1-002-056 | en_US
dc.relation | NTU SUG grant | en_US
dc.relation | NTU NAP | en_US
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). | en_US
dc.subject | Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence | en_US
dc.subject | Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision | en_US
dc.title | Image and video super-resolution in the wild | en_US
dc.type | Thesis-Doctor of Philosophy | en_US
dc.contributor.supervisor | Chen Change Loy | en_US
dc.contributor.school | School of Computer Science and Engineering | en_US
dc.description.degree | Doctor of Philosophy | en_US
dc.identifier.doi | 10.32657/10356/160140 | -
dc.contributor.supervisoremail | ccloy@ntu.edu.sg | en_US
item.fulltext | With Fulltext | -
item.grantfulltext | open | -
Appears in Collections: SCSE Theses
Files in This Item:
File | Description | Size | Format
Thesis_KelvinChan_2.pdf | | 95.18 MB | Adobe PDF
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.