Title: Reconstruction and manipulation of portraits
Authors: Song, Guoxian
Keywords: Engineering::Computer science and engineering
Issue Date: 2022
Publisher: Nanyang Technological University
Source: Song, G. (2022). Reconstruction and manipulation of portraits. Doctoral thesis, Nanyang Technological University, Singapore.
Abstract: The large growth in consumer digital cameras and smartphones has led to the prevalent use of digital photography. One typical category is portraiture, where a photograph is taken of a person's face or upper body. Analysis of portraits for 3D reconstruction and further manipulation has received a lot of attention, as it is highly relevant to Virtual Reality (VR) and Augmented Reality (AR), as well as other forms of entertainment. This thesis presents my research contributions to image-based facial analysis, involving 3D facial geometry estimation and facial reflectance inference, as well as advanced manipulation techniques for portrait relighting, shadow generation and portrait stylization. Part 1 describes methods for portrait reconstruction. In particular, two frameworks are presented that involve the well-known 3D Morphable Model representation. The first framework targets 3D face-eye performance capture under extreme occlusion, while the second handles reconstruction of facial reflectance maps and geometry for faces with significant specular reflections. More specifically, in Chapter 3, I present a CNN-based 3D face-eye capture system for users wearing head-mounted displays (HMDs). Our system integrates a 3D parametric gaze model into the 3D morphable face model, and can be used to produce a personalized digital avatar given an exterior RGB image of a user's face occluded by an HMD and an infrared (IR) eye image from the interior of the HMD, with no calibration needed. Moreover, to train the facial and eye gaze neural networks, we collect face and VR IR eye data from multiple subjects, and synthesize pairs of HMD face data with expression labels. In Chapter 4, I describe a model-based method to recover photorealistic facial reflectance and geometry from two video streams of a subject captured from two views.
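Both frameworks build on the 3D Morphable Model, which represents a face as a mean shape plus linear combinations of identity and expression bases. A minimal sketch of that linear model follows, using a hypothetical toy basis (real bases have tens of thousands of vertex coordinates; this is illustrative only, not the thesis's implementation):

```python
import numpy as np

# Hypothetical tiny 3DMM: 3 vertices (9 flattened coordinates),
# 2 identity components and 2 expression components.
rng = np.random.default_rng(0)
mean_shape = rng.standard_normal(9)       # average face geometry
id_basis = rng.standard_normal((9, 2))    # identity (shape) basis
exp_basis = rng.standard_normal((9, 2))   # expression basis

def morphable_shape(id_coeff, exp_coeff):
    """Linear 3DMM: mean shape plus identity and expression offsets."""
    return mean_shape + id_basis @ id_coeff + exp_basis @ exp_coeff

# Zero coefficients recover the mean (neutral average) face.
neutral = morphable_shape(np.zeros(2), np.zeros(2))
# Non-zero coefficients deform identity and expression independently.
expressive = morphable_shape(np.array([0.5, -0.2]), np.array([1.0, 0.3]))
```

Fitting such a model to an image amounts to regressing the low-dimensional coefficient vectors, which is what the CNN in Chapter 3 does from the HMD-occluded face and IR eye inputs.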
After estimating an initial facial geometry and texture map, the Chapter 4 framework jointly infers the specular and diffuse reflectance components, with further refinement of the geometry. This leads to significant improvement over prior art: by better reconstructing the shape of faces with specular reflections, the method enables more compelling rendering of specular effects under new viewpoints. Part 2 describes approaches for portrait manipulation. In particular, three frameworks are presented: neural portrait relighting, shadow-aware portrait relighting for virtual backgrounds, and portrait stylization using limited exemplars. More specifically, in Chapter 5, I present an image-based deep generative model that can dynamically relight half-body portrait images. Key technical contributions include the proposed over-complete lighting representation, the multiplicative neural rendering, and the separation of background and foreground for illumination feature encoding. We have also created a large rendered dataset with annotated and controlled lighting that is suitable for training our model, and which has sufficient photorealism to allow our model to be directly applied to real images. In Chapter 6, I present a new shadow-aware portrait relighting system that can relight an input portrait to be consistent with a given desired background image, including perceptually important shadow effects. Our system consists of four major components: portrait neutralization, illumination estimation, shadow generation and hierarchical neural rendering, all based on deep neural networks, with the whole system being end-to-end trainable. Extensive experiments demonstrate that our shadow-aware relighting system outperforms state-of-the-art portrait relighting methods in producing lighting-consistent images with shadow effects.
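The specular/diffuse separation underlying the reflectance work can be illustrated with a classical per-pixel shading decomposition. The sketch below uses a Lambertian diffuse term plus a Blinn-Phong specular lobe; this is an illustrative stand-in, not the reflectance model actually inferred in the thesis:

```python
import numpy as np

def shade(normal, light_dir, view_dir, albedo, ks=0.3, shininess=16.0):
    """Split shading into diffuse and specular components for one pixel.

    Diffuse: Lambertian term, albedo * max(n.l, 0).
    Specular: Blinn-Phong lobe around the half vector (illustrative model).
    Returns (total, diffuse, specular).
    """
    n = normal / np.linalg.norm(normal)
    l = light_dir / np.linalg.norm(light_dir)
    v = view_dir / np.linalg.norm(view_dir)
    h = (l + v) / np.linalg.norm(l + v)            # Blinn-Phong half vector
    diffuse = albedo * max(float(n @ l), 0.0)      # view-independent component
    specular = ks * max(float(n @ h), 0.0) ** shininess  # view-dependent highlight
    return diffuse + specular, diffuse, specular
```

Separating the two components matters because the specular part is view-dependent: once it is factored out, the geometry and diffuse texture can be re-rendered consistently from new viewpoints.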
In Chapter 7, I present AgileGAN, a framework that can generate high-quality stylistic portraits via inversion-consistent transfer learning. We propose a novel hierarchical variational autoencoder to ensure that the inverse-mapped distribution conforms to the original latent Gaussian distribution of the well-known StyleGAN model, while augmenting its original space to a multi-resolution latent space so as to better encode different levels of detail. We show that we achieve superior portrait stylization quality compared to previous state-of-the-art methods, with comparisons performed qualitatively, quantitatively and through a perceptual user study.
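The requirement that inverse-mapped codes conform to StyleGAN's latent Gaussian can be enforced with a standard VAE-style KL regularizer. The sketch below shows that term against a unit Gaussian prior (an illustrative stand-in, not AgileGAN's exact training loss):

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL divergence between N(mu, diag(exp(log_var))) and N(0, I).

    A penalty of this form keeps an encoder's inverted latents close to
    the Gaussian prior the generator was trained on, so that transfer
    learning does not drift the codes out of distribution.
    """
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

# An encoding that already matches the prior incurs zero penalty.
zero_kl = kl_to_standard_normal(np.zeros(4), np.zeros(4))
```

In a hierarchical, multi-resolution latent space, one such term would apply per resolution level, keeping every level of detail individually consistent with the prior.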
DOI: 10.32657/10356/155400
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections:SCSE Theses

Files in This Item:
mythesis.pdf (90.2 MB, Adobe PDF)
