Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/166053
Title: Disentangled image representation: from affine transforms to facial attributes
Authors: Liu, Letao
Keywords: Engineering::Electrical and electronic engineering::Computer hardware, software and systems
Issue Date: 2023
Publisher: Nanyang Technological University
Source: Liu, L. (2023). Disentangled image representation: from affine transforms to facial attributes. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/166053
Abstract: Deep learning has shown unprecedented performance on computer vision tasks in recent years. One of its foundations is large datasets with human annotations. However, human-annotated datasets have inherent drawbacks. First, human annotation is expensive, especially for tasks such as segmentation. Next, the annotations themselves may be incorrect, for example because of the subjective nature of the problem. Last but not least, if we want an algorithm to keep evolving in real-world scenarios, it is not possible to annotate all the surrounding objects in real time. To better utilize algorithms in real-world scenarios, we want to deploy deep learning with minimal human annotation, for example, in an unsupervised or self-supervised manner. Specifically, we tackle this problem from the perspective of generative models and disentangled representation. With generative models, the outputs of the model can be visualized. With disentangled representation, the different attributes learned by the model can be separated. Combining these two approaches provides a pathway to aligning the visualized attributes with human intuition. To learn the disentangled representation in an unsupervised or self-supervised manner, we approach the problem from the perspective of contrastive learning and inductive bias. With contrastive learning, we can produce more data samples by transforming the original data and comparing the differences between them. With inductive bias, we can formulate a meaningful relationship between the transformed and original data sample pairs. In this thesis, we demonstrate the effectiveness of inductive biases such as affine transforms and facial attributes.
In summary, this thesis contributes to disentangled image representation, providing a pathway to understand the outputs of generative models more concretely by visualizing the results and aligning them with human intuition.
URI: https://hdl.handle.net/10356/166053
DOI: 10.32657/10356/166053
Schools: School of Electrical and Electronic Engineering 
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: EEE Theses

Files in This Item:
letao's thesis minor revision.pdf (41.79 MB, Adobe PDF)

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.