Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/183921
Title: An empirical study of convolution-based and transformer-based diffusion models
Authors: Loh, Joel Rui Jie
Keywords: Computer and Information Science
Issue Date: 2025
Publisher: Nanyang Technological University
Source: Loh, J. R. J. (2025). An empirical study of convolution-based and transformer-based diffusion models. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/183921
Abstract: This project presents an empirical comparison between convolution-based U-Net and transformer-based DiT-S/2 diffusion models on the CIFAR-10 dataset. To address the lack of spatial inductive bias in DiT, we implement three architectural enhancements: Frequency-Based Noise Control (FNC), Overlapping Patch Embeddings (OPE), and Adaptive Positional Encoding (APE). Results show that DiT-S/2 outperforms U-Net in semantic diversity (IS: 5.02 vs. 4.56), but lags in structural fidelity (PSNR: 5.84 vs. 7.65, SSIM: 0.03 vs. 0.04). OPE and APE improve DiT’s image quality, while FNC significantly reduces inference time. These findings highlight the importance of architectural priors in improving transformer-based diffusion, especially for low-resolution tasks. We hypothesize that similar trends will hold in larger DiT variants, given more training resources. This study offers practical insights into enhancing transformer-based generative models through targeted inductive bias.
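Note: The full text is restricted, so the report's exact implementation is not reproduced here. As an illustration of the abstract's idea, below is a minimal, hypothetical PyTorch sketch of one common way to realize Overlapping Patch Embeddings (OPE) for a DiT-S/2-sized model on CIFAR-10. The class name, the `overlap` parameter, and the embedding width (384, the DiT-S default) are assumptions for this sketch, not the report's code.

```python
# Hypothetical sketch of Overlapping Patch Embeddings (OPE); assumptions:
# class/parameter names are illustrative, embed_dim=384 follows DiT-S.
import torch
import torch.nn as nn

class OverlappingPatchEmbed(nn.Module):
    """A standard DiT patch embedding uses kernel_size == stride == patch_size,
    so patches are disjoint. Enlarging the kernel while keeping the stride makes
    neighbouring patches share pixels, adding a convolution-like local prior."""

    def __init__(self, in_chans=3, embed_dim=384, patch_size=2, overlap=1):
        super().__init__()
        # Kernel covers the patch plus `overlap` pixels on each side;
        # padding keeps the token grid the same size as non-overlapping DiT.
        self.proj = nn.Conv2d(
            in_chans, embed_dim,
            kernel_size=patch_size + 2 * overlap,
            stride=patch_size,
            padding=overlap,
        )

    def forward(self, x):
        x = self.proj(x)                      # (B, D, H/p, W/p)
        return x.flatten(2).transpose(1, 2)   # (B, N, D) token sequence

# CIFAR-10-sized input: 32x32 with patch_size=2 gives 16x16 = 256 tokens,
# the same sequence length as a non-overlapping DiT-S/2 embedding.
tokens = OverlappingPatchEmbed()(torch.randn(8, 3, 32, 32))
print(tokens.shape)  # torch.Size([8, 256, 384])
```

Because adjacent tokens share pixels, each token sees local context beyond its own patch, which is one plausible way to inject the spatial inductive bias the abstract says DiT lacks.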
URI: https://hdl.handle.net/10356/183921
Schools: College of Computing and Data Science
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: CCDS Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: LohRuiJieJoel_FYP_Report.pdf (Restricted Access)
Description: Final Year Project
Size: 2 MB
Format: Adobe PDF

Page view(s): 13 (updated on May 5, 2025)
Download(s): 3 (updated on May 5, 2025)

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.