Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/183921
Field | Value
---|---
Title | An empirical study of convolution-based and transformer-based diffusion models
Authors | Loh, Joel Rui Jie
Keywords | Computer and Information Science
Issue Date | 2025
Publisher | Nanyang Technological University
Source | Loh, J. R. J. (2025). An empirical study of convolution-based and transformer-based diffusion models. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/183921
Abstract | This project presents an empirical comparison between convolution-based U-Net and transformer-based DiT-S/2 diffusion models on the CIFAR-10 dataset. To address the lack of spatial inductive bias in DiT, we implement three architectural enhancements: Frequency-Based Noise Control (FNC), Overlapping Patch Embeddings (OPE), and Adaptive Positional Encoding (APE). Results show that DiT-S/2 outperforms U-Net in semantic diversity (IS: 5.02 vs. 4.56) but lags in structural fidelity (PSNR: 5.84 vs. 7.65, SSIM: 0.03 vs. 0.04). OPE and APE improve DiT’s image quality, while FNC significantly reduces inference time. These findings highlight the importance of architectural priors in improving transformer-based diffusion, especially for low-resolution tasks. We hypothesize that similar trends will hold in larger DiT variants, given more training resources. This study offers practical insights into enhancing transformer-based generative models through targeted inductive bias.
URI | https://hdl.handle.net/10356/183921
Schools | College of Computing and Data Science
Fulltext Permission | restricted
Fulltext Availability | With Fulltext
Appears in Collections | CCDS Student Reports (FYP/IA/PA/PI)
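The abstract does not describe how OPE is implemented. As a hypothetical illustration of the general idea only (not the author's method), the sketch below contrasts non-overlapping patch extraction with overlapping extraction on a single-channel 32×32 image (CIFAR-10 resolution). It assumes OPE follows the common convolutional-stem pattern in which the window (kernel) is larger than the stride, so neighbouring tokens share border pixels. The helpers `extract_patches` and `grid_size` and the kernel-3/stride-2/padding-1 setting are illustrative assumptions, not taken from the report; the 2×2 non-overlapping case corresponds to the patch size implied by the "/2" in DiT-S/2.

```python
from typing import List

Image = List[List[float]]


def grid_size(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Tokens per side for a square input (same formula as a strided convolution)."""
    return (size + 2 * padding - kernel) // stride + 1


def extract_patches(img: Image, kernel: int, stride: int, padding: int = 0) -> List[List[float]]:
    """Flatten every kernel x kernel window of a single-channel image into one token.

    stride == kernel -> non-overlapping patches (ViT/DiT style)
    stride <  kernel -> overlapping patches that share border pixels
    A real embedding layer would follow this with a learned linear projection.
    """
    h, w = len(img), len(img[0])
    # Zero-pad the image on all four sides.
    padded = [[0.0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for i in range(h):
        for j in range(w):
            padded[i + padding][j + padding] = img[i][j]
    ph, pw = len(padded), len(padded[0])
    tokens = []
    # Slide the window in row-major order, emitting one token per position.
    for top in range(0, ph - kernel + 1, stride):
        for left in range(0, pw - kernel + 1, stride):
            tokens.append([padded[top + di][left + dj]
                           for di in range(kernel) for dj in range(kernel)])
    return tokens


size = 32
img = [[float(i * size + j) for j in range(size)] for i in range(size)]
plain = extract_patches(img, kernel=2, stride=2)               # DiT-S/2-style 2x2 patches
overlap = extract_patches(img, kernel=3, stride=2, padding=1)  # hypothetical OPE setting
print(len(plain), len(plain[0]))      # 256 4
print(len(overlap), len(overlap[0]))  # 256 9
```

With these assumed settings both tokenizers produce a 16×16 grid of 256 tokens, so the transformer's sequence length is unchanged. The overlapping tokens are 9-dimensional rather than 4-dimensional, and horizontally or vertically adjacent tokens share a column or row of pixels, which is one way to give patch tokens a locality prior.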
Files in This Item:
File | Description | Size | Format
---|---|---|---
LohRuiJieJoel_FYP_Report.pdf (Restricted Access) | Final Year Project | 2 MB | Adobe PDF
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.