Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/181210
Title: Additive quantization for truly tiny compressed diffusion models
Authors: Hasan, Adil
Keywords: Computer and Information Science
Issue Date: 2024
Publisher: Nanyang Technological University
Source: Hasan, A. (2024). Additive quantization for truly tiny compressed diffusion models. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/181210
Abstract: Tremendous investments have been made towards the commodification of diffusion models for the generation of diverse media. Their mass-market adoption, however, is still hobbled by the intense hardware resource requirements of diffusion model inference. Model quantization strategies tailored specifically to diffusion models have seen considerable success in easing this burden, yet without exception they have explored only the Uniform Scalar Quantization (USQ) family of quantization methods. In contrast, Vector Quantization (VQ) methods, which replace groups of multiple related weights with indices into codebooks, have recently taken the parallel field of Large Language Model (LLM) quantization by storm. In this FYP project, we apply codebook-based additive vector quantization algorithms to the problem of diffusion model compression for the first time. We are rewarded with state-of-the-art results on the important class-conditional benchmark of LDM-4 on ImageNet at 20 inference time steps, including sFID as much as 1.93 points lower than the full-precision model at W4A8, the best-reported results for FID, sFID and ISC at W2A8, and the first-ever successful quantization to W1.5A8 (less than 1.5 bits stored per weight). Furthermore, our proposed method allows for a dynamic trade-off between quantization-time GPU hours and inference-time savings, in line with the recent trend of approaches blending the best aspects of post-training quantization (PTQ) and quantization-aware training (QAT), and demonstrates FLOPs savings on arbitrary hardware via an efficient inference kernel.
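
To make the codebook-based additive vector quantization described in the abstract concrete, the following minimal NumPy sketch shows how a group of weights is reconstructed as a sum of codewords, one drawn from each of several codebooks, so that only small indices need to be stored. All names, shapes, and parameter values (d, M, K) are illustrative assumptions and are not taken from the report.

    import numpy as np

    # Illustrative parameters (not from the report):
    #   d = weights per group, M = number of codebooks, K = entries per codebook.
    d, M, K = 8, 2, 256

    rng = np.random.default_rng(0)
    # Codebooks are normally learned to minimise reconstruction error;
    # random values stand in for learned codewords here.
    codebooks = rng.normal(size=(M, K, d)).astype(np.float32)
    # Each weight group stores only M small indices instead of d floats.
    indices = rng.integers(0, K, size=(1000, M))

    # Additive reconstruction: each group of d weights is approximated by
    # the sum of its M selected codewords, one from each codebook.
    groups = codebooks[np.arange(M), indices].sum(axis=1)  # shape (1000, d)

    # Storage cost: M * log2(K) index bits amortised over d weights.
    bits_per_weight = M * np.log2(K) / d                   # 2.0 with these values

Under this scheme the per-weight storage cost is M * log2(K) / d bits. For example, M = 2 codebooks of K = 64 entries over groups of d = 8 weights would give the 1.5 bits per weight of the W1.5A8 regime mentioned in the abstract, though the report's actual configuration may differ.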
URI: https://hdl.handle.net/10356/181210
Schools: College of Computing and Data Science 
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections:CCDS Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: FYP Report Final.pdf (Restricted Access)
Size: 2.94 MB
Format: Adobe PDF

Page view(s): 109 (updated on Mar 16, 2025)
Download(s): 9 (updated on Mar 16, 2025)


Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.