Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/183980
Title: Hardware constrained deep learning: an empirical analysis of dynamic quantisation across computer vision and natural language processing domains
Authors: Sai, Shein Htet
Keywords: Computer and Information Science
Issue Date: 2025
Publisher: Nanyang Technological University
Source: Sai, S. H. (2025). Hardware constrained deep learning: an empirical analysis of dynamic quantisation across computer vision and natural language processing domains. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/183980
Abstract: This study provides a comprehensive analysis of deep learning model optimisation techniques for resource-constrained edge devices, focusing on the Raspberry Pi platform. The research evaluates three implementation approaches—PyTorch, ONNX conversion, and dynamic post-training quantisation on ONNX—across diverse deep learning architectures in both the computer vision (CV) and natural language processing (NLP) domains. Through systematic benchmarking of performance metrics including model size, accuracy, inference speed, memory utilisation, and thermal characteristics, the study reveals that optimisation effectiveness varies dramatically across architectural paradigms. In the CV domain, ResNet50 demonstrated remarkable resilience to quantisation, maintaining accuracy while achieving a 75% size reduction, whereas efficiency-focused architectures like EfficientNet experienced significant accuracy collapse. Similarly, in the NLP domain, DistilBERT exhibited strong quantisation resilience with only a 12% relative accuracy drop, while SqueezeBERT suffered a 36% decline to near-random performance. The research also identifies a memory utilisation paradox across both domains, where some quantised models consumed more runtime memory despite their reduced model sizes. ONNX conversion emerged as a universally beneficial strategy for both CV and NLP models, improving inference speed by 30-48% without compromising accuracy. These findings highlight the critical importance of architecture-specific optimisation approaches rather than one-size-fits-all strategies for edge deployment, and provide practical guidelines for balancing the competing demands of model capability and deployment feasibility on resource-constrained devices.
URI: https://hdl.handle.net/10356/183980
Schools: College of Computing and Data Science 
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: CCDS Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: FYP_FINAL_Sai_Shein Htet.pdf
Description: Restricted Access
Size: 1.01 MB
Format: Adobe PDF

Page view(s): 279 (updated on May 7, 2025)


Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.