Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/165565
Title: Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization
Authors: Huai, Shuo
Liu, Di
Kong, Hao
Liu, Weichen
Subramaniam, Ravi
Makaya, Christian
Lin, Qian
Keywords: Engineering::Computer science and engineering::Software::Software engineering
Issue Date: 2023
Source: Huai, S., Liu, D., Kong, H., Liu, W., Subramaniam, R., Makaya, C. & Lin, Q. (2023). Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization. Future Generation Computer Systems, 142, 314-327. https://dx.doi.org/10.1016/j.future.2022.12.021
Project: I1801E0028 
Journal: Future Generation Computer Systems
Abstract: Deep learning applications have been widely adopted on edge devices to mitigate the privacy and latency issues of accessing cloud servers. Deciding the number of neurons during the design of a deep neural network to maximize performance is not intuitive. In particular, many application scenarios are real-time and have a strict latency constraint, while conventional neural network optimization methods do not directly change the temporal cost of model inference for latency-critical edge systems. In this work, we propose a latency-oriented neural network learning method to optimize models for high accuracy while fulfilling the latency constraint. For efficiency, we also introduce a universal hardware-customized latency predictor, so that a model satisfying the latency constraint can be learned in a single one-shot training process. The experimental results reveal that, compared to state-of-the-art methods, our approach can closely fit the ‘hard’ latency constraint and achieve high accuracy. Under the same training settings as the original model and satisfying a 34 ms latency constraint on the ImageNet-100 dataset, we reduce GoogLeNet's latency from 40.32 ms to 34 ms with a 0.14% accuracy reduction on the NVIDIA Jetson Nano. When coupled with quantization, our method can be further improved to only a 0.04% drop for GoogLeNet. On the NVIDIA Jetson TX2, we compress VGG-19 from 119.98 ms to 34 ms and even improve its accuracy by 0.5%, and we scale GoogLeNet up from 20.27 ms to 34 ms and achieve higher accuracy by 0.78%. We also open-source this framework at https://github.com/ntuliuteam/ZeroBN.
URI: https://hdl.handle.net/10356/165565
ISSN: 0167-739X
DOI: 10.1016/j.future.2022.12.021
Schools: School of Computer Science and Engineering 
Research Centres: HP-NTU Digital Manufacturing Corporate Lab
Rights: © 2022 Elsevier B.V. All rights reserved.
Fulltext Permission: none
Fulltext Availability: No Fulltext
Appears in Collections: SCSE Journal Articles

Scopus citations: 3 (updated on May 14, 2024)
Page views: 140 (updated on May 18, 2024)

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.