Title: Deep neural network compression for pixel-level vision tasks
Authors: He, Wei
Keywords: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Issue Date: 2021
Publisher: Nanyang Technological University
Source: He, W. (2021). Deep neural network compression for pixel-level vision tasks. Master's thesis, Nanyang Technological University, Singapore.
Abstract: Deep convolutional neural networks (DCNNs) have demonstrated remarkable performance in many computer vision tasks. To achieve this, DCNNs typically require a large number of trainable parameters that are optimized to extract informative features. This often results in over-parameterization of the DCNN models, which incurs high computational complexity and large storage requirements that hinder their deployment on embedded devices with limited computational and memory resources. In this thesis, we aim to develop DCNN compression methods that generate compact DCNN models whose performance remains comparable to that of the original models. In particular, our proposed methods must be well suited to DCNN models for pixel-level vision tasks (such as semantic segmentation and crowd counting). DCNN compression for pixel-level vision tasks has not been thoroughly investigated, as existing works mainly target the less challenging image-level classification task. We first present a framework that utilizes knowledge distillation to recover the performance loss of DCNN models that have undergone network pruning. This departs from existing knowledge distillation approaches, where the student model and teacher model are pre-defined before knowledge adaptation. Experiments on encoder-decoder type models for semantic segmentation demonstrate that the proposed framework can effectively recover the performance loss of the compact student model after aggressive pruning in most cases. However, in certain cases, knowledge transfer cannot outperform the conventional fine-tuning process on the pruned semantic segmentation architectures. Next, we propose Context-Aware Pruning (CAP), which utilizes channel association, a measure that captures contextual information, to exploit parameter redundancy when pruning semantic segmentation models. We evaluated our framework on widely used benchmarks and showed its effectiveness on both large and lightweight models. Our framework reduces the number of parameters of the state-of-the-art semantic segmentation models PSPNet-101, PSPNet-50, ICNet, and SegNet by 32%, 47%, 54%, and 63%, respectively, on the Cityscapes dataset. This reduction is achieved while preserving the best performance among all the baseline pruning methods considered. Finally, we propose Adaptive Correlation-driven Sparsity Learning (ACSL) for DCNN compression, which can provide superior performance on both image-level and pixel-level vision tasks. ACSL extends CAP by inducing sparsity into the channel importance with an adaptive penalty strength. The experimental results demonstrate that ACSL outperforms state-of-the-art pruning methods on image-level classification, semantic segmentation, and dense crowd counting tasks. In particular, for the crowd counting task, the proposed ACSL framework is able to reduce the DCNN model parameters by up to 94% while matching (and at times outperforming) the performance of the original model.
DOI: 10.32657/10356/150076
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Theses

Files in This Item:
File: NTU_MEng_Thesis_He_Wei_Final.pdf
Size: 24.95 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.