Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/146245
Title: Deep neural network compression : from sufficient to scarce data
Authors: Chen, Shangyu
Keywords: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Issue Date: 2021
Publisher: Nanyang Technological University
Source: Chen, S. (2021). Deep neural network compression : from sufficient to scarce data. Doctoral thesis, Nanyang Technological University, Singapore.
Abstract: The success of overparameterized deep neural networks (DNNs) poses a great challenge for deploying computationally expensive models on edge devices. Numerous model compression methods (pruning, quantization) have been proposed to overcome this challenge: pruning eliminates unimportant parameters, while quantization converts full-precision parameters into integers. Both shrink model size and accelerate inference. However, existing methods rely on a large amount of training data. In real-world cases such as the medical domain, collecting training data is costly, due to the extensive human effort required or to data privacy constraints. To tackle the problem of model compression in scarce-data scenarios, this thesis summarizes my works on model compression, progressing from sufficient data to scarce data.

My early work focused on model compression in a layer-wise manner: the loss incurred by layer-wise compression is studied, and corresponding compression solutions are proposed to alleviate it. The layer-wise process reduces the amount of data that quantization depends on. This work is summarized in Chapter 3.

Following model quantization with scarce data, in Chapter 4 I propose pruning a model in a cross-domain setting. The aim is to improve compression performance on tasks with limited data, with the assistance of rich-resource tasks. Specifically, a dynamic and cooperative pruning strategy prunes both the source and target networks simultaneously.

In Chapter 5, I address the non-differentiability problem in training-based compression, where the pruning or quantization operations prevent gradients from back-propagating from the loss to the trainable parameters. I propose a meta neural network that lets gradients penetrate the compression operation: the network takes the trainable parameters and the accessible gradients as input, and outputs gradients for the parameter update. With the meta network incorporated into compression training, empirical experiments demonstrate faster learning and better performance.

Although the works in Chapters 3 and 4 alleviate model compression on scarce data, they either require a pre-trained model or incur additional cost in compressing another model. In Chapter 6, inspired by Chapter 5, a model for an arbitrary scarce-data task can be compressed: I propose to learn meta-knowledge from multiple model compression tasks using a meta-learning framework. The knowledge is embedded in an initialization shared by all tasks and in a meta neural network that provides gradients during training. When a novel task arrives, training starts from the initialization and is guided by the meta neural network, reaching a compressed model in very few steps.
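As a rough illustration of the meta-gradient idea described in the abstract (Chapters 5 and 6), the sketch below wires a small meta network into a quantization-aware training loop. This is a minimal sketch under stated assumptions, not the thesis implementation: the quantizer is a simple sign function, the MetaGradientNet architecture is invented for illustration, and the training of the meta network itself (which the thesis obtains via meta-learning across tasks) is omitted.

```python
# Minimal PyTorch sketch: training through a non-differentiable quantizer with
# a learned gradient generator ("meta network"). All names and the architecture
# here are illustrative assumptions, not the thesis code.
import torch
import torch.nn as nn

def hard_quantize(w: torch.Tensor) -> torch.Tensor:
    # 1-bit quantization: sign(w). Its true gradient is zero almost everywhere,
    # so backpropagation cannot reach the full-precision weights directly.
    return torch.sign(w)

class MetaGradientNet(nn.Module):
    # Maps (full-precision weight, gradient w.r.t. quantized weight) to a
    # surrogate gradient for the full-precision weight, element-wise.
    def __init__(self, hidden: int = 16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, w: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
        features = torch.stack([w.flatten(), g.flatten()], dim=1)
        return self.net(features).view_as(w)

torch.manual_seed(0)
w = torch.randn(4, 4)                  # full-precision weights, updated manually
meta = MetaGradientNet()               # in the thesis, this network is itself trained
x, target = torch.randn(8, 4), torch.randn(8, 4)

for step in range(5):
    w_q = hard_quantize(w).detach().requires_grad_(True)
    loss = ((x @ w_q - target) ** 2).mean()
    loss.backward()                    # dL/dw_q is well defined; dL/dw is not
    with torch.no_grad():
        surrogate = meta(w, w_q.grad)  # meta net "penetrates" the quantizer
        w -= 0.1 * surrogate           # update the full-precision weights
```

For comparison, a straight-through estimator would copy w_q.grad onto w unchanged; the thesis replaces that fixed rule with a learned one, and Chapter 6 additionally meta-learns the initialization of w so that a novel scarce-data task reaches a compressed model in a few such steps.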
URI: https://hdl.handle.net/10356/146245
DOI: 10.32657/10356/146245
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections:SCSE Theses

Files in This Item:
File: Model-Compression.pdf (10.93 MB, Adobe PDF)

Page view(s): 266 (updated on May 18, 2022)
Download(s): 172 (updated on May 18, 2022)


Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.