Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/162483
Title: FAT: an in-memory accelerator with fast addition for ternary weight neural networks
Authors: Zhu, Shien
Duong, Luan H. K.
Chen, Hui
Liu, Di
Liu, Weichen
Keywords: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Engineering::Computer science and engineering::Hardware::Arithmetic and logic structures
Engineering::Computer science and engineering::Computer systems organization::Special-purpose and application-based systems
Issue Date: 2022
Source: Zhu, S., Duong, L. H. K., Chen, H., Liu, D. & Liu, W. (2022). FAT: an in-memory accelerator with fast addition for ternary weight neural networks. IEEE Transactions On Computer-Aided Design of Integrated Circuits and Systems. https://dx.doi.org/10.1109/TCAD.2022.3184276
Project: MOE2019-T2-1-071
MOE2019-T1-001-072
M4082282
M4082087
Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Abstract: Convolutional Neural Networks (CNNs) demonstrate excellent performance in various applications but have high computational complexity. Quantization is applied to reduce the latency and storage cost of CNNs. Among the quantization methods, Binary and Ternary Weight Networks (BWNs and TWNs) have a unique advantage over 8-bit and 4-bit quantization. They replace the multiplication operations in CNNs with additions, which are favoured on In-Memory-Computing (IMC) devices. IMC acceleration for BWNs has been widely studied. However, though TWNs have higher accuracy and better sparsity than BWNs, IMC acceleration for TWNs has limited research. TWNs on existing IMC devices are inefficient because the sparsity is not well utilized, and the addition operation is not efficient. In this paper, we propose FAT as a novel IMC accelerator for TWNs. First, we propose a Sparse Addition Control Unit, which utilizes the sparsity of TWNs to skip the null operations on zero weights. Second, we propose a fast addition scheme based on the memory Sense Amplifier to avoid the time overhead of both carry propagation and writing back the carry to memory cells. Third, we further propose a Combined-Stationary data mapping to reduce the data movement of activations and weights and increase the parallelism across memory columns. Simulation results show that for addition operations at the Sense Amplifier level, FAT achieves 2.00× speedup, 1.22× power efficiency and 1.22× area efficiency compared with a State-Of-The-Art IMC accelerator ParaPIM. FAT achieves 10.02× speedup and 12.19× energy efficiency compared with ParaPIM on networks with 80% average sparsity.
URI: https://hdl.handle.net/10356/162483
ISSN: 0278-0070
DOI: 10.1109/TCAD.2022.3184276
Rights: © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: https://doi.org/10.1109/TCAD.2022.3184276.
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections:SCSE Journal Articles

Files in This Item:
File Description SizeFormat 
TCAD_2021_FAT_Final_Submitted_Latex 2022-6-14.pdfMain PDF, Accepted Version4.07 MBAdobe PDFView/Open

Page view(s)

8
Updated on Nov 30, 2022

Download(s)

9
Updated on Nov 30, 2022

Google ScholarTM

Check

Altmetric


Plumx

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.