Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/81168
Title: | Communication Optimization of Iterative Sparse Matrix-Vector Multiply on GPUs and FPGAs |
Authors: | Rafique, Abid; Constantinides, George A.; Kapre, Nachiket |
Keywords: | Graphics processing units (GPUs); Iterative numerical methods; Sparse matrix-vector multiply; Matrix powers kernel; Field programmable gate arrays (FPGAs) |
Issue Date: | 2015 |
Source: | Rafique, A., Constantinides, G. A., & Kapre, N. (2015). Communication Optimization of Iterative Sparse Matrix-Vector Multiply on GPUs and FPGAs. IEEE Transactions on Parallel and Distributed Systems, 26(1), 24-34. |
Series/Report no.: | IEEE Transactions on Parallel and Distributed Systems |
Abstract: | Trading communication with redundant computation can increase the silicon efficiency of FPGAs and GPUs in accelerating communication-bound sparse iterative solvers. While k iterations of the iterative solver can be unrolled to provide O(k) reduction in communication cost, the extent of this unrolling depends on the underlying architecture, its memory model, and the growth in redundant computation. This paper presents a systematic procedure to select this algorithmic parameter k, which provides communication-computation tradeoff on hardware accelerators like FPGA and GPU. We provide predictive models to understand this tradeoff and show how careful selection of k can lead to performance improvement that otherwise demands significant increase in memory bandwidth. On an Nvidia C2050 GPU, we demonstrate a 1.9×-42.6× speedup over standard iterative solvers for a range of benchmarks and that this speedup is limited by the growth in redundant computation. In contrast, for FPGAs, we present an architecture-aware algorithm that limits off-chip communication but allows communication between the processing cores. This reduces redundant computation and allows large k and hence higher speedups. Our approach for FPGA provides a 0.3×-4.4× speedup over same-generation GPU devices where k is picked carefully for both architectures for a range of benchmarks. |
URI: | https://hdl.handle.net/10356/81168; http://hdl.handle.net/10220/39128 |
ISSN: | 1045-9219 |
DOI: | 10.1109/TPDS.2014.6 |
Schools: | School of Computer Engineering |
Rights: | © 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: [http://dx.doi.org/10.1109/TPDS.2014.6]. |
Fulltext Permission: | open |
Fulltext Availability: | With Fulltext |
Appears in Collections: | SCBE Journal Articles |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Communication Optimization of Iterative Sparse Matrix-Vector Multiply on GPUs and FPGAs.pdf | | 750.61 kB | Adobe PDF |
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.