Title: Enhancing performance of Tall-Skinny QR factorization using FPGAs
Authors: Rafique, Abid
Kapre, Nachiket
Constantinides, George A.
Keywords: Computer Science and Engineering
Issue Date: 2012
Source: Rafique, A., Kapre, N., & Constantinides, G. A. (2012). Enhancing performance of Tall-Skinny QR factorization using FPGAs. 22nd International Conference on Field Programmable Logic and Applications (FPL), 433-450.
metadata.dc.contributor.conference: 2012 22nd International Conference on Field Programmable Logic and Applications (FPL)
Abstract: Communication-avoiding linear algebra algorithms with low communication latency and high memory bandwidth requirements like Tall-Skinny QR factorization (TSQR) are highly appropriate for acceleration using FPGAs. TSQR parallelizes QR factorization of tall-skinny matrices in a divide-and-conquer fashion by decomposing them into sub-matrices, performing local QR factorizations and then merging the intermediate results. As TSQR is a dense linear algebra problem, one would therefore imagine GPU to show better performance. However, the performance of GPU is limited by the memory bandwidth in local QR factorizations and global communication latency in the merge stage. We exploit the shape of the matrix and propose an FPGA-based custom architecture which avoids these bottlenecks by using high-bandwidth on-chip memories for local QR factorizations and by performing the merge stage entirely on-chip to reduce communication latency. We achieve a peak double-precision floating-point performance of 129 GFLOPs on Virtex-6 SX475T. A quantitative comparison of our proposed design with recent QR factorization on FPGAs and GPU shows up to 7.7× and 12.7× speed up respectively. Additionally, we show even higher performance over optimized linear algebra libraries like Intel MKL for multi-cores, CULA for GPUs and MAGMA for hybrid systems.
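The divide-and-conquer structure the abstract describes (split the matrix into row blocks, factor each block locally, then merge the intermediate R factors) can be illustrated in software. The following is a minimal NumPy sketch under stated assumptions, not the authors' FPGA architecture: the function name `tsqr_r`, the `num_blocks` parameter, and the pairwise merge order are hypothetical choices made for illustration, and only the R factor is computed.

```python
import numpy as np


def tsqr_r(a, num_blocks=4):
    """Return the n x n R factor of a tall-skinny matrix `a` (m x n, m >> n).

    Illustrative sketch only: names and merge order are assumptions,
    not taken from the paper.
    """
    m, n = a.shape
    if m // num_blocks < n:
        raise ValueError("each row block needs at least n rows")

    # Step 1: decompose the tall matrix into row blocks (sub-matrices).
    blocks = np.array_split(a, num_blocks, axis=0)

    # Step 2: local QR factorization of each block; keep only its R factor.
    rs = [np.linalg.qr(block, mode="r") for block in blocks]

    # Step 3: merge stage. Stack pairs of R factors and factor them again
    # until a single n x n R remains.
    while len(rs) > 1:
        merged = []
        for i in range(0, len(rs) - 1, 2):
            stacked = np.vstack((rs[i], rs[i + 1]))
            merged.append(np.linalg.qr(stacked, mode="r"))
        if len(rs) % 2:
            # An unpaired R factor is carried to the next merge level.
            merged.append(rs[-1])
        rs = merged
    return rs[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((1024, 8))
    r = tsqr_r(a, num_blocks=8)
    # R is unique only up to the signs of its rows, so compare R^T R with A^T A.
    print(np.allclose(r.T @ r, a.T @ a))
```

Because each merge operates only on small n x n triangular factors, the intermediate data stays small relative to the input, which is the property the abstract relies on when it keeps the merge stage entirely on-chip.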
DOI: 10.1109/FPL.2012.6339142
Schools: School of Computer Engineering 
Rights: © 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: [].
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Conference Papers

Files in This Item:
File: Enhancing performance of Tall-Skinny QR factorization using FPGAs.pdf
Size: 937.57 kB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.