Please use this identifier to cite or link to this item:
|Title:||Architecture centric coarse-grained FPGA overlays||Authors:||Abhishek Kumar Jain||Keywords:||DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems||Issue Date:||2017||Source:||Abhishek Kumar Jain. (2017). Architecture centric coarse-grained FPGA overlays. Doctoral thesis, Nanyang Technological University, Singapore.||Abstract:||Coarse-grained FPGA overlays have emerged as one possible solution to make FPGAs more accessible to application developers who are accustomed to software API abstractions and fast development cycles. Existing overlay architectures offer a number of advantages for general purpose hardware acceleration because of software-like programmability, fast compilation, application portability, and improved design productivity, but at the cost of area and performance overheads due to limited consideration for the underlying FPGA architecture. This thesis explores coarse grained overlays designed using the exible DSP48E1 primitive on Xilinx FPGAs, allowing pipelined execution of compute kernels at significantly higher throughput. We first evaluate an open source overlay architecture, DySER, mapped on the Xilinx Zynq device and show that DySER suffers from a significant area and performance overhead due to limited consideration for the underlying FPGA architecture. Next, we design and implement a more FPGA targeted overlay architecture that maximizes the peak performance and reduces the interconnect area overhead through the use of an array of DSP block based fully pipelined functional units and an island-style coarse-grained routing network. As the interconnect of the island-style overlay is still excessive, we next explore novel interconnect architectures to further reduce the interconnect area. We next develop DeCO, a cone shaped cluster of FUs, which shows 87% savings in LUT requirements compared to our island-style overlay, for a set of compute kernels. Our experimental evaluation shows that the proposed overlays exhibit frequencies close to the DSP theoretical limit and achieve high performance with significantly reduced area overheads. We also present a methodology for compiling high level language (C/OpenCL) descriptions of compute kernels onto DSP block based coarse-grained overlays. Our mapping ow provides a rapid, vendor independent mapping to the overlay, raising the abstraction level while also reducing compilation time significantly, hence addressing the design productivity issue.||URI:||http://hdl.handle.net/10356/69532||DOI:||10.32657/10356/69532||Fulltext Permission:||open||Fulltext Availability:||With Fulltext|
|Appears in Collections:||SCSE Theses|
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.