Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/73042
Title: Design automation flow for partial run-time reconfiguration on FPGAs
Authors: Mao, Fubing
Keywords: DRNTU::Engineering::Computer science and engineering
Issue Date: 2017
Source: Mao, F. (2017). Design automation flow for partial run-time reconfiguration on FPGAs. Doctoral thesis, Nanyang Technological University, Singapore.
Abstract: A Field-Programmable Gate Array (FPGA) is programmable hardware that can be configured after manufacturing to meet application-specific functionality and requirements. Partial Reconfiguration (PR) is an advanced feature in modern FPGAs that enables the configuration of the FPGA to be altered at runtime. This provides the means to maximize the utilization of limited FPGA resources to support more functions and to shorten the product's time to market. However, there is a lack of efficient computer-aided design (CAD) tools for placement and routing that support partial reconfiguration on FPGAs. Traditional approaches usually rely on manual partitioning and placement, which is error-prone and tedious and, because of the large design space, requires considerable effort and a long development cycle. While the traditional FPGA design flow usually employs fine-grained tile-based placement, modular placement is increasingly required to speed up large-scale placement and reduce synthesis time. Moreover, commonly used modules can be pre-synthesized and stored in a library for design reuse, significantly lowering design time, verification time and development cost. To address these problems, this research proposes an automatic mapping flow for efficient PR. The thesis makes three major contributions. Firstly, we propose a library-based placement and routing flow that reuses pre-placed and pre-routed modules from the library to significantly reduce execution time, while considering the area-delay product of each module at different width-height ratios and optimizing the area and delay of the final design. The flow supports both static and reconfigurable modules. The modular information is represented in a B*-Tree structure, and the B*-Tree operations are amended and used with Simulated Annealing (SA) to enable rapid exploration of the placement space. 
Different width-height ratios of the modules are exploited to achieve area and delay optimization. Partial reconfiguration-aware routing using pin-to-wire abutment is proposed to connect the modules after placement. By reusing module information in the library for the target architecture, our placer reduces compilation time on average, with acceptable area and delay overhead, compared to tile-based results from the Versatile Place and Route (VPR) tool. Secondly, we propose a dynamic module partitioning approach for the library-based design flow that generates appropriately shaped modules from the single-ratio modules in the library while efficiently utilizing the pre-placement module information. A set of rules is developed to select the most suitable module and determine the partition, minimizing the area and delay of the placement without substantially increasing synthesis time. The proposed approach can adapt to different architectures and also addresses the fixed-outline constraint. Experimental results show that our approach can reduce the area by up to 10% with a marginal increase in delay and acceptable runtime. Finally, we explore automatic workflow mapping in an interposer-based multi-FPGA system. We propose a two-stage modular placement flow for interposer-based multiple FPGAs that targets delay optimization and incorporates a detailed interposer routing model for wirelength and delay estimation. We adopt the force-directed method for its global property to obtain an efficient solution as the starting point of the placement. Next, we employ simulated annealing (SA) for its efficiency and effectiveness in refining the initial solution. To speed up the refinement process, the hierarchical B*-tree (HB*-tree) is employed to enable fast search and convergence. The experimental results demonstrate that our flow can achieve an efficient solution in reasonable time and scales well across different design sizes.
URI: http://hdl.handle.net/10356/73042
DOI: 10.32657/10356/73042
Schools: School of Computer Science and Engineering 
Research Centres: Centre for High Performance Embedded Systems 
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Theses

Files in This Item:
File: fbmao_thesis.pdf
Size: 5.68 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.