Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/81197
Title: SPICE2: Spatial Processors Interconnected for Concurrent Execution for Accelerating the SPICE Circuit Simulator Using an FPGA
Authors: Kapre, Nachiket
DeHon, André
Keywords: Reconfigurable logic
Simulation
Parallelism
Issue Date: 2012
Source: Kapre, N., & DeHon, A. (2012). SPICE2: Spatial Processors Interconnected for Concurrent Execution for Accelerating the SPICE Circuit Simulator Using an FPGA. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 31(1), 9-22.
Series/Report no.: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Abstract: Spatial processing of sparse, irregular, double-precision floating-point computation using a single field-programmable gate array (FPGA) enables up to an order of magnitude speedup (mean 2.8× speedup) over a conventional microprocessor for the SPICE circuit simulator. We develop a parallel, FPGA-based, heterogeneous architecture customized for accelerating the SPICE simulator to deliver this speedup. To properly parallelize the complete simulator, we decompose SPICE into its three constituent phases-model evaluation, sparse matrix-solve, and iteration control-and customize a spatial architecture for each phase independently. Our heterogeneous FPGA organization mixes very large instruction word, dataflow and streaming architectures into a cohesive, unified design to match the parallel patterns exposed by our programming framework. This FPGA architecture is able to outperform conventional processors due to a combination of factors, including high utilization of statically-scheduled resources, low-overhead dataflow scheduling of fine-grained tasks, and streaming, overlapped processing of the control algorithms. We demonstrate that we can independently accelerate model evaluation by a mean factor of 6.5 × (1.4-23×) across a range of nonlinear device models and matrix solve by 2.4×(0.6-13×) across various benchmark matrices while delivering a mean combined speedup of 2.8×(0.2-11×) for the composite design when comparing a Xilinx Virtex-6 LX760 (40 nm) with an Intel Core i7 965 (45 nm). We also estimate mean energy savings of 8.9× (up to 40.9×) when comparing a Xilinx Virtex-6 LX760 with an Intel Core i7 965.
URI: https://hdl.handle.net/10356/81197
http://hdl.handle.net/10220/39201
ISSN: 0278-0070
DOI: 10.1109/TCAD.2011.2173199
Schools: School of Computer Engineering 
Rights: © 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: [http://dx.doi.org/10.1109/TCAD.2011.2173199].
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections:SCSE Journal Articles

SCOPUSTM   
Citations 20

20
Updated on May 17, 2024

Web of ScienceTM
Citations 20

17
Updated on Oct 28, 2023

Page view(s) 50

448
Updated on May 11, 2024

Download(s) 20

217
Updated on May 11, 2024

Google ScholarTM

Check

Altmetric


Plumx

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.