Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/159209
|Title:||LAMP: load-balanced multipath parallel transmission in point-to-point NoCs|
|Authors:||Chen, Hui|
|Keywords:||Engineering::Computer science and engineering::Computer systems organization|
|Issue Date:||2022|
|Source:||Chen, H., Chen, P., Luo, X., Huai, S. & Liu, W. (2022). LAMP: load-balanced multipath parallel transmission in point-to-point NoCs. IEEE Transactions On Computer-Aided Design of Integrated Circuits and Systems. https://dx.doi.org/10.1109/TCAD.2022.3151021|
|Project:||MoE2019-T2-1-071|
|Journal:||IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems|
|Abstract:||Network-on-Chip (NoC) is an emerging paradigm that is able to connect a significant amount of processing elements (PEs). However, as a distributed sub-system, NoC resources have not been exploited to the fullest. Multipath parallel transmission, which splits one message into multiple parts and sends them simultaneously, shows its efficiency in utilizing NoC resources and further reducing the transmission latency. However, this method is not fully optimized in previous works, especially for emerging point-to-point NoCs due to the following reasons: (1) only limited shortest paths are chosen; (2) static message splitting strategy without considering NoC utilization state increases contentions; (3) the optimization of hardware that supports multipath parallel transmission is missing, resulting in additional overheads. Thus, we propose LAMP, a software and hardware collaborated design to efficiently utilize resources and reduce latency in point-to-point NoCs through the load-balanced multipath parallel transmission. Specifically, we propose a reinforcement learning-based algorithm to decide when and how to split messages, and which path should be used according to traffic loads. Also, the temporal and spatial load-balancing algorithms are proposed so that the message size is adjusted properly to utilize NoC resources. Moreover, we revise the hardware design to support multipath parallel transmission efficiently. Extensive experiments show that our algorithm achieves a remarkable performance improvement (+18.0% ∼ +29.9%) when compared with the state-of-the-art dual-path algorithm. Our hardware design decreases power and area consumption by 23.2% and 10.3% over the dual-path hardware.|
|URI:||https://hdl.handle.net/10356/159209|
|ISSN:||0278-0070|
|DOI:||10.1109/TCAD.2022.3151021|
|Rights:||© 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: https://doi.org/10.1109/TCAD.2022.3151021.|
|Fulltext Permission:||open|
|Fulltext Availability:||With Fulltext|
|Appears in Collections:||SCSE Journal Articles|
Updated on Aug 17, 2022
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.