Title: Soft-error tolerant design for satellite-board computations
Authors: Zhang, Lei
Keywords: DRNTU::Engineering
Issue Date: 2016
Abstract: High-energy particles in outer space can flip the state of latches in electronic devices. Such an upset is called a soft error because it does not cause permanent damage to the device. However, a soft error may cause faults in a processor and lead to malfunction of the computer system on board a satellite. In this thesis, we implement a novel soft-error-tolerant scheme based on the LEON2/3 processor to protect the processor from soft errors. We verify the correctness of the scheme, evaluate its overhead, and determine the critical resources that should be protected. The scheme has two parts: a sensor network and a rollback scheme. A sensor monitors a target register and asserts if the monitored register is flipped by a soft error. The rollback scheme is adapted from a synchronization feature of the LEON2 and LEON3 processors, which was originally intended to synchronize the Floating Point Unit (FPU), the cache and the Integer Unit (IU). We use it to stall the IU when a soft error is detected and to recover from the error by re-executing the current operation. To verify the correctness of the sensor and rollback scheme, we inject errors during the execution of a large number of LEON2/3 instructions. The results show that all instruction rollbacks are correct. To evaluate the overhead, we measure the time and resource penalties of the scheme. The test results show that the scheme incurs a penalty of only one extra clock cycle in about 90% of test cases, and that adding one 32-bit sensor increases resource usage by 0.282% relative to the original processor. To identify the critical resources for protection, we define the weight of each instruction based on how frequently it is used, and we monitor the number of accesses to the internal registers and status bits in LEON3. We then compute the impact factor (IF) of each internal register and status bit from the register access frequencies and the instruction weights, and use the IF to identify the most critical resources. The results show that there are 233 registers and status bits in total. Of these, 91 have an IF of 100%, which suggests that they are the critical resources; 43 have an IF between 1.9% and 97.6%, which means they are less important; and the remaining 89 have an IF of 0 because they are not used in our target application. Our work shows that the sensor network and rollback scheme can protect the processor from soft errors while incurring minimal penalties. By analyzing the vulnerability of registers and status bits, limited protection resources can be deployed selectively on the most critical registers and bits. Our scheme could therefore effectively improve the robustness of computer systems for satellite applications.
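The abstract does not state the formula for the impact factor. As a rough illustration only, the Python sketch below assumes that an instruction's weight is its share of an executed trace, and that a register's IF is the sum of the weights of the instructions that access it. Under that assumption, a register touched by every instruction scores 100% and an unused one scores 0. The trace, the register-to-instruction mapping and the function names are all hypothetical and are not taken from the thesis.

```python
from collections import Counter


def instruction_weights(trace):
    """Assumed definition: weight = share of the executed trace taken by each opcode."""
    counts = Counter(trace)
    total = sum(counts.values())
    return {op: n / total for op, n in counts.items()}


def impact_factors(trace, accesses):
    """Assumed definition: IF of a register = summed weight of the opcodes that access it.

    `accesses` maps a register name to the set of opcodes that touch it
    (a made-up mapping for illustration, not measured LEON3 data).
    """
    weights = instruction_weights(trace)
    return {
        reg: sum((weights.get(op, 0.0) for op in ops), 0.0)
        for reg, ops in accesses.items()
    }


# Hypothetical eight-instruction trace and access map.
trace = ["add", "ld", "add", "st", "add", "ld", "bne", "add"]
accesses = {
    "pc": {"add", "ld", "st", "bne"},  # touched by every instruction
    "icc": {"add", "bne"},             # touched by some instructions
    "fsr": set(),                      # never touched in this trace
}

factors = impact_factors(trace, accesses)
for reg, value in sorted(factors.items(), key=lambda kv: -kv[1]):
    print(f"{reg}: {value:.1%}")
```

Ranking registers by such a score is one way to decide where a limited number of sensors would be placed first; the thesis itself should be consulted for the actual definition.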
DOI: 10.32657/10356/69206
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Theses

Files in This Item:
File: Thesis_Final_Submission.pdf
Description: main article
Size: 6.8 MB
Format: Adobe PDF


Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.