A Fault Tolerant Compiler for Zero Silent Data Corruption
Soft errors, such as transient faults caused by alpha particles or cosmic rays, can alter signal transfers or stored data in microprocessors. As a result, they often cause incorrect program execution or “glitches”. The exponentially growing rate of soft errors makes reliability a top priority in modern processor design. While many solutions to soft errors focus on hardware improvements, software approaches have the advantage that they can also be applied to existing processors. The most popular and effective software approaches rely on in-application instruction duplication. However, existing techniques leave several important microarchitectural components, as well as a significant portion of instructions, unprotected, which results in silent data corruptions (SDCs). Current technologies, such as SWIFT, aim to eliminate SDCs completely through partial instruction duplication. However, research has shown that SWIFT does not eliminate SDCs completely, often because its protection and redundancy are insufficient. Therefore, there is a need for a software approach that eliminates SDCs more effectively than current technology.
Researchers at Arizona State University have developed a compiler approach, called Zero Silent Data Corruption (ZDC), to protect applications from soft errors. ZDC creates a duplicated instruction stream that operates on an extra copy of the registers and, at certain synchronization points, checks the data values in the redundant registers against the values in the original registers. ZDC is extremely effective because it comprehensively protects store, load, and compare-and-branch instructions through reloading and duplication methods. ZDC has made significant advances in terms of performance and failure rates, and has proven superior to SWIFT and other technologies during testing.
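The duplicate-and-check idea described above can be illustrated with a minimal Python sketch. It is not the ZDC compiler transformation itself, which operates on machine instructions and registers; the names `check`, `SoftErrorDetected`, and `protected_scale_and_store`, as well as the toy list-based memory, are hypothetical and chosen only for illustration.

```python
class SoftErrorDetected(Exception):
    """Raised when the original and redundant copies disagree."""


def check(original, shadow, what):
    # Synchronization point: compare a value with its redundant copy.
    if original != shadow:
        raise SoftErrorDetected(f"mismatch in {what}: {original!r} != {shadow!r}")


def protected_scale_and_store(memory, src, base, offset, factor, threshold):
    """Store min(memory[src] * factor, threshold) at memory[base + offset]."""
    # Load: performed once per instruction stream.
    value = memory[src]
    value_s = memory[src]

    # Arithmetic is duplicated on the shadow ("_s") variables.
    result = value * factor
    result_s = value_s * factor

    # Compare and branch: the condition is computed in both streams and
    # checked before the branch outcome is trusted.
    over = result > threshold
    over_s = result_s > threshold
    check(over, over_s, "branch condition")
    if over:
        result = threshold
        result_s = threshold

    # Store: the address is computed twice, address and value are checked,
    # and the stored value is reloaded and compared with the shadow copy.
    addr = base + offset
    addr_s = base + offset
    check(addr, addr_s, "store address")
    check(result, result_s, "store value")
    memory[addr] = result
    check(memory[addr], result_s, "reloaded store value")
    return memory[addr]
```

In this sketch a fault-free call such as `protected_scale_and_store([3, 0, 0], 0, 1, 1, 4, 100)` simply returns the stored value, while any disagreement between a value and its shadow copy raises `SoftErrorDetected` instead of silently propagating.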
- Computer science
- Error reduction
Benefits and Advantages
- Improved Performance
  - Only 0.4% of faults lead to SDC with the ZDC version of the program, a significant reduction from the 18% of faults that result in SDC when using SWIFT.
  - ZDC achieves this high level of protection while incurring 10% less performance overhead than SWIFT.
- Reduced Failures - ZDC completely eliminates failures due to faults inserted in the load-store queue (LSQ), functional units, and register file, and brings the failure percentage down to 0.3% for faults inserted in pipeline registers.
- Proven Results - Extensive fault injection experiments on almost all of the unprotected microarchitectural components of a simulated ARM Cortex-A53 demonstrate that ZDC is extremely effective, without incurring any more performance penalty than the state of the art.
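The kind of fault injection experiment mentioned above can be pictured with a toy Python sketch: a single bit is flipped in an intermediate value, and each run is classified as an SDC, a detected error, or a masked fault. This is only an illustration under simplifying assumptions; the helper names (`flip_bit`, `unprotected_add`, `protected_add`, `campaign`) are hypothetical, and the counts it produces say nothing about the measured results quoted in the list above, which come from simulated hardware.

```python
import random


def flip_bit(value, bit):
    # Model a transient fault as a single-bit flip.
    return value ^ (1 << bit)


def unprotected_add(a, b, fault_bit=None):
    total = a + b
    if fault_bit is not None:
        total = flip_bit(total, fault_bit)  # fault strikes the only copy
    return total


def protected_add(a, b, fault_bit=None):
    total = a + b
    total_s = a + b  # redundant copy
    if fault_bit is not None:
        total = flip_bit(total, fault_bit)  # fault strikes one copy only
    if total != total_s:
        raise RuntimeError("soft error detected")
    return total


def campaign(func, trials, seed=0):
    """Inject one fault per trial and classify the outcome."""
    rng = random.Random(seed)
    counts = {"sdc": 0, "detected": 0, "masked": 0}
    for _ in range(trials):
        a, b = rng.randrange(1000), rng.randrange(1000)
        golden = a + b  # fault-free reference result
        try:
            out = func(a, b, fault_bit=rng.randrange(16))
        except RuntimeError:
            counts["detected"] += 1
            continue
        counts["sdc" if out != golden else "masked"] += 1
    return counts
```

Because a bit flip always changes the value in this toy model, every fault in `unprotected_add` becomes an SDC and every fault in `protected_add` is detected; the `masked` category stays empty here, whereas in real hardware many faults are masked and some strike components that duplication alone does not cover.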
For more information about the inventor(s) and their research, please see