On the Optimal Design of Triple Modular Redundancy Logic for SRAM-based FPGAs

Triple Modular Redundancy (TMR) is a suitable fault-tolerance technique for SRAM-based FPGAs. However, one of the main challenges in achieving 100% robustness in TMR-protected designs running on programmable platforms is to prevent upsets in the routing from creating undesirable connections between signals from distinct redundant logic parts, which can generate an error in the output. This paper investigates the optimal design of the TMR logic (e.g., by cleverly inserting voters) to ensure robustness. Four different versions of a TMR digital filter were analyzed by fault injection. Faults were randomly inserted directly into the bitstream of the FPGA. The experimental results presented in this paper demonstrate that the number and placement of voters in the TMR design directly affect fault tolerance: the percentage of routing upsets able to cause an error in the TMR circuit ranges from 4.03% down to 0.98%.

💡 Research Summary

The paper addresses a critical vulnerability of Triple Modular Redundancy (TMR) when implemented on SRAM‑based FPGAs: routing upsets that create unintended connections between signals belonging to different redundant modules. Because roughly 83 % of the configurable bits in a modern FPGA control routing, a single‑event upset (SEU) in the routing fabric can either short two signals from the same module (harmless, as the error is masked by the downstream voter) or, more dangerously, connect signals from two distinct redundant modules. In the latter case the majority voter may receive two corrupted inputs and consequently produce an incorrect output, defeating the purpose of TMR.
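The masking behavior described above can be illustrated with a minimal sketch of a bitwise 2-of-3 majority voter. The 9-bit test word and the way the cross-domain upset is modeled (two redundant copies corrupted in the same bit) are illustrative assumptions, not details taken from the paper.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote over three redundant words."""
    return (a & b) | (a & c) | (b & c)


correct = 0b101010101   # arbitrary 9-bit test word (assumption)
flip = 0b000000100      # bit corrupted by the upset (assumption)

# Fault confined to one redundant domain: the voter masks it.
single_domain = majority(correct ^ flip, correct, correct)

# Upset bridging two domains, modeled here as the same bit being
# corrupted in two of the three copies: the voter is outvoted.
bridged = majority(correct ^ flip, correct ^ flip, correct)

print(single_domain == correct)  # True
print(bridged == correct)        # False
```

The second case is the one the summary calls dangerous: two of the three voter inputs agree on a wrong value, so the majority function faithfully propagates the error.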

The authors investigate how the number and placement of majority voters influence the probability that a routing upset propagates to the final output. They propose four variants of a digital FIR filter (an 11‑tap, 9‑bit low‑pass filter) implemented on a Xilinx Spartan‑2 XC2S200E‑PQ208 device:

TMR_p1 – Maximum partition: Every elementary combinational block (adder, multiplier, etc.) is triplicated and a voter is inserted at the output of each block. This yields the highest voter count, the greatest routing protection, but also the largest area (560 slices) and the lowest maximum clock frequency (123 MHz).
TMR_p2 – Medium partition: Each partition contains one adder, one multiplier, and a voter only at the partition output. This reduces the number of voters while still breaking up long routing paths. Area is 504 slices and the design runs at 137 MHz.
TMR_p3 – Minimum partition: Voters are placed only at the outermost outputs of the whole filter. This minimizes area (498 slices) and maximizes performance (153 MHz) but leaves the design most exposed to routing upsets.
TMR_p3_nv – Minimum partition without voted registers: Same as TMR_p3, but the internal registers are not voted, further reducing area (476 slices) and flip‑flop configuration bits.
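The effect of voter placement in the variants above can be sketched with a toy model. The following is only an illustration under stated assumptions: the three "stages" are simple modular additions rather than the paper's filter blocks, a single voter stands in for the triplicated voters a real TMR design would use, and faults are modeled as XOR masks on stage outputs. The names `run_tmr`, `partition_size`, and `faults` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

MASK = 0x1FF  # 9-bit data path, matching the filter's word width


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)


def run_tmr(stages: List[Callable[[int], int]], x: int, partition_size: int,
            faults: Dict[Tuple[int, int], int]) -> int:
    """Run three redundant copies, voting after every `partition_size` stages.

    `faults` maps (domain, stage_index) to an XOR mask applied to that
    stage's output in that domain (a toy fault model, not the paper's).
    """
    domains = [x, x, x]
    for i, stage in enumerate(stages):
        domains = [stage(v) ^ faults.get((d, i), 0)
                   for d, v in enumerate(domains)]
        is_last = i == len(stages) - 1
        if (i + 1) % partition_size == 0 or is_last:
            voted = majority(*domains)
            domains = [voted, voted, voted]  # resynchronize all copies
    return domains[0]


# Toy three-stage data path (assumption, not the FIR filter itself).
stages = [lambda v: (v + 5) & MASK,
          lambda v: (v + 7) & MASK,
          lambda v: (v + 11) & MASK]

golden = run_tmr(stages, 10, 1, {})

# Two independent faults in different domains and different stages.
faults = {(0, 0): 1, (1, 2): 1}

fine_grained = run_tmr(stages, 10, 1, faults)       # voter after every stage
output_only = run_tmr(stages, 10, len(stages), faults)  # voter at output only

print(fine_grained == golden)  # True
print(output_only == golden)   # False
```

With intermediate voters the first fault is corrected before the second one occurs, so no voter ever sees two wrong inputs. This toy model captures only accumulation of faults along a path; it does not model the cross-domain routing shorts that additional voter wiring can itself introduce, which is the trade-off the paper's experiments measure.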

To evaluate robustness, the authors use a custom fault‑injection framework that modifies the FPGA bitstream at the level of individual configuration bits. The framework first extracts the exact set of bits that implement the design, then flips each bit one at a time, emulating an SEU. A golden (non‑TMR) copy of the filter runs in parallel on the same device, allowing cycle‑accurate comparison of outputs and automatic classification of fault effects.
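The injection loop described above can be sketched as follows. This is a heavily simplified stand-in, not the authors' framework: the "device" is a hypothetical 3-input lookup table whose truth table plays the role of the configuration bits, and the golden reference is computed from the unmodified configuration rather than from a second circuit on the same FPGA. The names `evaluate` and `inject_all` are assumptions made for illustration.

```python
from typing import List


def evaluate(config: List[int], stimulus: int) -> int:
    """Hypothetical device model: a LUT whose contents are the config bits."""
    return config[stimulus]


def inject_all(config: List[int], stimuli: List[int]) -> List[int]:
    """Flip each configuration bit in turn and return the bits whose
    upset changes the output for at least one stimulus."""
    golden = [evaluate(config, s) for s in stimuli]
    error_bits = []
    for bit in range(len(config)):
        faulty = list(config)
        faulty[bit] ^= 1  # emulate a single-event upset in this bit
        outputs = [evaluate(faulty, s) for s in stimuli]
        if outputs != golden:
            error_bits.append(bit)
    return error_bits


config = [0, 1, 1, 0, 1, 0, 0, 1]  # 8 configuration bits (toy example)
stimuli = [0, 1, 2, 3]             # test vectors exercise half the table

error_bits = inject_all(config, stimuli)
error_rate = 100.0 * len(error_bits) / len(config)

print(error_bits)  # [0, 1, 2, 3]
print(error_rate)  # 50.0
```

The ratio computed at the end has the same form as the paper's key metric: upsets that produce an observable output error divided by upsets injected. Bits never exercised by the stimuli show up as harmless, which is why the choice of test vectors matters in a real campaign.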

The experimental campaign injects tens of thousands of random routing upsets across the four designs. The key metric is the percentage of injected routing faults that cause an observable error at the filter output. Results show that the number and placement of voters directly affect this rate: across the four designs it ranges from 4.03 % down to 0.98 %, roughly a four‑fold difference. The medium‑partition design (TMR_p2) is reported to combine a low error rate with a more balanced trade‑off between area and performance.

An interesting side observation is that increasing the number of voters slightly lowers the proportion of routing bits in the total configuration bitstream (from 81 % in the unprotected design to 77 % in TMR_p1). This occurs because each voter consumes LUT resources, thereby shifting some of the configuration budget away from pure routing bits.

Performance measurements confirm the expected trade‑off: finer partitioning introduces additional voter logic and longer interconnects, reducing the maximum clock frequency, while coarser partitioning yields higher speed but weaker fault tolerance. Area consumption follows a similar trend, with the maximum‑partition design requiring roughly three times the slices of the unprotected filter.
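As a worked check of the trade‑off, the slice counts and clock frequencies quoted for the three voted designs can be compared against the minimum‑partition design. The figures are the ones given in this summary; choosing TMR_p3 as the baseline is an assumption made for illustration.

```python
# (slices, max clock in MHz) as quoted in the summary
designs = {
    "TMR_p1": (560, 123),
    "TMR_p2": (504, 137),
    "TMR_p3": (498, 153),
}

base_slices, base_mhz = designs["TMR_p3"]  # baseline choice is an assumption

relative = {}
for name, (slices, mhz) in designs.items():
    area_pct = (slices / base_slices - 1) * 100   # extra area vs. baseline
    freq_pct = (mhz / base_mhz - 1) * 100         # clock change vs. baseline
    relative[name] = (round(area_pct, 1), round(freq_pct, 1))

for name, (area_pct, freq_pct) in relative.items():
    print(f"{name}: area {area_pct:+.1f} %, clock {freq_pct:+.1f} %")
```

On these numbers, the maximum‑partition design costs about 12 % more slices and runs about 20 % slower than the minimum‑partition design, while the medium‑partition design costs about 1 % more slices and runs about 10 % slower.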

The authors conclude that optimal TMR design for SRAM‑based FPGAs is not a matter of simply placing voters at the final outputs. Instead, a judicious partitioning strategy that inserts voters at intermediate points can dramatically improve resilience to routing upsets while keeping area and timing overhead within acceptable limits. They suggest that a medium‑partition approach (similar to TMR_p2) may offer the best overall balance for many applications. Future work is proposed on automated partitioning tools, integration with periodic configuration scrubbing, and validation on newer, higher‑density FPGA families.
