Reset-Driven Fault Tolerance



A common approach to fault tolerance in embedded systems is to reboot the computer whenever a non-permanent error is detected. All system code and data are recreated from scratch, and a previously established checkpoint, hopefully not corrupted, is used to restore the application data. This restores confidence in the computer's activity.
The idea explored in this paper is to reset the computer unconditionally in every control frame (the classic read sensors -> compute control action -> update actuators cycle). A RAM-based stable storage preserves the system's state between consecutive resets, and a standard watchdog timer guarantees that a reset is forced whenever an error crashes the system.
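The per-frame cycle can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the state layout (`ControlState`), the dual-copy CRC-32 stable storage (`StableStorage`), the PI gains and setpoint, and the exception handler standing in for the watchdog are all hypothetical choices made for illustration. Each call to `control_frame` rebuilds its working data from stable storage, so every frame behaves as if the computer had just been reset.

```python
import struct
import zlib
from dataclasses import dataclass

# Hypothetical layout of the state that must survive each reset:
# frame counter (unsigned 32-bit) and PI integral term (double).
STATE_FMT = "<Id"
SETPOINT = 25.0              # hypothetical target temperature
KP, KI, TS = 2.0, 0.5, 0.1   # hypothetical PI gains and frame period (s)


@dataclass
class ControlState:
    frame: int = 0
    integral: float = 0.0


class StableStorage:
    """Sketch of RAM-based stable storage: two copies, each with a CRC-32.

    On a real target this would be a RAM region the reset does not clear;
    here a plain object plays that role because it outlives each frame.
    """

    def __init__(self):
        self.copies = [b"", b""]  # each record: payload + 4-byte CRC

    @staticmethod
    def _encode(state):
        payload = struct.pack(STATE_FMT, state.frame, state.integral)
        return payload + struct.pack("<I", zlib.crc32(payload))

    @staticmethod
    def _decode(record):
        size = struct.calcsize(STATE_FMT)
        if len(record) != size + 4:
            return None
        payload = record[:size]
        (crc,) = struct.unpack("<I", record[size:])
        if zlib.crc32(payload) != crc:
            return None  # this copy is corrupted
        return ControlState(*struct.unpack(STATE_FMT, payload))

    def commit(self, state):
        # Write the copies one after the other, so an error while writing
        # one copy still leaves the other copy intact.
        record = self._encode(state)
        self.copies[0] = record
        self.copies[1] = record

    def restore(self):
        for record in self.copies:
            state = self._decode(record)
            if state is not None:
                return state
        return ControlState()  # no valid copy: fall back to a cold start


def control_frame(storage, read_sensor, write_actuator):
    """One frame: restore -> read sensors -> compute -> actuate -> commit."""
    state = storage.restore()  # working data rebuilt from scratch
    error = SETPOINT - read_sensor()
    state.integral += error * TS
    write_actuator(KP * error + KI * state.integral)
    state.frame += 1
    storage.commit(state)
    # On the target, the unconditional reset would be forced at this point.


def run(storage, frames, read_sensor, write_actuator):
    for _ in range(frames):
        try:
            control_frame(storage, read_sensor, write_actuator)
        except Exception:
            # Stand-in for the watchdog: a crashed frame is abandoned and the
            # next frame restarts from the last committed state.
            pass


if __name__ == "__main__":
    storage = StableStorage()
    run(storage, 3, lambda: 20.0, print)
    print(storage.restore().frame)  # prints 3
```

Keeping two checksummed copies means that a fault corrupting one copy, or interrupting a write, still leaves a valid checkpoint; only if both copies are bad does the sketch fall back to a cold start. A frame that crashes before `commit` simply loses that frame's update, and the next frame resumes from the last committed state.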
We evaluated this approach by injecting faults into the controller of a standard temperature control system. The experimental observations show that Reset-Driven Fault Tolerance is an effective technique for improving reliability at extremely low cost, since it is a conceptually simple, software-only solution that is also application independent.


reset-driven fault tolerance, fault removal, dependability, embedded real-time systems


Fault Tolerance in Control Systems


4th European Dependable Computing Conference (EDCC-4), October 2002

