Low-Cost Fault Tolerance for Continuous Real-Time Control Systems



Traditionally associated with industrial control, continuous real-time control systems are now found in every area of our technology-driven society. One of the most popular ways to ensure that these systems function correctly is redundancy: replicating parts of the controller hardware and comparing their results. However, the high cost of this approach is normally acceptable only in mission-critical or human-critical systems. Most control systems are limited to ad hoc fault-tolerance solutions based on empirical observations.
An important characteristic shared by most continuous real-time control systems is the ability to tolerate external disturbances affecting the controlled process, such as air temperature, wind or dust. However, when the controller itself is subject to faults and produces wrong or late control actions, it has been observed that sometimes no redundant system is needed to recover.
The purpose of replication is to guarantee the correct and timely behaviour of the controller. However, continuous real-time control systems can intrinsically tolerate transient controller malfunctions, in the same way that they tolerate external disturbances affecting the controlled process. It is thus acceptable for fault tolerance to aim at avoiding the collapse of the whole system (i.e., when the system stops delivering the expected service or its quality becomes unacceptable), rather than at avoiding controller failure (i.e., when the controller produces wrong or late results).
By allowing a controller to fail, albeit for a limited amount of time, fault tolerance can be approached from a new standpoint. It then becomes possible to trade spatial redundancy for time redundancy, significantly reducing the cost of fault tolerance and thus permitting its use in low-cost control systems.
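The idea that a control loop can absorb a short controller outage can be illustrated with a small simulation. The sketch below is not taken from the thesis: the first-order plant, the PI gains, the fault window and the function name `simulate` are all hypothetical values chosen for illustration. It shows a controller that delivers no useful output for a few samples (for example, while it resets) and a controlled process that drifts by a bounded amount and then returns to its setpoint.

```python
def simulate(steps=200, fault_start=80, fault_len=5, setpoint=1.0):
    """Simulate a first-order plant under PI control with a transient controller fault.

    All numeric values are illustrative assumptions, not taken from the thesis.
    Returns the plant output after each control step.
    """
    a, b = 0.95, 0.05      # hypothetical plant: x[k+1] = a*x[k] + b*u[k]
    kp, ki = 2.0, 0.2      # hypothetical PI gains
    x, integral = 0.0, 0.0
    trace = []
    for k in range(steps):
        error = setpoint - x
        faulty = fault_start <= k < fault_start + fault_len
        if faulty:
            # Controller failure: no useful control action is delivered.
            u = 0.0
        else:
            # Normal operation: update the integral term and compute the PI output.
            integral += error
            u = kp * error + ki * integral
        x = a * x + b * u
        trace.append(x)
    return trace


if __name__ == "__main__":
    trace = simulate()
    print("lowest output after the fault begins:", min(trace[80:]))
    print("final output:", trace[-1])
```

With these assumed parameters the output dips while the controller is silent and then converges back to the setpoint, which is the behaviour the abstract describes as tolerating a controller failure without system collapse. A longer outage or a less stable plant could of course drive the process outside acceptable bounds.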
The studies presented in this thesis, along with some practical solutions and experimental results, prove that it is feasible to provide real-time control systems with high dependability by means of generic software solutions.


fault tolerance, real-time systems, continuous control systems, fault injection, stable storage, fail-bounded model, disaster prevision, reset, dependability, assertions, failure tolerance, experimental evaluation


Dependability Analysis

PhD Thesis

Low-Cost Fault Tolerance for Continuous Real-Time Control Systems, July 2003

Cited by

Year 2005 : 1 citation

 1. Girish Baliga, "A Middleware Framework for Networked Control Systems," PhD Thesis, Department of Computer Science, University of Illinois at Urbana-Champaign, USA, 2005

Year 2004 : 1 citation

 1. Scott R. Graham, "Fault Tolerance in Networked Control Systems Through Real-Time Restarts," Report, University of Illinois at Urbana, Air Force Inst of Tech Wright-Patterson AFB OH, 2004