CISUC researcher Frederico Cerveira wins Distinguished Paper Award at prestigious conference

Publication Date: 2015-10-20 17:43:28

EDCC 2015 - European Dependable Computing Conference

CISUC researcher Frederico Cerveira has received the Distinguished Paper Award at the prestigious EDCC 2015 - European Dependable Computing Conference, which took place in Paris, France, from 7 to 11 September 2015. Frederico received the award for the paper "Recovery for Virtualized Environments".


Cloud infrastructures provide elastic computing resources to client organizations, enabling them to build online applications while avoiding the fixed costs associated with a complete IT infrastructure.
However, such organizations are unlikely to fully trust the cloud for their most critical applications. Among other threats, soft errors are expected to increase with the shrinking geometries of transistors, and many errors are left for the software layers to correct and mask. This paper characterizes the behavior of a virtualized environment, using Xen with CentOS as the hypervisor, in the presence of soft errors. One of the main threats arises from soft errors directly affecting the hypervisor, as these faults have the potential to disrupt several virtual machines at once. With this in mind, we develop a fault-tolerant architecture for cloud applications, which relies on experimental data collected using fault injection to guide its design.
This architecture recovers from bit-flip errors with the help of a watchdog timer, to securely reboot the hypervisor. Nevertheless, errors might still propagate outside the system, for example to a client in a client-server interaction. Despite this, our results suggest that our architecture and a few simple techniques, like timers on the client, can recover a very large fraction of errors in client-server applications with small hardware and performance overhead. Conversely, the fraction of errors requiring Byzantine fault-tolerant techniques is quite small, thus restricting those expensive approaches to highly critical applications.
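
The "timers on the client" idea mentioned above can be illustrated with a minimal Python sketch. It is not taken from the paper: the names call_with_retry and RebootingServer, the timeout value, and the retry count are all hypothetical, and the server is simulated in-process. The sketch only shows the general technique of a client that gives up on an unanswered request after a timeout and retries, which lets it ride out a period in which the server side is being rebooted.

```python
import time


class RebootingServer:
    """Hypothetical stand-in for a server whose hypervisor is rebooting.

    The first `failures_before_recovery` requests get no reply, which is
    modelled by raising TimeoutError; later requests succeed.
    """

    def __init__(self, failures_before_recovery):
        self.remaining_failures = failures_before_recovery
        self.calls = 0

    def handle(self, timeout_s):
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise TimeoutError(f"no reply within {timeout_s:.1f} s")
        return "ok"


def call_with_retry(send_request, timeout_s=1.0, max_attempts=3, backoff_s=0.0):
    """Send a request, retrying when the client-side timer expires.

    `send_request` is any callable that takes a timeout in seconds and
    raises TimeoutError if no reply arrives in time (an assumption of
    this sketch, not an interface defined by the paper).
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return send_request(timeout_s)
        except TimeoutError as exc:
            last_error = exc
            time.sleep(backoff_s)  # optional pause before the next attempt
    raise TimeoutError(f"no reply after {max_attempts} attempts") from last_error


if __name__ == "__main__":
    server = RebootingServer(failures_before_recovery=2)
    print(call_with_retry(server.handle, timeout_s=1.0, max_attempts=3))
```

A retry of this kind only helps when the request is safe to repeat; errors that produce wrong but timely replies are outside what a client timer can detect.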