A DHT-based Infrastructure for Sharing Checkpoints in Desktop Grid Computing



In this paper we present Chkpt2Chkpt, a desktop grid
system that aims to reduce turnaround times of applications
by replicating checkpoints. We target desktop computing
projects whose applications are composed of long-running
independent tasks executed on hundreds or thousands of
computers spread over the Internet. While these
applications typically perform local checkpointing to deal with
failures, we propose replicating those checkpoints at
remote locations so that other worker nodes can use them.

The main idea is to organize the worker nodes of a desktop
grid into a peer-to-peer Distributed Hash Table (DHT). Worker
nodes can use this P2P network to keep track of, share, and
manage checkpoint files, and to reclaim the space they occupy.
We used simulation to validate our system, and we show that
storing remote replicas of checkpoints can considerably
reduce the turnaround times of the tasks when compared to
the traditional approach, in which nodes manage their own
checkpoints locally. These results suggest that P2P
techniques can be quite helpful in wide-scale desktop grid
environments.
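To make the idea of keeping checkpoints in a DHT more concrete, here is a minimal single-process Python sketch. It is not the Chkpt2Chkpt implementation. It assumes a Chord-style identifier ring keyed by SHA-1 and places each task's checkpoint on the k successor nodes of the task's key, a common DHT replication scheme. The class `CheckpointRing`, its methods, the `"chkpt:"` key prefix, and the default replication factor k = 3 are all hypothetical. Real workers would transfer checkpoint files over the network rather than write into an in-memory dictionary.

```python
import hashlib
from bisect import bisect_left


def ring_id(name: str, bits: int = 32) -> int:
    """Hash a string onto a 2**bits identifier ring with SHA-1."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class CheckpointRing:
    """Hypothetical toy DHT in which worker nodes hold checkpoint replicas."""

    def __init__(self, workers):
        # (ring id, worker name) pairs sorted clockwise around the ring.
        self.nodes = sorted((ring_id(w), w) for w in workers)
        self.ids = [nid for nid, _ in self.nodes]
        # Per-worker local storage: task id -> checkpoint bytes.
        self.store = {w: {} for w in workers}

    def replica_holders(self, task_id, k=3):
        """Successor of the task's key plus the next k-1 workers clockwise."""
        key = ring_id("chkpt:" + task_id)  # assumed key naming scheme
        start = bisect_left(self.ids, key) % len(self.nodes)  # wrap past the top
        k = min(k, len(self.nodes))
        return [self.nodes[(start + i) % len(self.nodes)][1] for i in range(k)]

    def put_checkpoint(self, task_id, blob, k=3):
        # Overwrite older checkpoints of the same task on every replica holder.
        for w in self.replica_holders(task_id, k):
            self.store[w][task_id] = blob

    def get_checkpoint(self, task_id, alive, k=3):
        # Any live replica holder can serve the checkpoint to another worker.
        for w in self.replica_holders(task_id, k):
            if w in alive and task_id in self.store[w]:
                return self.store[w][task_id]
        return None  # no reachable replica; the caller would restart the task

    def reclaim(self, task_id, k=3):
        # Free the replicas' space once the task no longer needs them.
        for w in self.replica_holders(task_id, k):
            self.store[w].pop(task_id, None)


ring = CheckpointRing([f"worker{i}" for i in range(10)])
ring.put_checkpoint("task-42", b"state@t=3600")
holders = ring.replica_holders("task-42")
# The first holder has crashed; another worker fetches the checkpoint elsewhere.
print(ring.get_checkpoint("task-42", alive=set(holders[1:])))  # prints b'state@t=3600'
```

Placing replicas on consecutive successors means any worker can locate a task's checkpoints from the task identifier alone, without a central server. The sketch ignores node joins and departures, which in a real DHT shift responsibility for keys and require replicas to be migrated.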


checkpointing, peer-to-peer, DHT


desktop grid


2nd IEEE International Conference on e-Science and Grid Computing, December 2006

