Algorithm-based fault tolerance applied to P2P computing networks


* Corresponding author
1 MOAIS - PrograMming and scheduling design fOr Applications in Interactive Simulation, Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
2 PLANETE - Protocols and applications for the Internet, Inria Grenoble - Rhône-Alpes, CRISAM - Inria Sophia Antipolis - Méditerranée

Abstract: P2P computing platforms are subject to a wide range of attacks. In this paper, we propose a generalisation of the previous disk-less checkpointing approach for fault tolerance in High Performance Computing systems. Our contribution is in two directions. First, instead of restricting ourselves to 2D checksums, which tolerate only a small number of node failures, we propose to base disk-less checkpointing on linear codes in order to tolerate a potentially large number of faults. Second, we compare and analyse the use of Low-Density Parity-Check (LDPC) codes against classical Reed-Solomon (RS) codes with respect to different fault models, to fit P2P systems. Our LDPC disk-less checkpointing method is well suited when only node disconnections are considered, but cannot deal with Byzantine peers. Our RS disk-less checkpointing method tolerates such Byzantine errors, but is restricted to exact finite field computations.
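
To make the idea of checkpointing with a linear code concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Vandermonde (Reed-Solomon-style) code over the prime field GF(2^31 - 1), assumes that the checksum nodes themselves survive, and handles only erasures (node disconnections), not Byzantine errors. The names `P`, `encode`, `solve_mod`, and `recover` are hypothetical and chosen for illustration.

```python
P = 2_147_483_647  # prime modulus 2^31 - 1 (an assumed choice of field)


def encode(states, r):
    """Return r checksum vectors for k equal-length state vectors over GF(P).

    Checksum j is sum_i (i+1)^j * states[i], i.e. a Vandermonde combination.
    """
    k, n = len(states), len(states[0])
    return [
        [sum(pow(i + 1, j, P) * states[i][t] for i in range(k)) % P
         for t in range(n)]
        for j in range(r)
    ]


def solve_mod(a, b):
    """Solve a @ x = b over GF(P) by Gauss-Jordan elimination.

    a is a square matrix; b is a list of right-hand-side vectors (one per row).
    """
    m = len(a)
    a = [row[:] for row in a]
    b = [row[:] for row in b]
    for col in range(m):
        # Find a row with a nonzero pivot and swap it into place.
        piv = next(i for i in range(col, m) if a[i][col] % P != 0)
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        # Normalise the pivot row (inverse via Fermat's little theorem).
        inv = pow(a[col][col], P - 2, P)
        a[col] = [v * inv % P for v in a[col]]
        b[col] = [v * inv % P for v in b[col]]
        # Eliminate the pivot column from every other row.
        for i in range(m):
            if i != col and a[i][col]:
                f = a[i][col]
                a[i] = [(x - f * y) % P for x, y in zip(a[i], a[col])]
                b[i] = [(x - f * y) % P for x, y in zip(b[i], b[col])]
    return b


def recover(states, checksums, lost):
    """Rebuild the state vectors of the node indices in `lost`.

    `states` holds None at each lost index; at most len(checksums) nodes
    may be lost.
    """
    e = len(lost)
    if e > len(checksums):
        raise ValueError("more lost nodes than checksums")
    n = len(checksums[0])
    # Vandermonde submatrix restricted to the lost nodes.
    a = [[pow(i + 1, j, P) for i in lost] for j in range(e)]
    rhs = []
    for j in range(e):
        row = list(checksums[j])
        # Subtract the contribution of every surviving node.
        for i, s in enumerate(states):
            if i not in lost:
                c = pow(i + 1, j, P)
                row = [(row[t] - c * s[t]) % P for t in range(n)]
        rhs.append(row)
    return dict(zip(lost, solve_mod(a, rhs)))
```

With `r` checksum vectors, this sketch can rebuild up to `r` disconnected nodes, because any square Vandermonde submatrix on distinct evaluation points is invertible. All arithmetic is exact modulo `P`, which illustrates the restriction to exact finite field computations mentioned for the RS approach.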

Authors: Thomas Roche, Jean-Louis Roch, Mathieu Cunche


