Architecture for the Next Generation System Management Tools for Distributed Computing Platforms

1 MYRIADS - Design and Implementation of Autonomous Distributed Systems, IRISA-D1 - SYSTÈMES LARGE ÉCHELLE, Inria Rennes – Bretagne Atlantique
2 ORNL - Oak Ridge National Laboratory, Oak Ridge
3 ASCOLA - Aspect and composition languages, LINA - Laboratoire d'Informatique de Nantes Atlantique, Département informatique - EMN, Inria Rennes – Bretagne Atlantique
4 LINA - Laboratoire d'Informatique de Nantes Atlantique

Abstract: To obtain more results or greater accuracy, computational scientists mainly execute parallel or distributed applications and try to scale these applications up. Accordingly, they use more and more distributed resources, relying on local large-scale HPC systems, grids, or even clouds. However, in most cases, the use and management of such platforms is static: generally, the application has to be adapted to the environment, rather than the environment being adapted to the application's needs. In addition, platforms are managed through the concept of time and space partitioning, mainly via batch schedulers: time partitioning enables the execution of several applications on the same resources, and space partitioning enables the execution of applications across several distributed resources. This leads to usage limitations, where applications can only be executed on a subset of the available resources. As a result, scientists have to manage technical details related to the execution of their applications on each target HPC platform, which may require modifying the applications, rather than focusing on the science. In this article, we advocate for a system management tool enabling the transparent configuration of the HPC platform and the customization of the execution environment for large-scale HPC systems such as clusters or MPPs, grids, and clouds. We propose a new approach to manage these systems in a more dynamic way, where the resources can be configured and reconfigured automatically and transparently. The proposed solution does not remove the benefits of resource management systems such as batch systems (they still provide a well-known interface for job submission), but rather redefines the underlying system capabilities. Our approach is based on a refinement of the concepts of emulation and virtualization introduced by Goldberg.
Furthermore, the proposed approach leads to the definition of a method that provides a unique interface to scientists for the deployment and management of their applications on HPC platforms. This method is based on two concepts: (i) the Virtual System Environment (VSE), and (ii) the Virtual Platforms (VPs).

Authors: Jérôme Gallard, Geoffroy Vallée, Thomas Naughton, Adrien Lèbre, Stephen Scott, Christine Morin


