Performance Analysis and Optimization of the Tiled Cholesky Factorization on NUMA Machines





1 RUNTIME - Efficient runtime systems for parallel architectures (Inria Bordeaux - Sud-Ouest, UB - Université de Bordeaux, CNRS - Centre National de la Recherche Scientifique : UMR5800)
2 LaBRI - Laboratoire Bordelais de Recherche en Informatique

Abstract: We discuss performance issues of the tiled Cholesky factorization on non-uniform memory access (NUMA) shared-memory machines. We show how to optimize thread placement and data placement to achieve performance gains of up to 50% compared to state-of-the-art libraries such as PLASMA or MKL.





Author: Emmanuel Jeannot

Source: https://hal.archives-ouvertes.fr/






