Fast and Portable Locking for Multicore Architectures

1 Laboratoire d'Informatique, Signaux, et Systèmes de Sophia-Antipolis I3S - Equipe MODALIS SPARKS - Scalable and Pervasive softwARe and Knowledge Systems
2 Whisper - Well Honed Infrastructure Software for Programming Environments and Runtimes LIP6 - Laboratoire d'Informatique de Paris 6, Inria de Paris
3 SAMOVAR - Services répartis, Architectures, MOdélisation, Validation, Administration des Réseaux

Abstract: The scalability of multithreaded applications on current multicore systems is hampered by the performance of lock algorithms, due to the costs of access contention and cache misses. The main contribution presented in this article is a new locking technique, Remote Core Locking (RCL), that aims to accelerate the execution of critical sections in legacy applications on multicore architectures. The idea of RCL is to replace lock acquisitions by optimized remote procedure calls to a dedicated server hardware thread. RCL limits the performance collapse observed with other lock algorithms when many threads try to acquire a lock concurrently, and it removes the need to transfer lock-protected shared data to the hardware thread acquiring the lock, because such data can typically remain in the server's cache. Other contributions presented in this article include a profiler that identifies the locks that are the bottlenecks in multithreaded applications and that can thus benefit from RCL, and a reengineering tool that transforms POSIX lock acquisitions into RCL locks. Eighteen applications were used to evaluate RCL: the nine applications of the SPLASH-2 benchmark suite, the seven applications of the Phoenix 2 benchmark suite, Memcached, and Berkeley DB with a TPC-C client. Eight of these applications are unable to scale because of locks and benefit from RCL on an x86 machine with four AMD Opteron processors and 48 hardware threads. By using RCL instead of Linux POSIX locks, performance is improved by up to 2.5 times on Memcached, and up to 11.6 times on Berkeley DB with the TPC-C client. On a SPARC machine with two Sun UltraSPARC T2+ processors and 128 hardware threads, three applications benefit from RCL. In particular, performance is improved by up to 1.3 times with respect to Solaris POSIX locks on Memcached, and up to 7.9 times on Berkeley DB with the TPC-C client.
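
The central idea of the abstract, replacing a lock acquisition with a request to a dedicated server thread that runs the critical section on the client's behalf, can be illustrated with a minimal sketch in C. This is not the authors' implementation: the names `slot_t`, `rcl_server`, `rcl_execute`, and `run_demo`, the 64-byte cache-line size, and the fixed client limit are all assumptions made for illustration. The sketch also yields the CPU when idle so that it behaves on machines with few cores, and it does not pin the server to a hardware thread as a real RCL runtime presumably would.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CLIENTS 64            /* hypothetical limit for this sketch */

typedef void *(*cs_fn)(void *);   /* a critical section packaged as a function */

/* One request slot per client, aligned to an assumed 64-byte cache line
   so that clients do not share a line with each other. */
typedef struct {
    _Alignas(64) atomic_int pending;  /* 1 = request posted, 0 = slot free */
    cs_fn fn;
    void *arg;
    void *ret;
} slot_t;

static slot_t slots[MAX_CLIENTS];
static atomic_bool stop_server;

/* Server loop: scan the slots and run each posted critical section.
   Only this thread executes critical sections, so they never overlap. */
static void *rcl_server(void *unused) {
    (void)unused;
    while (!atomic_load_explicit(&stop_server, memory_order_acquire)) {
        bool served = false;
        for (int i = 0; i < MAX_CLIENTS; i++) {
            slot_t *s = &slots[i];
            if (atomic_load_explicit(&s->pending, memory_order_acquire)) {
                s->ret = s->fn(s->arg);
                atomic_store_explicit(&s->pending, 0, memory_order_release);
                served = true;
            }
        }
        if (!served)
            sched_yield();  /* concession for oversubscribed machines */
    }
    return NULL;
}

/* Client side: post the request, then wait for the server to clear it. */
static void *rcl_execute(int client_id, cs_fn fn, void *arg) {
    slot_t *s = &slots[client_id];
    s->fn = fn;
    s->arg = arg;
    atomic_store_explicit(&s->pending, 1, memory_order_release);
    while (atomic_load_explicit(&s->pending, memory_order_acquire))
        sched_yield();
    return s->ret;
}

/* Demo: a shared counter that is only ever touched by the server thread. */
static long counter;

static void *increment(void *arg) {
    (void)arg;
    counter++;
    return NULL;
}

typedef struct { int id; int iters; } worker_arg;

static void *demo_worker(void *p) {
    worker_arg *w = p;
    for (int i = 0; i < w->iters; i++)
        rcl_execute(w->id, increment, NULL);
    return NULL;
}

/* Starts one server and nthreads clients; returns the final counter value. */
long run_demo(int nthreads, int iters) {
    pthread_t server, workers[MAX_CLIENTS];
    worker_arg args[MAX_CLIENTS];
    assert(nthreads >= 0 && nthreads <= MAX_CLIENTS);
    counter = 0;
    atomic_store(&stop_server, false);
    pthread_create(&server, NULL, rcl_server, NULL);
    for (int i = 0; i < nthreads; i++) {
        args[i] = (worker_arg){ i, iters };
        pthread_create(&workers[i], NULL, demo_worker, &args[i]);
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(workers[i], NULL);
    atomic_store(&stop_server, true);
    pthread_join(server, NULL);
    return counter;
}
```

Calling `run_demo(4, 1000)` starts four clients that each delegate 1000 increments to the server. Because `counter` is modified by a single thread, no mutex is needed, and the data it touches can stay in that thread's cache, which is the locality benefit the abstract describes.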

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications

Keywords: profiling, reengineering, memory contention, locks, busy-waiting, RPC, locality
Categories and Subject Descriptors: D.4.1 Operating Systems: Process Management - Mutual exclusion
General Terms: Design, Algorithms, Performance
Additional Key Words and Phrases: Multicore synchronization

Authors: Jean-Pierre Lozi, Florian David, Gaël Thomas, Julia Lawall, Gilles Muller


