Everything You Always Wanted to Know about Multicore Graph Processing but Were Afraid to Ask





Presented at: 2017 USENIX Annual Technical Conference (USENIX ATC 17), Santa Clara, California, USA, July 12-14, 2017

Published in: Proceedings of the 2017 USENIX Annual Technical Conference (USENIX ATC 17), pp. 631-643, USENIX Association, 2017

Graph processing systems are used in a wide variety of fields, ranging from biology to social networks, and a large number of such systems have been described in the recent literature. We perform a systematic comparison of various techniques proposed to speed up in-memory multicore graph processing. In addition, we take an end-to-end view of execution time, including not only algorithm execution time, but also pre-processing time and the time to load the graph input data from storage. More specifically, we study various data structures to represent the graph in memory, various approaches to pre-processing and various ways to structure the graph computation. We also investigate approaches to improve cache locality, synchronization, and NUMA-awareness. In doing so, we take our inspiration from a number of graph processing systems, and implement the techniques they propose in a single system. We then selectively enable different techniques, allowing us to assess their benefits in isolation and independent of unrelated implementation considerations. Our main observation is that the cost of pre-processing in many circumstances dominates the cost of algorithm execution, calling into question the benefits of proposed algorithmic optimizations that rely on extensive pre-processing. Equally surprising, using radix sort turns out to be the most efficient way of pre-processing the graph input data into adjacency lists, when the graph input data is already in memory or is loaded from fast storage. Furthermore, we adapt a technique developed for out-of-core graph processing, and show that it significantly improves cache locality. Finally, we demonstrate that NUMA-awareness and its attendant pre-processing costs are beneficial only on large machines and for certain algorithms.
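To make the radix-sort pre-processing step mentioned in the abstract concrete, here is a minimal single-threaded sketch of turning an in-memory edge list into adjacency lists stored as an offsets array plus a neighbors array. It is an illustration only and not the authors' implementation: the function names `radix_sort_edges` and `build_adjacency`, the 8-bit digit size, and the offsets/neighbors layout are all assumptions made for this example.

```python
from typing import List, Tuple

Edge = Tuple[int, int]  # (source vertex id, destination vertex id)


def radix_sort_edges(edges: List[Edge], bits_per_pass: int = 8) -> List[Edge]:
    """Stable least-significant-digit radix sort of edges by source vertex id.

    Hypothetical helper for illustration; the digit width is an assumption.
    """
    if not edges:
        return []
    max_src = max(src for src, _ in edges)
    radix = 1 << bits_per_pass
    mask = radix - 1
    current = list(edges)
    shift = 0
    # One pass per digit of the largest source id (at least one pass).
    while shift == 0 or (max_src >> shift) > 0:
        # Count how many edges fall into each bucket for this digit.
        counts = [0] * radix
        for src, _ in current:
            counts[(src >> shift) & mask] += 1
        # Exclusive prefix sum gives the first output slot of each bucket.
        positions = [0] * radix
        total = 0
        for digit in range(radix):
            positions[digit] = total
            total += counts[digit]
        # Scatter edges into their buckets, preserving input order (stability).
        out: List[Edge] = [(0, 0)] * len(current)
        for edge in current:
            digit = (edge[0] >> shift) & mask
            out[positions[digit]] = edge
            positions[digit] += 1
        current = out
        shift += bits_per_pass
    return current


def build_adjacency(edges: List[Edge], num_vertices: int) -> Tuple[List[int], List[int]]:
    """Build adjacency lists as (offsets, neighbors) from an unsorted edge list.

    The neighbors of vertex v are neighbors[offsets[v]:offsets[v + 1]].
    """
    sorted_edges = radix_sort_edges(edges)
    offsets = [0] * (num_vertices + 1)
    for src, _ in sorted_edges:
        offsets[src + 1] += 1  # out-degree of src, stored one slot to the right
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]  # running sum turns degrees into offsets
    neighbors = [dst for _, dst in sorted_edges]
    return offsets, neighbors
```

For example, `build_adjacency([(2, 0), (0, 1), (1, 2), (0, 2), (3, 0)], 4)` groups the edges by source vertex, so the neighbors of vertex 0 are found at `neighbors[offsets[0]:offsets[1]]`. A multicore version would partition the counting and scatter steps across threads, which this sketch does not attempt.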

Keywords: Graph processing; Benchmarking; Multicore

Note: Awarded Best Paper

Reference: EPFL-CONF-228854





Authors: Malicevic, Jasmina; Lepers, Baptiste Joseph Eustache; Zwaenepoel, Willy

Source: https://infoscience.epfl.ch/record/228854?ln=en






