Scaling Out Link Prediction with SNAPLE: 1 Billion Edges and Beyond

1 ASAP - As Scalable As Possible: foundations of large scale dynamic distributed systems, Inria Rennes – Bretagne Atlantique, IRISA-D1 - SYSTÈMES LARGE ÉCHELLE
2 UR1 - Université de Rennes 1

Abstract: In this paper, we consider how the emblematic problem of link prediction can be implemented efficiently in gather-apply-scatter (GAS) platforms, a popular distributed graph-computation model. Our proposal, called SNAPLE, exploits a novel, highly localized vertex-scoring technique and minimizes the cost of data flow while maintaining prediction quality. When used within GraphLab, SNAPLE can scale to extremely large graphs that a standard implementation of link prediction on GraphLab cannot handle. More precisely, we show that SNAPLE can process a graph containing 1.4 billion edges on a 256-core cluster in less than three minutes, with no penalty in the quality of predictions. This result corresponds to an over-linear speedup of 30 against a 20-core standalone machine running a non-distributed state-of-the-art solution.

Keywords: big data, distributed systems, graph link prediction

Authors: Anne-Marie Kermarrec - François Taïani - Juan Manuel Tirado Martin


