Reuse-based Optimization for Pig Latin

1 Hortonworks Inc.
2 PSL - PSL Research University
3 LAMSADE - Laboratoire d'analyse et modélisation de systèmes pour l'aide à la décision
4 IPVS - Institut für Parallele und Verteilte Systeme
5 CEDAR - Rich Data Analytics at Cloud Scale, Inria Saclay - Île-de-France
6 LIX - Laboratoire d'informatique de l'École polytechnique, Palaiseau
7 Université Paris-Saclay

Abstract: Pig Latin is a popular language widely used for parallel processing of massive data sets. Currently, subexpressions that occur repeatedly in Pig Latin scripts are executed as many times as they appear, because the current Pig Latin optimizer does not identify reuse opportunities. We present a novel optimization approach that identifies and reuses repeated subexpressions in Pig Latin scripts. Our optimization algorithm, named PigReuse, identifies subexpression merging opportunities, selects the best ones to execute based on a cost function, and reuses their results as needed so as to compute exactly the same output as the original scripts. Our experiments demonstrate the effectiveness of our approach.
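To make the idea of "identifying repeated subexpressions" concrete, here is a minimal sketch of only the identification step, using structural hashing (hash-consing) over a hypothetical, simplified operator representation (`Op` and `merge_scripts` are illustrative names, not part of Pig or PigReuse). It is not the PigReuse algorithm: the cost-based selection of which merged subexpressions to execute (the keywords list linear programming) is not modeled here. Structurally equal subexpressions across scripts collapse into one node, so each would be evaluated once and its result shared.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


# Hypothetical, simplified logical operator: operation name, parameters,
# and input subexpressions. frozen=True makes instances hashable and
# compared by structure, which is what lets equal subexpressions merge.
@dataclass(frozen=True)
class Op:
    name: str
    params: Tuple[str, ...]
    inputs: Tuple["Op", ...] = ()


def merge_scripts(roots: List[Op]) -> Tuple[Dict[Op, int], List[int], int]:
    """Hash-cons the operator trees of several scripts into one shared DAG.

    Returns the table of distinct operators (operator -> node id), the
    node id of each script's output, and the number of operator
    evaluations a naive, non-reusing execution would perform.
    """
    table: Dict[Op, int] = {}
    naive = 0

    def visit(op: Op) -> int:
        nonlocal naive
        naive += 1  # a naive plan evaluates every occurrence
        for child in op.inputs:
            visit(child)
        if op not in table:  # first time this subexpression is seen
            table[op] = len(table)
        return table[op]

    outputs = [visit(root) for root in roots]
    return table, outputs, naive


# Usage: two scripts built independently that share LOAD + FILTER.
q1 = Op("GROUP", ("url",),
        (Op("FILTER", ("rank > 10",), (Op("LOAD", ("pages.csv",)),)),))
q2 = Op("FOREACH", ("url, rank",),
        (Op("FILTER", ("rank > 10",), (Op("LOAD", ("pages.csv",)),)),))
table, outputs, naive = merge_scripts([q1, q2])
print(len(table), naive)  # 4 distinct operators vs. 6 naive evaluations
```

Matching on exact structural equality is the simplest possible criterion; it misses subexpressions that are equivalent but written differently, which is one reason a real optimizer needs a richer notion of merging opportunities than this sketch.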

Keywords: Reuse-based Optimization; Multi-query Optimization; Linear Programming; Pig Latin

Authors: Jesús Camacho-Rodríguez, Dario Colazzo, Melanie Herschel, Ioana Manolescu, Soudip Roy Chowdhury
