Abstract: Very large datasets are often encountered in climatology, arising either from a multiplicity of observations over time and space or from the outputs of deterministic models, sometimes on the order of petabytes (1 petabyte = 1 million gigabytes). Loading a large data vector and sorting it is sometimes impossible because of memory limitations or insufficient computing power. We show that a proposed algorithm for approximating the median, the "median of the medians", performs poorly. Instead, we develop an algorithm to approximate quantiles of very large datasets that works by partitioning the data or by using existing partitions, possibly of unequal size. We show the deterministic precision of this algorithm and how it can be adjusted to obtain customized precision.

Author: Reza Hosseini
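
The abstract's claim that the median of the medians can approximate the true median poorly can be illustrated with a small sketch. The partition contents and the helper names `median_of_medians` and `exact_median` below are assumptions chosen for illustration; they are not taken from the paper, and this is not the quantile algorithm the paper develops.

```python
from statistics import median


def median_of_medians(partitions):
    """Naive approximation: the median of the per-partition medians."""
    return median(median(p) for p in partitions)


def exact_median(partitions):
    """Reference value: the median of all values pooled together."""
    return median(x for p in partitions for x in p)


# Hypothetical partitions of unequal size (illustrative data only).
partitions = [[1, 2, 3, 4, 5, 6, 7], [8], [9]]

approx = median_of_medians(partitions)  # median of (4, 8, 9)
exact = exact_median(partitions)        # median of 1, 2, ..., 9

print(approx, exact)
```

In this toy case the two single-element partitions carry as much weight as the seven-element one, so the approximation lands far from the pooled median. This is one way unequal partition sizes can hurt the naive approach.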


