Separating populations with wide data: A spectral analysis - Statistics > Machine Learning





Abstract: In this paper, we consider the problem of partitioning a small data sample drawn from a mixture of $k$ product distributions. We are interested in the case that individual features are of low average quality $\gamma$, and we want to use as few of them as possible to correctly partition the sample. We analyze a spectral technique that is able to approximately optimize the total data size, the product of the number of data points $n$ and the number of features $K$, needed to correctly perform this partitioning as a function of $1-\gamma$ for $K>n$. Our goal is motivated by an application in clustering individuals according to their population of origin using markers, when the divergence between any two of the populations is small.
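
To make the setting in the abstract concrete, here is a minimal sketch of a generic spectral bipartition on wide data ($K>n$). It is not the algorithm analyzed in the paper, which the abstract does not spell out. It assumes $k=2$, binary features, and a hypothetical per-feature gap `delta` between the two populations; the function names, parameter values, and the sign-based split on the top singular vector are all illustrative assumptions.

```python
import numpy as np

def sample_mixture(n, K, delta, rng):
    """Draw n points with K binary features from a balanced mixture of two
    product (independent-feature) Bernoulli distributions.
    `delta` is a hypothetical per-feature gap, not a quantity from the paper."""
    labels = rng.permutation(np.repeat([0, 1], n // 2))
    signs = rng.choice([-1.0, 1.0], size=K)
    p0 = 0.5 + delta * signs  # feature probabilities in population 0
    p1 = 0.5 - delta * signs  # feature probabilities in population 1
    probs = np.where(labels[:, None] == 0, p0, p1)
    X = (rng.random((n, K)) < probs).astype(float)
    return X, labels

def spectral_bipartition(X):
    """Split the rows of X in two by the sign of their coordinate in the
    top left singular vector of the column-centered data matrix."""
    Xc = X - X.mean(axis=0, keepdims=True)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return (U[:, 0] > 0).astype(int)

def agreement(pred, truth):
    """Fraction of points labeled consistently, up to swapping the two labels."""
    acc = float(np.mean(pred == truth))
    return max(acc, 1.0 - acc)

rng = np.random.default_rng(0)
X, labels = sample_mixture(n=40, K=2000, delta=0.15, rng=rng)  # wide data: K > n
pred = spectral_bipartition(X)
score = agreement(pred, labels)
```

Each individual feature here is only weakly informative, but the top singular direction pools evidence across all $K$ features. That is the general reason spectral methods suit the regime of few data points and many low-quality features.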



Authors: Avrim Blum, Amin Coja-Oghlan, Alan Frieze, Shuheng Zhou

Source: https://arxiv.org/






