Treelets: An adaptive multi-scale basis for sparse unordered data

Abstract: In many modern applications, including analysis of gene expression and text documents, the data are noisy, high-dimensional, and unordered, with no particular meaning to the given order of the variables. Yet successful learning is often possible due to sparsity: the data are typically redundant, with underlying structures that can be represented by only a few features. In this paper we present treelets, a novel construction of multi-scale bases that extends wavelets to nonsmooth signals. The method is fully adaptive, as it returns a hierarchical tree and an orthonormal basis that both reflect the internal structure of the data. Treelets are especially well-suited as a dimensionality reduction and feature selection tool prior to regression and classification, in situations where sample sizes are small and the data are sparse with unknown groupings of correlated or collinear variables. The method is also simple to implement and analyze theoretically. Here we describe a variety of situations where treelets outperform principal component analysis, as well as some common variable selection and cluster averaging schemes. We illustrate treelets on a blocked covariance model and on several data sets (hyperspectral image data, DNA microarray data, and internet advertisements) with highly complex dependencies between variables.
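
The abstract describes the construction only at a high level. As a rough illustration, here is a minimal sketch of a treelet-style transform, assuming the construction works as follows: at each level, the two most similar "sum" variables are merged by a local Jacobi (2 x 2 PCA) rotation that decorrelates them; the higher-variance output stays a sum variable, while the lower-variance output becomes a "difference" variable that is not merged again. The function name `treelet_sketch`, the absolute-correlation similarity, and building the tree all the way to level p - 1 are assumptions for illustration, not the authors' reference implementation. The usage lines apply it to a small blocked covariance model with two correlated pairs of variables.

```python
import numpy as np


def treelet_sketch(cov, levels=None):
    """Hypothetical treelet-style construction on a p x p covariance matrix.

    Returns (B, C, merges):
      B      -- p x p orthonormal matrix whose columns are the basis vectors
      C      -- covariance expressed in that basis, i.e. B.T @ cov @ B
      merges -- one (sum_index, difference_index) pair per level
    """
    C = np.array(cov, dtype=float)
    p = C.shape[0]
    if levels is None:
        levels = p - 1            # assumption: build the full tree
    B = np.eye(p)                 # start from the standard basis
    active = list(range(p))       # sum variables still eligible to merge
    merges = []

    for _ in range(levels):
        if len(active) < 2:
            break
        # Assumed similarity measure: absolute correlation coefficient.
        best, pair = -1.0, None
        for i, a in enumerate(active):
            for b in active[i + 1:]:
                denom = np.sqrt(C[a, a] * C[b, b])
                rho = abs(C[a, b]) / denom if denom > 0 else 0.0
                if rho > best:
                    best, pair = rho, (a, b)
        a, b = pair

        # Jacobi rotation angle chosen so that the rotated C[a, b] is zero.
        theta = 0.5 * np.arctan2(2.0 * C[a, b], C[a, a] - C[b, b])
        c, s = np.cos(theta), np.sin(theta)
        J = np.eye(p)
        J[a, a], J[a, b] = c, -s
        J[b, a], J[b, b] = s, c

        C = J.T @ C @ J           # covariance in the rotated coordinates
        B = B @ J                 # rotate the corresponding basis vectors

        # Higher-variance output remains a sum variable; the other becomes
        # a difference variable and is dropped from further merges.
        if C[a, a] >= C[b, b]:
            sum_idx, diff_idx = a, b
        else:
            sum_idx, diff_idx = b, a
        active.remove(diff_idx)
        merges.append((sum_idx, diff_idx))

    return B, C, merges


# Usage: a blocked covariance model with two groups of correlated variables.
S = np.array([[1.0, 0.9, 0.0, 0.0],
              [0.9, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.8],
              [0.0, 0.0, 0.8, 1.0]])
B, C, merges = treelet_sketch(S)
print(merges)  # [(0, 1), (2, 3), (0, 2)]
```

On this toy model the sketch first merges each correlated pair, so the leading basis vector is the normalized average of variables 0 and 1, which mirrors the cluster-averaging behavior the abstract alludes to. With real data, `cov` would typically be a sample covariance such as `np.cov(X, rowvar=False)` for an n x p matrix `X`, and a reduced representation could keep the columns of `B` with the largest diagonal entries of `C`; how many to keep, and at which level of the tree, is a modeling choice not fixed by this sketch.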

Authors: Ann B. Lee, Boaz Nadler, Larry Wasserman

