Searching in one billion vectors: re-rank with source codingReport as inadecuate

Searching in one billion vectors: re-rank with source coding - Download this document for free, or read online. Document in PDF available to download.

1 TEXMEX - Multimedia content-based indexing IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique 2 LEAR - Learning and recognition in vision Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, INPG - Institut National Polytechnique de Grenoble 3 SED Grenoble

Abstract : Recent indexing techniques inspired by source coding have been shown successful to index billions of high-dimensional vectors in memory. In this paper, we propose an approach that re-ranks the neighbor hypotheses obtained by these compressed-domain indexing methods. In contrast to the usual post-verification scheme, which performs exact distance calculation on the short-list of hypotheses, the estimated distances are refined based on short quantization codes, to avoid reading the full vectors from disk. We have released a new public dataset of one billion 128-dimensional vectors and proposed an experimental setup to evaluate high dimensional indexing algorithms on a realistic scale. Experiments show that our method accurately and efficiently re-ranks the neighbor hypotheses using little memory compared to the full vectors representation.

Keywords : approximate nearest neighbors large scale indexing source coding similarity search

Author: Hervé Jégou - Romain Tavenard - Matthijs Douze - Laurent Amsaleg -



Related documents