Clustering is Efficient for Approximate Maximum Inner Product Search

Alex Auvolat; Hugo Larochelle; Pascal Vincent; Sarath Chandar; Yoshua Bengio

arXiv 1507.05910 v3 · pith:INUFUFSQ · submitted 2015-07-21 · cs.LG cs.CL stat.ML

Alex Auvolat, Sarath Chandar, Pascal Vincent, Hugo Larochelle, Yoshua Bengio

keywords: mips, approximate, maximum, search, simple, approach, clustering, efficient

Efficient Maximum Inner Product Search (MIPS) is an important task that has a wide applicability in recommendation systems and classification with a large number of classes. Solutions based on locality-sensitive hashing (LSH) as well as tree-based solutions have been investigated in the recent literature, to perform approximate MIPS in sublinear time. In this paper, we compare these to another extremely simple approach for solving approximate MIPS, based on variants of the k-means clustering algorithm. Specifically, we propose to train a spherical k-means, after having reduced the MIPS problem to a Maximum Cosine Similarity Search (MCSS). Experiments on two standard recommendation system benchmarks as well as on large vocabulary word embeddings, show that this simple approach yields much higher speedups, for the same retrieval precision, than current state-of-the-art hashing-based and tree-based methods. This simple method also yields more robust retrievals when the query is corrupted by noise.
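The pipeline named in the abstract (reduce MIPS to MCSS, cluster with spherical k-means, then search only the most promising clusters) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say which reduction is used; the sketch assumes the common one that appends a coordinate to each data vector so that all of them share the same norm, and appends a zero to the query. The function names, the `n_probe` parameter, the random initialization, and the fixed iteration count are all hypothetical choices made for the example.

```python
import numpy as np

def mips_to_mcss(X):
    """Assumed reduction: append one coordinate so every row has norm U = max ||x||.

    With the query padded by a zero, inner products are unchanged, and since all
    augmented rows share one norm, ranking by inner product equals ranking by cosine.
    Assumes at least one row of X is nonzero.
    """
    norms = np.linalg.norm(X, axis=1)
    U = norms.max()
    extra = np.sqrt(np.maximum(U**2 - norms**2, 0.0))
    return np.hstack([X, extra[:, None]])

def spherical_kmeans(P, k, n_iter=20, seed=0):
    """Plain spherical k-means: unit-norm centroids, assignment by cosine similarity."""
    rng = np.random.default_rng(seed)
    P_unit = P / np.linalg.norm(P, axis=1, keepdims=True)
    # Hypothetical initialization: k distinct data points chosen at random.
    centroids = P_unit[rng.choice(len(P_unit), size=k, replace=False)].copy()
    for _ in range(n_iter):
        assign = np.argmax(P_unit @ centroids.T, axis=1)
        for j in range(k):
            members = P_unit[assign == j]
            if len(members) > 0:
                c = members.sum(axis=0)
                norm = np.linalg.norm(c)
                if norm > 0:
                    centroids[j] = c / norm  # re-project the mean onto the unit sphere
    # Final assignment consistent with the returned centroids.
    assign = np.argmax(P_unit @ centroids.T, axis=1)
    return centroids, assign

def approximate_mips(q, X, centroids, assign, n_probe=1):
    """Score only the points in the n_probe clusters whose centroids best match q."""
    q_aug = np.append(q, 0.0)  # query side of the assumed reduction
    top_clusters = np.argsort(-(centroids @ q_aug))[:n_probe]
    candidates = np.flatnonzero(np.isin(assign, top_clusters))
    if candidates.size == 0:
        return -1  # all probed clusters were empty
    scores = X[candidates] @ q  # exact inner products on the short list
    return int(candidates[np.argmax(scores)])
```

Raising `n_probe` trades speed for precision: probing every cluster scores every point and so recovers the exact answer, while probing a single cluster scores roughly 1/k of the data when clusters are balanced.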

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Pyramid: A General Framework for Distributed Similarity Search
cs.DC 2019-06 unverdicted novelty 6.0

Pyramid is a distributed similarity search framework based on HNSW that partitions datasets into similar-item sub-datasets for efficient query processing and includes failure recovery and straggler mitigation.