Particular object retrieval with integral max-pooling of CNN activations

Giorgos Tolias, Ronan Sicre, Hervé Jégou


Classification: cs.CV

Keywords: image, re-ranking, retrieval, search, activations, convolutional, integral, matching

Abstract

Recently, image representations built upon Convolutional Neural Networks (CNNs) have been shown to provide effective descriptors for image search, outperforming pre-CNN features as short-vector representations. Yet such models are not compatible with geometry-aware re-ranking methods, and on some particular object retrieval benchmarks they are still outperformed by traditional image search systems relying on precise descriptor matching, geometric re-ranking, or query expansion. This work revisits both retrieval stages, namely initial search and re-ranking, by employing the same primitive information derived from the CNN. We build compact feature vectors that encode several image regions without the need to feed multiple inputs to the network. Furthermore, we extend integral images to handle max-pooling on convolutional layer activations, allowing us to efficiently localize matching objects. The resulting bounding box is finally used for image re-ranking. As a result, this paper significantly improves the existing CNN-based recognition pipeline: we report for the first time results competing with traditional methods on the challenging Oxford5k and Paris6k datasets.
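The idea of extending integral images to max-pooling can be illustrated with a small sketch. It assumes, as an illustrative choice, that max-pooling over a rectangle of non-negative (post-ReLU) activations is approximated by the generalized mean (Σ x^p)^(1/p). This quantity approaches the maximum as p grows and, unlike an exact max, reduces to a sum that a summed-area table answers in constant time for any box. The function names (`integral_image`, `approx_region_max`, `exact_region_max`, `region_vector`), the exponent p = 10 and the toy 3×3 activation maps are hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Channel = List[List[float]]      # one activation map (or table): H rows x W columns
Box = Tuple[int, int, int, int]  # (y0, x0, y1, x1), half-open: rows y0..y1-1, cols x0..x1-1


def integral_image(channel: Channel, p: float) -> Channel:
    """Summed-area table of channel**p, padded with a leading zero row and column."""
    h, w = len(channel), len(channel[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = (channel[y][x] ** p
                                + ii[y][x + 1] + ii[y + 1][x] - ii[y][x])
    return ii


def approx_region_max(ii: Channel, p: float, box: Box) -> float:
    """Approximate max-pooling over a box in O(1), via (sum of x**p) ** (1/p)."""
    y0, x0, y1, x1 = box
    s = ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
    return max(s, 0.0) ** (1.0 / p)  # clamp tiny negative round-off before the root


def exact_region_max(channel: Channel, box: Box) -> float:
    """Exact max-pooling over a box, for comparison (cost grows with box area)."""
    y0, x0, y1, x1 = box
    return max(channel[y][x] for y in range(y0, y1) for x in range(x0, x1))


def region_vector(tables: Sequence[Channel], p: float, box: Box) -> List[float]:
    """One approximate max per channel for the box, then L2-normalized."""
    v = [approx_region_max(ii, p, box) for ii in tables]
    norm = math.sqrt(sum(c * c for c in v)) or 1.0  # avoid dividing by zero
    return [c / norm for c in v]


if __name__ == "__main__":
    p = 10.0  # assumed exponent for this sketch
    channels = [
        [[0.0, 1.0, 2.0], [3.0, 0.5, 0.0], [1.0, 4.0, 0.0]],
        [[0.2, 0.0, 0.0], [0.0, 2.5, 0.1], [0.0, 0.3, 0.0]],
    ]
    tables = [integral_image(c, p) for c in channels]  # built once per image
    full = (0, 0, 3, 3)
    print(exact_region_max(channels[0], full), approx_region_max(tables[0], p, full))
    print(region_vector(tables, p, (0, 1, 2, 3)))  # any box, no re-scan of the map
```

For non-negative activations in a box of n cells, the estimate lies between the true maximum and n^(1/p) times it, so a larger p tightens the bound at the cost of larger intermediate values stored in the table.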



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Self-Pruned Key-Value Attention: Learning When to Write by Predicting Future Utility
cs.LG 2026-05 unverdicted novelty 6.0

SP-KV trains a utility predictor jointly with the LLM to dynamically prune low-utility KV cache entries, achieving 3-10x memory reduction during generation with negligible performance loss.
Revisiting Human-in-the-Loop Object Retrieval with Pre-Trained Vision Transformers
cs.CV 2026-04 unverdicted novelty 4.0

Pre-trained ViT representations combined with active learning and targeted design choices for annotations and selection improve object class retrieval in multi-object scenes.