Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet

2019 · cs.CV · arXiv 1904.00760

3 Pith papers cite this work. Polarity classification is still indexing.

abstract

Deep Neural Networks (DNNs) excel on many complex perceptual tasks, but it has proven notoriously difficult to understand how they reach their decisions. We here introduce a high-performance DNN architecture on ImageNet whose decisions are considerably easier to explain. Our model, a simple variant of the ResNet-50 architecture called BagNet, classifies an image based on the occurrences of small local image features without taking into account their spatial ordering. This strategy is closely related to the bag-of-feature (BoF) models popular before the onset of deep learning and reaches a surprisingly high accuracy on ImageNet (87.6% top-5 for 33 x 33 px features and AlexNet performance for 17 x 17 px features). The constraint on local features makes it straightforward to analyse how exactly each part of the image influences the classification. Furthermore, the BagNets behave similarly to state-of-the-art deep neural networks such as VGG-16, ResNet-152 or DenseNet-169 in terms of feature sensitivity, error distribution and interactions between image parts. This suggests that the improvements of DNNs over previous bag-of-feature classifiers in the last few years are mostly achieved by better fine-tuning rather than by qualitatively different decision strategies.
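
The decision rule described in the abstract can be illustrated with a small sketch. It assumes that class evidence is computed independently for each local patch and then averaged over all patches, so that patch positions never enter the decision. The names `extract_patches`, `bag_logits` and `toy_patch_logits` are hypothetical, and the hand-made scoring function only stands in for the learned patch network; this is not the authors' implementation.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image as rows of pixel values
Patch = List[List[float]]


def extract_patches(image: Image, q: int, stride: int = 1) -> List[Patch]:
    """Return all q x q patches of the image, scanned row by row."""
    height, width = len(image), len(image[0])
    return [
        [row[x:x + q] for row in image[y:y + q]]
        for y in range(0, height - q + 1, stride)
        for x in range(0, width - q + 1, stride)
    ]


def bag_logits(patches: List[Patch],
               patch_logits: Callable[[Patch], List[float]]) -> List[float]:
    """Average per-patch class evidence; patch positions are never used."""
    per_patch = [patch_logits(p) for p in patches]
    n_classes = len(per_patch[0])
    return [sum(l[c] for l in per_patch) / len(per_patch)
            for c in range(n_classes)]


def toy_patch_logits(patch: Patch) -> List[float]:
    """Hypothetical stand-in for a learned patch classifier (two classes)."""
    flat = [v for row in patch for v in row]
    mean = sum(flat) / len(flat)
    return [mean, 1.0 - mean]  # class 0 favours bright patches, class 1 dark


# A 4 x 4 toy image: dark left half, bright right half.
image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]

patches = extract_patches(image, q=2, stride=2)
logits = bag_logits(patches, toy_patch_logits)

# Reordering the patches leaves the image-level evidence unchanged.
shuffled_logits = bag_logits(list(reversed(patches)), toy_patch_logits)
```

Because the image-level score is a plain average, the per-patch values returned by `patch_logits` can be read directly as each image region's contribution to a class, which is the kind of analysis the abstract calls straightforward.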

representative citing papers

Cumulative Meta-Learning from Active Learning Queries for Robustness to Spurious Correlations

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

CAML meta-learns a progressively refined inductive bias from active-learning queries to improve robustness to spurious correlations, reporting accuracy gains on minority groups across several benchmarks.

ShapeY: A Principled Framework for Measuring Shape Recognition Capacity via Nearest-Neighbor Matching

cs.CV · 2026-04-27 · unverdicted · novelty 6.0

ShapeY is a benchmark dataset and nearest-neighbor protocol that measures shape-based recognition in vision models, revealing that even state-of-the-art networks fail to generalize consistently across 3D viewpoints and non-shape appearance changes.

Predicting Visual Memory Schemas with Variational Autoencoders

cs.CV · 2019-07-19 · unverdicted · novelty 4.0

Variational autoencoders generate higher-resolution dual-channel visual memory schema maps that separately predict true and false memorability, extending prior CNN approaches.

citing papers explorer

Showing 3 of 3 citing papers.

Cumulative Meta-Learning from Active Learning Queries for Robustness to Spurious Correlations cs.LG · 2026-05-20 · unverdicted · none · ref 61 · internal anchor
ShapeY: A Principled Framework for Measuring Shape Recognition Capacity via Nearest-Neighbor Matching cs.CV · 2026-04-27 · unverdicted · none · ref 8
Predicting Visual Memory Schemas with Variational Autoencoders cs.CV · 2019-07-19 · unverdicted · none · ref 3 · internal anchor
