pith. machine review for the scientific record.

arxiv: 1409.0575 · v3 · submitted 2014-09-01 · 💻 cs.CV

Recognition: unknown

ImageNet Large Scale Visual Recognition Challenge

Authors on Pith: no claims yet
classification: 💻 cs.CV
keywords: object, challenge, recognition, accuracy, been, benchmark, classification, detection
0 comments
read the original abstract

The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare the state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the five years of the challenge, and propose future directions and improvements.
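The accuracy comparison mentioned in the abstract is made in terms of top-5 classification error: a prediction counts as correct if the ground-truth label appears among the model's five highest-scoring classes. A minimal NumPy sketch of that metric (the function name, array names, and toy data are illustrative, not taken from the challenge toolkit):

import numpy as np

def top5_error(scores: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of images whose true label is NOT among the five
    highest-scoring classes. `scores` has shape (n_images, n_classes),
    `labels` has shape (n_images,) with integer class indices."""
    # Indices of the five largest scores per image; their order among
    # themselves does not matter for the metric.
    top5 = np.argpartition(scores, -5, axis=1)[:, -5:]
    hits = (top5 == labels[:, None]).any(axis=1)
    return 1.0 - hits.mean()

# Toy usage: 3 images, 10 classes, random scores.
rng = np.random.default_rng(0)
scores = rng.random((3, 10))
labels = np.array([2, 7, 7])
print(top5_error(scores, labels))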

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Deep Residual Learning for Image Recognition

    cs.CV 2015-12 accept novelty 8.0

    Residual networks reformulate layers to learn residual functions, enabling effective training of models up to 152 layers deep that achieve 3.57% top-5 error on ImageNet and win ILSVRC 2015 (a minimal sketch of a residual block follows this list).

  2. Session-based Recommendations with Recurrent Neural Networks

    cs.LG 2015-11 conditional novelty 8.0

    RNNs with ranking loss outperform item-to-item baselines for session-based recommendations on two datasets.

  3. Diffusion Models Beat GANs on Image Synthesis

    cs.LG 2021-05 accept novelty 7.0

    Diffusion models with architectural improvements and classifier guidance achieve better (lower) FID than GANs on unconditional and conditional ImageNet image synthesis.

  4. Deep Learning Scaling is Predictable, Empirically

    cs.LG 2017-12 unverdicted novelty 7.0

    Deep learning generalization error follows power-law scaling with training set size across multiple domains, with model size scaling sublinearly with data size.

  5. Deepfake Detection Generalization with Diffusion Noise

    cs.CV 2026-04 unverdicted novelty 6.0

    ANL uses diffusion noise prediction and attention to regularize deepfake detectors for better generalization to unseen synthesis methods without added inference cost.

  6. Teacher-Guided Routing for Sparse Vision Mixture-of-Experts

    cs.CV 2026-04 unverdicted novelty 5.0

    Teacher-guided routing supplies pseudo-supervision from a dense model's intermediate features to stabilize expert selection in sparse vision MoE models.

  7. Using Deep Learning Models Pretrained by Self-Supervised Learning for Protein Localization

    cs.CV 2026-04 unverdicted novelty 4.0

    DINO-based ViT models pretrained on HPA FOV achieve macro F1 of 0.822 zero-shot and 0.860 after fine-tuning for protein localization on OpenCell, demonstrating effective transfer from SSL pretraining.

  8. Discrete Meanflow Training Curriculum

    cs.LG 2026-04 unverdicted novelty 4.0

    A DMF curriculum initialized from pretrained flow models achieves one-step FID 3.36 on CIFAR-10 after only 2000 epochs by exploiting a discretized consistency property in the Meanflow objective.
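
Item 1 above describes residual learning. A minimal PyTorch sketch of a single identity-shortcut residual block, with channel counts and layer sizes chosen for illustration rather than taken from the 152-layer configuration reported in that paper:

import torch
from torch import nn

class ResidualBlock(nn.Module):
    """Computes y = relu(F(x) + x): the stacked layers learn the residual
    F(x) rather than the full mapping, so deep stacks remain trainable."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = self.relu(self.bn1(self.conv1(x)))
        residual = self.bn2(self.conv2(residual))
        return self.relu(residual + x)  # identity shortcut connection

# Toy usage: a batch of 2 feature maps with 16 channels.
block = ResidualBlock(16)
out = block(torch.randn(2, 16, 32, 32))
print(out.shape)  # torch.Size([2, 16, 32, 32])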