ImageNet Large Scale Visual Recognition Challenge

Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C · 2014 · cs.CV · arXiv 1409.0575

23 Pith papers cite this work. Polarity classification is still indexing.

abstract

The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare the state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the five years of the challenge, and propose future directions and improvements.

citation-role summary

dataset: 3 · method: 1

citation-polarity summary

use dataset: 3 · use method: 1

representative citing papers

Deep Residual Learning for Image Recognition

cs.CV · 2015-12-10 · accept · novelty 8.0

Residual networks reformulate layers to learn residual functions, enabling effective training of up to 152-layer models that achieve 3.57% error on ImageNet and win ILSVRC 2015.

Session-based Recommendations with Recurrent Neural Networks

cs.LG · 2015-11-21 · conditional · novelty 8.0

RNNs with ranking loss outperform item-to-item baselines for session-based recommendations on two datasets.

Self-Organized Conformal Prediction: Reducing Regional Coverage Gaps with Unsupervised Group Discovery

stat.ML · 2026-06-28 · unverdicted · novelty 7.0

SOCP uses self-organizing maps for unsupervised group discovery to enable local calibration in conformal prediction, reducing regional coverage gaps on benchmarks with small set-size increases while preserving validity guarantees.

Layerwise Progressive Freezing: A Training Scaffold for Depth-Scalable Binary Networks

cs.LG · 2026-06-26 · unverdicted · novelty 7.0

StoMPP progressively binarizes BNN layers layerwise from input to output via stochastic masks, delivering depth-scalable accuracy gains in a fully STE-free regime by controlling activation-induced gradient blockades.

CORP: Closed-Form One-shot Representation-Preserving Structured Pruning for Transformers

cs.LG · 2026-02-05 · unverdicted · novelty 7.0

CORP performs one-shot structured pruning of Transformers by modeling removed components as affine functions of retained ones and solving closed-form ridge regressions on calibration data to fold compensation into weights, retaining 83.27% Top-1 accuracy on DeiT-Huge after 50% pruning.

VCBench: Benchmarking LLMs in Venture Capital

cs.AI · 2025-09-17 · unverdicted · novelty 7.0

VCBench is a new privacy-preserving benchmark showing LLMs like DeepSeek-V3 achieve over six times the market baseline precision in predicting founder success.

DiffGradCAM: A Class Activation Map Using the Full Model Decision to Solve Unaddressed Adversarial Attacks

cs.LG · 2025-06-10 · unverdicted · novelty 7.0

DiffGradCAM and DiffGradCAM++ use logit differences for contrastive class activation maps that resist passive fooling while matching GradCAM outputs in clean cases, tested with a new SHAM benchmark on multi-class tasks.

Diffusion Models Beat GANs on Image Synthesis

cs.LG · 2021-05-11 · accept · novelty 7.0

Diffusion models with architecture improvements and classifier guidance achieve superior FID scores to GANs on unconditional and conditional ImageNet image synthesis.

Deep Learning Scaling is Predictable, Empirically

cs.LG · 2017-12-01 · unverdicted · novelty 7.0

Deep learning generalization error follows power-law scaling with training set size across multiple domains, with model size scaling sublinearly with data size.

MJEPA: A Simple and Scalable Joint-Embedding Predictive Architecture for Audio-Visual Learning

cs.CV · 2026-06-23 · unverdicted · novelty 6.0

MJEPA applies a single shared ViT encoder and one predictive objective within and across audio-visual modalities, reporting >6.8 mAP gains on AudioSet-20K and competitive video results with 10x less data.

Score-Control for Hallucination Reduction in Diffusion Models

cs.CV · 2026-05-29 · unverdicted · novelty 6.0

VSM modulates the score Jacobian using variance guidance to reduce hallucinations in diffusion models by up to 25% on synthetic and real datasets while preserving fidelity and diversity.

Motion-Compensated Weight Compression

cs.CV · 2026-05-23 · unverdicted · novelty 6.0

MCWC aligns permutation-symmetric blocks across layers to enable sequential prediction and residual entropy coding, improving rate-accuracy tradeoffs versus quantization and prior codecs on language and vision models.

Causal Attribution via Activation Patching

cs.CV · 2026-03-13 · unverdicted · novelty 6.0

CAAP produces patch attributions in ViTs by direct activation patching on intermediate layers to measure causal contribution to the target class score.

FedOptima: Optimizing Resource Utilization in Federated Learning

cs.DC · 2025-03-10 · unverdicted · novelty 6.0

FedOptima reduces both straggler and dependency idle times in federated learning via layer offloading, asynchronous aggregation, auxiliary networks, and server scheduling, delivering up to 21.8x faster training.

DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning

cs.RO · 2024-11-07 · unverdicted · novelty 6.0

DINO-WM builds world models on pre-trained DINOv2 features to enable zero-shot planning from offline data without rewards or demonstrations.

Deepfake Detection Generalization with Diffusion Noise

cs.CV · 2026-04-16 · unverdicted · novelty 6.0

ANL uses diffusion noise prediction and attention to regularize deepfake detectors for better generalization to unseen synthesis methods without added inference cost.

LEAP: Layer-skipping Efficiency via Adaptive Progression for Vision Transformer Distillation

cs.CV · 2026-06-17 · unverdicted · novelty 5.0

LEAP is an adaptive layer-skipping curriculum for ViT feature distillation that reports accuracy gains on ImageNet and retrieval tasks plus training compute savings.

SORA: Free Second-Order Attacks in Fast Adversarial Training

cs.LG · 2026-05-30 · unverdicted · novelty 5.0

SORA is an adaptive step-size adversarial training algorithm that formalizes epsilon overfitting, introduces the PertAlign metric to predict catastrophic overfitting, and dynamically adjusts perturbations to achieve state-of-the-art robustness and clean accuracy with fixed hyperparameters.

DetailCLIP: Injecting Image Details into CLIP's Feature Space

cs.CV · 2022-08-31 · unverdicted · novelty 5.0

A patch-based fusion method extends CLIP to high-resolution images by retaining multi-scale details for improved class-prompted retrieval.

Teacher-Guided Routing for Sparse Vision Mixture-of-Experts

cs.CV · 2026-04-23 · unverdicted · novelty 5.0

Teacher-guided routing supplies pseudo-supervision from a dense model's intermediate features to stabilize expert selection in sparse vision MoE models.

PH-GCN: Person Re-identification with Part-based Hierarchical Graph Convolutional Network

cs.CV · 2019-07-20 · unverdicted · novelty 4.0

PH-GCN constructs a hierarchical graph of person parts and performs local/global feature learning via message passing in an end-to-end network for person re-identification.

Using Deep Learning Models Pretrained by Self-Supervised Learning for Protein Localization

cs.CV · 2026-04-13 · unverdicted · novelty 4.0

DINO-based ViT models pretrained on HPA FOV achieve macro F1 of 0.822 zero-shot and 0.860 after fine-tuning for protein localization on OpenCell, demonstrating effective transfer from SSL pretraining.

Discrete Meanflow Training Curriculum

cs.LG · 2026-04-10 · unverdicted · novelty 4.0

A DMF curriculum initialized from pretrained flow models achieves one-step FID 3.36 on CIFAR-10 after only 2000 epochs by exploiting a discretized consistency property in the Meanflow objective.

citing papers explorer

Showing 23 of 23 citing papers.

Deep Residual Learning for Image Recognition · cs.CV · 2015-12-10 · accept · none · ref 36
Session-based Recommendations with Recurrent Neural Networks · cs.LG · 2015-11-21 · conditional · none · ref 13
Self-Organized Conformal Prediction: Reducing Regional Coverage Gaps with Unsupervised Group Discovery · stat.ML · 2026-06-28 · unverdicted · none · ref 36 · internal anchor
Layerwise Progressive Freezing: A Training Scaffold for Depth-Scalable Binary Networks · cs.LG · 2026-06-26 · unverdicted · none · ref 5 · internal anchor
CORP: Closed-Form One-shot Representation-Preserving Structured Pruning for Transformers · cs.LG · 2026-02-05 · unverdicted · none · ref 16 · internal anchor
VCBench: Benchmarking LLMs in Venture Capital · cs.AI · 2025-09-17 · unverdicted · none · ref 14 · internal anchor
DiffGradCAM: A Class Activation Map Using the Full Model Decision to Solve Unaddressed Adversarial Attacks · cs.LG · 2025-06-10 · unverdicted · none · ref 19 · internal anchor
Diffusion Models Beat GANs on Image Synthesis · cs.LG · 2021-05-11 · accept · none · ref 52
Deep Learning Scaling is Predictable, Empirically · cs.LG · 2017-12-01 · unverdicted · none · ref 7
MJEPA: A Simple and Scalable Joint-Embedding Predictive Architecture for Audio-Visual Learning · cs.CV · 2026-06-23 · unverdicted · none · ref 36 · internal anchor
Score-Control for Hallucination Reduction in Diffusion Models · cs.CV · 2026-05-29 · unverdicted · none · ref 32 · internal anchor
Motion-Compensated Weight Compression · cs.CV · 2026-05-23 · unverdicted · none · ref 49 · internal anchor
Causal Attribution via Activation Patching · cs.CV · 2026-03-13 · unverdicted · none · ref 28 · internal anchor
FedOptima: Optimizing Resource Utilization in Federated Learning · cs.DC · 2025-03-10 · unverdicted · none · ref 13 · internal anchor
DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning · cs.RO · 2024-11-07 · unverdicted · none · ref 47 · internal anchor
Deepfake Detection Generalization with Diffusion Noise · cs.CV · 2026-04-16 · unverdicted · none · ref 57
LEAP: Layer-skipping Efficiency via Adaptive Progression for Vision Transformer Distillation · cs.CV · 2026-06-17 · unverdicted · none · ref 20 · internal anchor
SORA: Free Second-Order Attacks in Fast Adversarial Training · cs.LG · 2026-05-30 · unverdicted · none · ref 55 · internal anchor
DetailCLIP: Injecting Image Details into CLIP's Feature Space · cs.CV · 2022-08-31 · unverdicted · none · ref 24 · internal anchor
Teacher-Guided Routing for Sparse Vision Mixture-of-Experts · cs.CV · 2026-04-23 · unverdicted · none · ref 37
PH-GCN: Person Re-identification with Part-based Hierarchical Graph Convolutional Network · cs.CV · 2019-07-20 · unverdicted · none · ref 31 · internal anchor
Using Deep Learning Models Pretrained by Self-Supervised Learning for Protein Localization · cs.CV · 2026-04-13 · unverdicted · none · ref 33
Discrete Meanflow Training Curriculum · cs.LG · 2026-04-10 · unverdicted · none · ref 15
