ImageNet Large Scale Visual Recognition Challenge

Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C · 2014 · cs.CV · arXiv 1409.0575

16 Pith papers cite this work. Polarity classification is still indexing.

abstract

The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare the state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the five years of the challenge, and propose future directions and improvements.

citation-role summary

dataset: 3 · method: 1

citation-polarity summary

use dataset: 3 · use method: 1

representative citing papers

Deep Residual Learning for Image Recognition

cs.CV · 2015-12-10 · accept · novelty 8.0

Residual networks reformulate layers to learn residual functions, enabling effective training of up to 152-layer models that achieve 3.57% error on ImageNet and win ILSVRC 2015.

Session-based Recommendations with Recurrent Neural Networks

cs.LG · 2015-11-21 · conditional · novelty 8.0

RNNs with ranking loss outperform item-to-item baselines for session-based recommendations on two datasets.

CORP: Closed-Form One-shot Representation-Preserving Structured Pruning for Transformers

cs.LG · 2026-02-05 · unverdicted · novelty 7.0

CORP performs one-shot structured pruning of Transformers by modeling removed components as affine functions of retained ones and solving closed-form ridge regressions on calibration data to fold compensation into weights, retaining 83.27% Top-1 accuracy on DeiT-Huge after 50% pruning.

VCBench: Benchmarking LLMs in Venture Capital

cs.AI · 2025-09-17 · unverdicted · novelty 7.0

VCBench is a new privacy-preserving benchmark showing LLMs like DeepSeek-V3 achieve over six times the market baseline precision in predicting founder success.

DiffGradCAM: A Class Activation Map Using the Full Model Decision to Solve Unaddressed Adversarial Attacks

cs.LG · 2025-06-10 · unverdicted · novelty 7.0

DiffGradCAM and DiffGradCAM++ use logit differences for contrastive class activation maps that resist passive fooling while matching GradCAM outputs in clean cases, tested with a new SHAM benchmark on multi-class tasks.

Diffusion Models Beat GANs on Image Synthesis

cs.LG · 2021-05-11 · accept · novelty 7.0

Diffusion models with architecture improvements and classifier guidance achieve superior FID scores to GANs on unconditional and conditional ImageNet image synthesis.

Deep Learning Scaling is Predictable, Empirically

cs.LG · 2017-12-01 · unverdicted · novelty 7.0

Deep learning generalization error follows power-law scaling with training set size across multiple domains, with model size scaling sublinearly with data size.

Causal Attribution via Activation Patching

cs.CV · 2026-03-13 · unverdicted · novelty 6.0

CAAP produces patch attributions in ViTs by direct activation patching on intermediate layers to measure causal contribution to the target class score.

FedOptima: Optimizing Resource Utilization in Federated Learning

cs.DC · 2025-03-10 · unverdicted · novelty 6.0

FedOptima reduces both straggler and dependency idle times in federated learning via layer offloading, asynchronous aggregation, auxiliary networks, and server scheduling, delivering up to 21.8x faster training.

DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning

cs.RO · 2024-11-07 · unverdicted · novelty 6.0

DINO-WM builds world models on pre-trained DINOv2 features to enable zero-shot planning from offline data without rewards or demonstrations.

Deepfake Detection Generalization with Diffusion Noise

cs.CV · 2026-04-16 · unverdicted · novelty 6.0

ANL uses diffusion noise prediction and attention to regularize deepfake detectors for better generalization to unseen synthesis methods without added inference cost.

DetailCLIP: Injecting Image Details into CLIP's Feature Space

cs.CV · 2022-08-31 · unverdicted · novelty 5.0

A patch-based fusion method extends CLIP to high-resolution images by retaining multi-scale details for improved class-prompted retrieval.

Teacher-Guided Routing for Sparse Vision Mixture-of-Experts

cs.CV · 2026-04-23 · unverdicted · novelty 5.0

Teacher-guided routing supplies pseudo-supervision from a dense model's intermediate features to stabilize expert selection in sparse vision MoE models.

PH-GCN: Person Re-identification with Part-based Hierarchical Graph Convolutional Network

cs.CV · 2019-07-20 · unverdicted · novelty 4.0

PH-GCN constructs a hierarchical graph of person parts and performs local/global feature learning via message passing in an end-to-end network for person re-identification.

Using Deep Learning Models Pretrained by Self-Supervised Learning for Protein Localization

cs.CV · 2026-04-13 · unverdicted · novelty 4.0

DINO-based ViT models pretrained on HPA FOV achieve macro F1 of 0.822 zero-shot and 0.860 after fine-tuning for protein localization on OpenCell, demonstrating effective transfer from SSL pretraining.

Discrete Meanflow Training Curriculum

cs.LG · 2026-04-10 · unverdicted · novelty 4.0

A DMF curriculum initialized from pretrained flow models achieves one-step FID 3.36 on CIFAR-10 after only 2000 epochs by exploiting a discretized consistency property in the Meanflow objective.

citing papers explorer

Showing 16 of 16 citing papers.

Deep Residual Learning for Image Recognition · cs.CV · 2015-12-10 · accept · none · ref 36
Session-based Recommendations with Recurrent Neural Networks · cs.LG · 2015-11-21 · conditional · none · ref 13
CORP: Closed-Form One-shot Representation-Preserving Structured Pruning for Transformers · cs.LG · 2026-02-05 · unverdicted · none · ref 16 · internal anchor
VCBench: Benchmarking LLMs in Venture Capital · cs.AI · 2025-09-17 · unverdicted · none · ref 14 · internal anchor
DiffGradCAM: A Class Activation Map Using the Full Model Decision to Solve Unaddressed Adversarial Attacks · cs.LG · 2025-06-10 · unverdicted · none · ref 19 · internal anchor
Diffusion Models Beat GANs on Image Synthesis · cs.LG · 2021-05-11 · accept · none · ref 52
Deep Learning Scaling is Predictable, Empirically · cs.LG · 2017-12-01 · unverdicted · none · ref 7
Causal Attribution via Activation Patching · cs.CV · 2026-03-13 · unverdicted · none · ref 28 · internal anchor
FedOptima: Optimizing Resource Utilization in Federated Learning · cs.DC · 2025-03-10 · unverdicted · none · ref 13 · internal anchor
DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning · cs.RO · 2024-11-07 · unverdicted · none · ref 47 · internal anchor
Deepfake Detection Generalization with Diffusion Noise · cs.CV · 2026-04-16 · unverdicted · none · ref 57
DetailCLIP: Injecting Image Details into CLIP's Feature Space · cs.CV · 2022-08-31 · unverdicted · none · ref 24 · internal anchor
Teacher-Guided Routing for Sparse Vision Mixture-of-Experts · cs.CV · 2026-04-23 · unverdicted · none · ref 37
PH-GCN: Person Re-identification with Part-based Hierarchical Graph Convolutional Network · cs.CV · 2019-07-20 · unverdicted · none · ref 31 · internal anchor
Using Deep Learning Models Pretrained by Self-Supervised Learning for Protein Localization · cs.CV · 2026-04-13 · unverdicted · none · ref 33
Discrete Meanflow Training Curriculum · cs.LG · 2026-04-10 · unverdicted · none · ref 15
