ImageNet: A large-scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, Li Fei-Fei · 2009

4 Pith papers cite this work. Polarity classification is still being indexed.

representative citing papers

SFHand: Learning Embodied Manipulation by Streaming Egocentric 3D Hand Forecasting

cs.CV · 2025-11-22 · unverdicted · novelty 7.0

SFHand presents the first streaming language-guided autoregressive framework for 3D hand forecasting, achieving up to 35.8% gains over prior methods and 13.4% better downstream embodied task performance.

VibeToken: Scaling 1D Image Tokenizers and Autoregressive Models for Dynamic Resolution Generations

cs.CV · 2026-04-27 · unverdicted · novelty 6.0

VibeToken enables autoregressive image generation at arbitrary resolutions using 64 tokens for 1024x1024 images with 3.94 gFID, constant 179G FLOPs, and better efficiency than diffusion or fixed AR baselines.

I Walk the Line: Examining the Role of Gestalt Continuity in Object Binding for Vision Transformers

cs.CV · 2026-04-10 · unverdicted · novelty 6.0

Pretrained vision transformers use specific attention heads sensitive to Gestalt continuity for object binding, shown via probes on synthetic datasets and ablation experiments.

Lite Any Stereo: Efficient Zero-Shot Stereo Matching

cs.CV · 2025-11-20 · unverdicted · novelty 6.0

Lite Any Stereo delivers top-ranked zero-shot accuracy on four real-world stereo benchmarks using a lightweight backbone, hybrid cost aggregation, and three-stage training on million-scale data, at less than 1% of typical computational cost.

citing papers explorer

SFHand: Learning Embodied Manipulation by Streaming Egocentric 3D Hand Forecasting

cs.CV · 2025-11-22 · unverdicted · none · ref 11

VibeToken: Scaling 1D Image Tokenizers and Autoregressive Models for Dynamic Resolution Generations

cs.CV · 2026-04-27 · unverdicted · none · ref 7

I Walk the Line: Examining the Role of Gestalt Continuity in Object Binding for Vision Transformers

cs.CV · 2026-04-10 · unverdicted · none · ref 4

Lite Any Stereo: Efficient Zero-Shot Stereo Matching

cs.CV · 2025-11-20 · unverdicted · none · ref 12
