hub Baseline reference

Microsoft COCO: Common Objects in Context

Baseline reference. 60% of citing Pith papers use this work as a benchmark or comparison.

57 Pith papers citing it
Baseline 60% of classified citations
abstract

We present a new dataset with the goal of advancing the state of the art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. This is achieved by gathering images of complex everyday scenes containing common objects in their natural context. Objects are labeled using per-instance segmentations to aid in precise object localization. Our dataset contains photos of 91 object types that would be easily recognizable by a 4-year-old. With a total of 2.5 million labeled instances in 328k images, the creation of our dataset drew upon extensive crowd worker involvement via novel user interfaces for category detection, instance spotting, and instance segmentation. We present a detailed statistical analysis of the dataset in comparison to PASCAL, ImageNet, and SUN. Finally, we provide baseline performance analysis for bounding box and segmentation detection results using a Deformable Parts Model.

citation-role summary

dataset 8 · background 5 · baseline 1 · method 1

representative citing papers

WildDet3D: Scaling Promptable 3D Detection in the Wild

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

WildDet3D is a promptable 3D detector paired with a new 1M-image dataset across 13.5K categories that sets SOTA on open-world and zero-shot 3D detection benchmarks.

High-Resolution Image Synthesis with Latent Diffusion Models

cs.CV · 2021-12-20 · conditional · novelty 7.0

Latent diffusion models achieve state-of-the-art inpainting and competitive results on unconditional generation, scene synthesis, and super-resolution by performing the diffusion process in the latent space of pretrained autoencoders with cross-attention conditioning, while cutting computational and

U4D: Unsupervised 4D Dynamic Scene Understanding

cs.CV · 2019-07-23 · unverdicted · novelty 7.0

Unsupervised joint semantic instance segmentation, 4D reconstruction, and scene flow from multi-view video of multi-person dynamic scenes, with reported ~40% gains over prior methods.

CASCADE: Context-Aware Relaxation for Speculative Image Decoding

cs.CV · 2026-05-08 · unverdicted · novelty 6.0

CASCADE formalizes semantic interchangeability and convergence in target model representations to enable context-aware acceptance relaxation in tree-based speculative decoding, delivering up to 3.6x speedup on text-to-image models without quality loss.

LatentDiff: Scaling Semantic Dataset Comparison to Millions of Images

cs.CV · 2026-04-28 · unverdicted · novelty 6.0

LatentDiff scales semantic dataset comparison to millions of images using latent spaces of vision encoders combined with sparse autoencoders and density ratio estimation, showing better accuracy and robustness than caption-based approaches on a new benchmark for sparse distribution shifts.

Source-Modality Monitoring in Vision-Language Models

cs.CL · 2026-04-23 · unverdicted · novelty 6.0

Vision-language models use semantic signals more than syntactic ones to bind words like 'image' to actual visual inputs, with implications for robustness in multimodal systems.
