SPair-71k: A large-scale benchmark for semantic correspondence

Min, J · 2019 · arXiv 1908.10543

9 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

dataset 1

citation-polarity summary

use dataset 1

representative citing papers

Geometry Matters: 3D Foundation Priors for Learning Semantic Correspondence

cs.CV · 2026-05-28 · unverdicted · novelty 7.0

A 3D-aware framework uses SAM3D geometry and pose estimation plus geodesic filtering to supervise a lightweight adapter on DINO and Stable Diffusion features, improving semantic correspondence with less manual supervision.

Category-Level 3D Correspondence in Camera Space via Morphable Object Priors

cs.CV · 2026-05-27 · unverdicted · novelty 7.0

Morpheus learns morphable category-level shape priors to produce implicit 3D correspondences in camera space without explicit supervision and releases the HouseCorr3D benchmark with amodal and symmetry annotations.

Zero-Ablation Overstates Register Content Dependence in DINO Vision Transformers

cs.CV · 2026-04-15 · accept · novelty 7.0

Zero-ablation overstates register content dependence in DINO ViTs because mean, noise, and cross-image shuffle replacements preserve performance while zeroing does not.

Weighted Reverse Convolution for Feature Upsampling

cs.CV · 2026-05-17 · unverdicted · novelty 6.0 · 2 refs

Weighted Reverse Convolution is a spatially adaptive inverse operator for densifying high-level visual descriptors from vision foundation models, using weighted regularization and an FFT closed-form solution to improve dense prediction tasks.

MARCO: Navigating the Unseen Space of Semantic Correspondence

cs.CV · 2026-04-20 · unverdicted · novelty 6.0

MARCO achieves new state-of-the-art semantic correspondence on SPair-71k, AP-10K and PF-PASCAL by combining coarse-to-fine refinement with self-distillation on DINOv2, delivering larger gains at fine thresholds and on unseen keypoints and categories while using 3x fewer parameters and running 10x faster.

VLMs Need Words: Vision Language Models Ignore Visual Detail In Favor of Semantic Anchors

cs.CV · 2026-04-02 · unverdicted · novelty 6.0

VLMs bypass visual comparison by recovering semantic labels for nameable entities and hallucinate on unnameable ones, as shown by performance gaps and Logit Lens analysis.

Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning

cs.CV · 2025-07-18 · conditional · novelty 6.0

Franca introduces nested Matryoshka clustering and positional disentanglement in a transparent SSL pipeline to deliver open-source vision models competitive with closed proprietary systems.

Normalized Matching Transformer

cs.CV · 2025-03-22 · unverdicted · novelty 6.0

Normalized Matching Transformer enforces unit-norm embeddings at every Transformer layer and trains with InfoNCE plus hyperspherical uniformity loss, reaching new state-of-the-art accuracy on PascalVOC and SPair-71k while converging faster than prior matching networks.

BLINK: Multimodal Large Language Models Can See but Not Perceive

cs.CV · 2024-04-18 · accept · novelty 6.0

BLINK benchmark shows multimodal LLMs reach only 45-51 percent accuracy on core visual perception tasks where humans achieve 95 percent, indicating these abilities have not emerged.
