SPair-71k: A Large-scale Benchmark for Semantic Correspondence. arXiv preprint arXiv:1908.10543

Juhong Min, Jongmin Lee, Jean Ponce, Minsu Cho · 2019 · arXiv 1908.10543

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

dataset 1

citation-polarity summary

use dataset 1

representative citing papers

Zero-Ablation Overstates Register Content Dependence in DINO Vision Transformers

cs.CV · 2026-04-15 · accept · novelty 7.0

Zero-ablation overstates register content dependence in DINO ViTs because mean, noise, and cross-image shuffle replacements preserve performance while zeroing does not.

Weighted Reverse Convolution for Feature Upsampling

cs.CV · 2026-05-17 · unverdicted · novelty 6.0 · 2 refs

Weighted Reverse Convolution is a spatially adaptive inverse operator for densifying high-level visual descriptors from vision foundation models, using weighted regularization and an FFT closed-form solution to improve dense prediction tasks.

MARCO: Navigating the Unseen Space of Semantic Correspondence

cs.CV · 2026-04-20 · unverdicted · novelty 6.0

MARCO achieves new state-of-the-art semantic correspondence on SPair-71k, AP-10K and PF-PASCAL by combining coarse-to-fine refinement with self-distillation on DINOv2, delivering larger gains at fine thresholds and on unseen keypoints and categories while using 3x fewer parameters and running 10x faster.

VLMs Need Words: Vision Language Models Ignore Visual Detail In Favor of Semantic Anchors

cs.CV · 2026-04-02 · unverdicted · novelty 6.0

VLMs bypass visual comparison by recovering semantic labels for nameable entities and hallucinate on unnamable ones, as shown by performance gaps and Logit Lens analysis.

Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning

cs.CV · 2025-07-18 · conditional · novelty 6.0

Franca introduces nested Matryoshka clustering and positional disentanglement in a transparent SSL pipeline to deliver open-source vision models competitive with closed proprietary systems.

Normalized Matching Transformer

cs.CV · 2025-03-22 · unverdicted · novelty 6.0

Normalized Matching Transformer enforces unit-norm embeddings at every Transformer layer and trains with InfoNCE plus hyperspherical uniformity loss, reaching new state-of-the-art accuracy on PascalVOC and SPair-71k while converging faster than prior matching networks.

BLINK: Multimodal Large Language Models Can See but Not Perceive

cs.CV · 2024-04-18 · accept · novelty 6.0

BLINK benchmark shows multimodal LLMs reach only 45-51 percent accuracy on core visual perception tasks where humans achieve 95 percent, indicating these abilities have not emerged.

citing papers explorer

Showing 7 of 7 citing papers.

Zero-Ablation Overstates Register Content Dependence in DINO Vision Transformers cs.CV · 2026-04-15 · accept · none · ref 31
Weighted Reverse Convolution for Feature Upsampling cs.CV · 2026-05-17 · unverdicted · none · ref 35 · 2 links
MARCO: Navigating the Unseen Space of Semantic Correspondence cs.CV · 2026-04-20 · unverdicted · none · ref 34
VLMs Need Words: Vision Language Models Ignore Visual Detail In Favor of Semantic Anchors cs.CV · 2026-04-02 · unverdicted · none · ref 11
Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning cs.CV · 2025-07-18 · conditional · none · ref 80
Normalized Matching Transformer cs.CV · 2025-03-22 · unverdicted · none · ref 26
BLINK: Multimodal Large Language Models Can See but Not Perceive cs.CV · 2024-04-18 · accept · none · ref 60
