SPair-71k: A large-scale benchmark for semantic correspondence. arXiv preprint arXiv:1908.10543

7 Pith papers cite this work. Polarity classification is still indexing.

fields

cs.CV 7

roles

dataset 1

polarities

use dataset 1

representative citing papers

Weighted Reverse Convolution for Feature Upsampling

cs.CV · 2026-05-17 · unverdicted · novelty 6.0 · 2 refs

Weighted Reverse Convolution is a spatially adaptive inverse operator for densifying high-level visual descriptors from vision foundation models, using weighted regularization and an FFT closed-form solution to improve dense prediction tasks.

MARCO: Navigating the Unseen Space of Semantic Correspondence

cs.CV · 2026-04-20 · unverdicted · novelty 6.0

MARCO achieves new state-of-the-art semantic correspondence on SPair-71k, AP-10K and PF-PASCAL by combining coarse-to-fine refinement with self-distillation on DINOv2, delivering larger gains at fine thresholds and on unseen keypoints and categories while using 3x fewer parameters and running 10x faster.

Normalized Matching Transformer

cs.CV · 2025-03-22 · unverdicted · novelty 6.0

Normalized Matching Transformer enforces unit-norm embeddings at every Transformer layer and trains with InfoNCE plus hyperspherical uniformity loss, reaching new state-of-the-art accuracy on PascalVOC and SPair-71k while converging faster than prior matching networks.

citing papers explorer

Showing 7 of 7 citing papers.

  • Zero-Ablation Overstates Register Content Dependence in DINO Vision Transformers cs.CV · 2026-04-15 · accept · none · ref 31

    Zero-ablation overstates register content dependence in DINO ViTs because mean, noise, and cross-image shuffle replacements preserve performance while zeroing does not.

  • Weighted Reverse Convolution for Feature Upsampling cs.CV · 2026-05-17 · unverdicted · none · ref 35 · 2 links

    Weighted Reverse Convolution is a spatially adaptive inverse operator for densifying high-level visual descriptors from vision foundation models, using weighted regularization and an FFT closed-form solution to improve dense prediction tasks.

  • MARCO: Navigating the Unseen Space of Semantic Correspondence cs.CV · 2026-04-20 · unverdicted · none · ref 34

    MARCO achieves new state-of-the-art semantic correspondence on SPair-71k, AP-10K and PF-PASCAL by combining coarse-to-fine refinement with self-distillation on DINOv2, delivering larger gains at fine thresholds and on unseen keypoints and categories while using 3x fewer parameters and running 10x faster.

  • VLMs Need Words: Vision Language Models Ignore Visual Detail In Favor of Semantic Anchors cs.CV · 2026-04-02 · unverdicted · none · ref 11

    VLMs bypass visual comparison by recovering semantic labels for nameable entities and hallucinate on unnamable ones, as shown by performance gaps and Logit Lens analysis.

  • Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning cs.CV · 2025-07-18 · conditional · none · ref 80

    Franca introduces nested Matryoshka clustering and positional disentanglement in a transparent SSL pipeline to deliver open-source vision models competitive with closed proprietary systems.

  • Normalized Matching Transformer cs.CV · 2025-03-22 · unverdicted · none · ref 26

    Normalized Matching Transformer enforces unit-norm embeddings at every Transformer layer and trains with InfoNCE plus hyperspherical uniformity loss, reaching new state-of-the-art accuracy on PascalVOC and SPair-71k while converging faster than prior matching networks.

  • BLINK: Multimodal Large Language Models Can See but Not Perceive cs.CV · 2024-04-18 · accept · none · ref 60

    BLINK benchmark shows multimodal LLMs reach only 45-51 percent accuracy on core visual perception tasks where humans achieve 95 percent, indicating these abilities have not emerged.