Title resolution pending

Learning transferable visual models from natural language supervision

7 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

OcclusionFormer: Arranging Z-Order for Layout-Grounded Image Generation

cs.CV · 2026-05-20 · unverdicted · novelty 7.0

OcclusionFormer adds explicit Z-order modeling via a new SA-Z dataset and volume-rendering compositing in a diffusion transformer to resolve occlusion ambiguities in layout-grounded image synthesis.

Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation

cs.CV · 2023-10-09 · unverdicted · novelty 7.0

A new shared video-image tokenizer enables large language models to surpass diffusion models on standard visual generation benchmarks.

Prototype-Based Test-Time Adaptation of Vision-Language Models

cs.CV · 2026-04-23 · unverdicted · novelty 6.0

PTA adapts VLMs at test time by maintaining and updating class-specific knowledge prototypes from test samples, achieving higher accuracy than cache-based methods with far less speed loss.

MARCO: Navigating the Unseen Space of Semantic Correspondence

cs.CV · 2026-04-20 · unverdicted · novelty 6.0

MARCO achieves new state-of-the-art semantic correspondence on SPair-71k, AP-10K and PF-PASCAL by combining coarse-to-fine refinement with self-distillation on DINOv2, delivering larger gains at fine thresholds and on unseen keypoints and categories while using 3x fewer parameters and running 10x faster.

Vision Transformers Need Registers

cs.CV · 2023-09-28 · unverdicted · novelty 6.0

Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.

CARE: Class-Adaptive Expert Consensus for Reliable Learning with Long-Tailed Noisy Labels

cs.CV · 2026-05-22 · unverdicted · novelty 5.0

CARE is a parameter-efficient framework that aggregates predictions from noisy labels, VLM text embeddings, and visual features with class-frequency-based agreement thresholds to rectify labels in long-tailed noisy datasets.

Layer by Layer: Uncovering Hidden Representations in Language Models

cs.LG · 2025-02-04 · unverdicted · novelty 5.0

Intermediate layers in LLMs consistently provide stronger features than final layers across tasks and architectures, as quantified by a new framework of information-theoretic, geometric, and invariance metrics.

citing papers explorer

Showing 7 of 7 citing papers.

OcclusionFormer: Arranging Z-Order for Layout-Grounded Image Generation cs.CV · 2026-05-20 · unverdicted · none · ref 45
OcclusionFormer adds explicit Z-order modeling via a new SA-Z dataset and volume-rendering compositing in a diffusion transformer to resolve occlusion ambiguities in layout-grounded image synthesis.
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation cs.CV · 2023-10-09 · unverdicted · none · ref 100
A new shared video-image tokenizer enables large language models to surpass diffusion models on standard visual generation benchmarks.
Prototype-Based Test-Time Adaptation of Vision-Language Models cs.CV · 2026-04-23 · unverdicted · none · ref 1
PTA adapts VLMs at test time by maintaining and updating class-specific knowledge prototypes from test samples, achieving higher accuracy than cache-based methods with far less speed loss.
MARCO: Navigating the Unseen Space of Semantic Correspondence cs.CV · 2026-04-20 · unverdicted · none · ref 92
MARCO achieves new state-of-the-art semantic correspondence on SPair-71k, AP-10K and PF-PASCAL by combining coarse-to-fine refinement with self-distillation on DINOv2, delivering larger gains at fine thresholds and on unseen keypoints and categories while using 3x fewer parameters and running 10x faster.
Vision Transformers Need Registers cs.CV · 2023-09-28 · unverdicted · none · ref 270
Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.
CARE: Class-Adaptive Expert Consensus for Reliable Learning with Long-Tailed Noisy Labels cs.CV · 2026-05-22 · unverdicted · none · ref 105
CARE is a parameter-efficient framework that aggregates predictions from noisy labels, VLM text embeddings, and visual features with class-frequency-based agreement thresholds to rectify labels in long-tailed noisy datasets.
Layer by Layer: Uncovering Hidden Representations in Language Models cs.LG · 2025-02-04 · unverdicted · none · ref 150
Intermediate layers in LLMs consistently provide stronger features than final layers across tasks and architectures, as quantified by a new framework of information-theoretic, geometric, and invariance metrics.
