CLIPSelf: Vision transformer distills itself for open-vocabulary dense prediction

9 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

fields: cs.CV 8, cs.IR 1

years: 2026 8, 2025 1

roles: background 2

polarities: background 2

representative citing papers

WOW-Seg: A Word-free Open World Segmentation Model

cs.CV · 2026-05-16 · conditional · novelty 6.0

WOW-Seg proposes a word-free open-world segmentation model built on Mask2Token and Cascade Attention Mask modules. It reports 89.7 semantic similarity and 82.4 semantic IoU on LVIS with one-eighth the parameters of the prior state of the art, and comes with a new 7,662-class benchmark.

Pi-HOC: Pairwise 3D Human-Object Contact Estimation

cs.CV · 2026-04-14 · unverdicted · novelty 6.0 · 2 refs

Pi-HOC predicts dense 3D semantic contacts for all human-object pairs in an image via instance-aware tokens and an InteractionFormer, achieving higher accuracy and 20x throughput than prior methods.

Vision Transformers Need More Than Registers

cs.CV · 2026-02-25 · unverdicted · novelty 6.0

ViTs exhibit lazy aggregation by relying on irrelevant background patches for global semantics, and selectively integrating patch features into the CLS token reduces this effect and improves results across label-, text-, and self-supervision.
