CLIP-DINOiser: Teaching CLIP a few DINO tricks for open-vocabulary semantic segmentation

Monika Wysoczańska, Oriane Siméoni, Michaël Ramamonjisoa, Andrei Bursuc, Tomasz Trzciński, Patrick Pérez · 2023 · arXiv 2312.12359

2 Pith papers cite this work. Polarity classification is still being indexed.



representative citing papers

Vision Transformers Need More Than Registers

cs.CV · 2026-02-25 · unverdicted · novelty 6.0

ViTs exhibit lazy aggregation by relying on irrelevant background patches for global semantics, and selectively integrating patch features into the CLS token reduces this effect and improves results across label-, text-, and self-supervision.

FUS3DMaps: Scalable and Accurate Open-Vocabulary Semantic Mapping by 3D Fusion of Voxel- and Instance-Level Layers

cs.RO · 2026-05-05 · unverdicted · novelty 5.0

FUS3DMaps fuses voxel- and instance-level open-vocabulary layers inside a shared 3D voxel map to improve both layers and enable scalable, accurate semantic mapping.

citing papers explorer


Vision Transformers Need More Than Registers cs.CV · 2026-02-25 · unverdicted · none · ref 38
FUS3DMaps: Scalable and Accurate Open-Vocabulary Semantic Mapping by 3D Fusion of Voxel- and Instance-Level Layers cs.RO · 2026-05-05 · unverdicted · none · ref 19
