Elevater: A benchmark and toolkit for evaluating language-augmented visual models. Advances in Neural Information Processing Systems, 35:9287–9301
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
dataset 2
citation-polarity summary
fields
cs.CV 2
years
2026 2
verdicts
UNVERDICTED 2
roles
dataset 2
polarities
use dataset 2
representative citing papers
DetRefiner fuses global and local features with a Transformer to refine OVOD confidence scores, delivering up to +10.1 AP gains on novel categories across multiple datasets.
citing papers explorer
- PET-DINO: Unifying Visual Cues into Grounding DINO with Prompt-Enriched Training
  PET-DINO unifies visual and text prompts in Grounding DINO via an alignment-friendly generation module and prompt-enriched training strategies to improve zero-shot open-set object detection.
- DetRefiner: Model-Agnostic Detection Refinement with Feature Fusion Transformer
  DetRefiner fuses global and local features with a Transformer to refine OVOD confidence scores, delivering up to +10.1 AP gains on novel categories across multiple datasets.