PND reduces object hallucination in VLMs via a dual-path contrast during decoding that amplifies visual features and penalizes linguistic priors, achieving reported SOTA results on POPE, MME, and CHAIR without retraining.
Microsoft coco: Common objects in context
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
dataset 1polarities
use dataset 1representative citing papers
S2-CoT coordinates a Structural Fidelity Adapter in the encoder-decoder with a Semantic Context Adapter in the entropy model to convert potential performance loss into state-of-the-art gains across base codecs while using only a small fraction of parameters.
STiTch refines LLM captions via embedding transition and uses set-to-set bidirectional transportation alignment to improve training-free zero-shot composed image retrieval.
SigLino distills SigLIP2 and DINOv3 into efficient vision models via asymmetric relation-knowledge distillation, token-balanced batching, and hierarchical data sampling on a new 200M-image corpus, yielding better transfer to grounding VLMs than training from scratch.
citing papers explorer
-
SigLino: Efficient Multi-Teacher Distillation for Agglomerative Vision Foundation Models
SigLino distills SigLIP2 and DINOv3 into efficient vision models via asymmetric relation-knowledge distillation, token-balanced batching, and hierarchical data sampling on a new 200M-image corpus, yielding better transfer to grounding VLMs than training from scratch.