Microsoft COCO: Common Objects in Context
3 Pith papers cite this work. Polarity classification is still indexing.
Citing papers explorer
- Metric-Guided Feature Fusion of Visual Foundation Models for Segmentation Tasks
  A label-free, metric-guided fusion of complementary features from visual foundation models yields consistent gains on dense prediction tasks, with improved object semantics and boundary localization.
- SSL-R1: Self-Supervised Visual Reinforcement Post-Training for Multimodal Large Language Models
  SSL-R1 reformulates visual self-supervised learning (SSL) tasks as verifiable puzzles that supply rewards for reinforcement learning (RL) post-training of multimodal large language models (MLLMs), yielding gains on multimodal benchmarks without external supervision.
- ComMark: Covert and Robust Black-Box Model Watermarking with Compressed Samples
  ComMark embeds covert watermarks in models using frequency-domain compressed samples and simulated attacks, and claims state-of-the-art covertness and robustness across image, speech, text, and video tasks.