arXiv preprint arXiv:2502.07193 , year=

Long-Fei Li, Yu-Yang Qian, Peng Zhao, Zhi-Hua Zhou · 2025 · arXiv 2502.07193

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

representative citing papers

Vision-driven Preference Synthesis for Mitigating Hallucinations in VLMs

cs.CV · 2026-06-24 · unverdicted · novelty 6.0

ViPSy constructs policy-aligned and visually grounded preference pairs for VLMs via visual cues from image variants, yielding SOTA hallucination reductions of 35.7% on AMBER and 24.5% on Object HalBench.

Data-dependent Exploration for Online Reinforcement Learning from Human Feedback

cs.LG · 2026-05-06 · unverdicted · novelty 5.0

DEPO constructs uncertainty bonuses from historical data for exploration in online RLHF and provides a data-dependent regret bound that adapts to task hardness.

citing papers explorer

Showing 2 of 2 citing papers.

Vision-driven Preference Synthesis for Mitigating Hallucinations in VLMs cs.CV · 2026-06-24 · unverdicted · none · ref 43
ViPSy constructs policy-aligned and visually grounded preference pairs for VLMs via visual cues from image variants, yielding SOTA hallucination reductions of 35.7% on AMBER and 24.5% on Object HalBench.
Data-dependent Exploration for Online Reinforcement Learning from Human Feedback cs.LG · 2026-05-06 · unverdicted · none · ref 30
DEPO constructs uncertainty bonuses from historical data for exploration in online RLHF and provides a data-dependent regret bound that adapts to task hardness.

arXiv preprint arXiv:2502.07193 , year=

fields

years

verdicts

representative citing papers

citing papers explorer