Chain-of-Zoom factorizes extreme super-resolution into an autoregressive sequence of intermediate scales using a reused backbone model plus GRPO-tuned multi-scale VLM prompts.
arXiv preprint arXiv:2410.02712 (2024)
5 Pith papers cite this work.
representative citing papers
UnifiedReward is the first unified reward model that jointly assesses multimodal understanding and generation to provide better preference signals for aligning vision models via DPO.
VCap pairs reference captions as witnesses with visual signals as adjudicators to deliver hypergeometric-precision rewards for RL in visual captioning, enabling an 8B model to outperform SOTA on benchmarks and improve weak-to-strong generalization.
A survey on LLM-as-a-Judge that reviews reliability strategies, proposes evaluation methods, and introduces a novel benchmark for assessing such systems.
A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.
citing papers explorer
VCap: Hypergeometric Rewards for Weak-to-Strong Visual Captioning