ViLBench: A suite for vision-language process reward modeling

· 2025 · arXiv 2503.20271

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

read on arXiv browse 2 citing papers

citation-role summary

dataset 1

citation-polarity summary

background 1

representative citing papers

ReRec: Reasoning-Augmented LLM-based Recommendation Assistant via Reinforcement Fine-tuning

cs.IR · 2026-04-09 · unverdicted · novelty 6.0

ReRec uses reinforcement fine-tuning with dual-graph reward shaping, reasoning-aware advantage estimation, and online curriculum scheduling to improve LLM reasoning and performance in recommendation tasks.

Reinforcement Learning from Human Feedback

cs.LG · 2025-04-16 · unverdicted · novelty 2.0

The book introduces the origins, mathematical setup, and optimization stages of RLHF including reward modeling, reinforcement learning, rejection sampling, and direct alignment algorithms.

citing papers explorer

Showing 2 of 2 citing papers.

ReRec: Reasoning-Augmented LLM-based Recommendation Assistant via Reinforcement Fine-tuning cs.IR · 2026-04-09 · unverdicted · none · ref 49
ReRec uses reinforcement fine-tuning with dual-graph reward shaping, reasoning-aware advantage estimation, and online curriculum scheduling to improve LLM reasoning and performance in recommendation tasks.
Reinforcement Learning from Human Feedback cs.LG · 2025-04-16 · unverdicted · none · ref 100
The book introduces the origins, mathematical setup, and optimization stages of RLHF including reward modeling, reinforcement learning, rejection sampling, and direct alignment algorithms.

ViLBench: A suite for vision-language process reward modeling

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer