pith. sign in

hub Canonical reference

Critique-out-loud reward models

Canonical reference. 83% of citing Pith papers cite this work as background.

17 Pith papers citing it
Background 83% of classified citations

hub tools

citation-role summary

background 5 method 1

citation-polarity summary

clear filters

representative citing papers

STRIDE: Learnable Stepwise Language Feedback for LLM Reasoning

cs.LG · 2026-05-13 · unverdicted · novelty 6.0

STRIDE co-trains generator and verifier on outcome rewards alone to deliver learnable stepwise language feedback that redirects LLM reasoning trajectories and outperforms scalar-reward baselines.

Building a Precise Video Language with Human-AI Oversight

cs.CV · 2026-04-22 · unverdicted · novelty 6.0

CHAI framework pairs AI pre-captions with expert human critiques to produce precise video descriptions, enabling open models to outperform closed ones like Gemini-3.1-Pro and improve fine-grained control in video generation models.

RewardBench 2: Advancing Reward Model Evaluation

cs.CL · 2025-06-02 · unverdicted · novelty 6.0

RewardBench 2 is a new benchmark that supplies challenging fresh human prompts for reward model evaluation, yielding lower average scores but higher correlation with downstream best-of-N sampling and RLHF training performance.

Trust Region On-Policy Distillation

cs.LG · 2026-05-31 · unverdicted · novelty 5.0

TrOPD stabilizes on-policy distillation for LLMs with trust-region learning, outlier estimation, and off-policy guidance, outperforming prior OPD methods on reasoning and code benchmarks.

A Survey on LLM-as-a-Judge

cs.CL · 2024-11-23 · unverdicted · novelty 4.0

A survey on LLM-as-a-Judge that reviews reliability strategies, proposes evaluation methods, and introduces a novel benchmark for assessing such systems.

citing papers explorer

Showing 1 of 1 citing paper after filters.