Pairwise or Pointwise? Evaluating Feedback Protocols for Bias in LLM-Based Evaluation

Tuhina Tripathi, Manya Wadhwa, Greg Durrett, Scott Niekum · 2025 · arXiv 2504.14716

4 Pith papers cite this work.

representative citing papers

JudgmentBench: Comparing Rubric and Preference Evaluation for Quality Assessment

cs.CL · 2026-05-24 · unverdicted · novelty 7.0

JudgmentBench supplies the first public paired rubric and preference annotations from legal experts on the same LLM outputs, showing that comparative judgments outperform rubrics in recovering quality orderings.

Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning

cs.LG · 2026-04-08 · unverdicted · novelty 7.0

This survey introduces the Generate-Filter-Control-Replay (GFCR) taxonomy to structure rollout pipelines for RL-based post-training of reasoning LLMs.

Towards Fast Domain Adaptation and Fine-Grained User Simulation for Evaluating Conversational Recommender Systems

cs.IR · 2026-06-22 · unverdicted · novelty 5.0

AdaptSim is an adaptive user simulator for CRS evaluation that combines automatic prompt generation, open actions, controlled text generation, and BFS-based pairwise comparison to produce realistic dialogues and assess system robustness across domains.

Trust Region On-Policy Distillation

cs.LG · 2026-05-31 · unverdicted · novelty 5.0

TrOPD stabilizes on-policy distillation for LLMs with trust-region learning, outlier estimation, and off-policy guidance, outperforming prior OPD methods on reasoning and code benchmarks.

citing papers explorer

Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning cs.LG · 2026-04-08 · unverdicted · none · ref 112
Trust Region On-Policy Distillation cs.LG · 2026-05-31 · unverdicted · none · ref 152
