LLM-Rubric: A multidimensional, calibrated approach to automated evaluation of natural language texts

Helia Hashemi, Jason Eisner, Corby Rosset, Benjamin Van Durme, Chris Kedzie · 2025 · arXiv 2501.00274

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Does Capability Transfer to Subjective Behavior -- and Would Our Instruments Tell Us? A Self-Evolving, Trust-by-Construction Evaluation Paradigm

cs.CL · 2026-05-27 · unverdicted · novelty 7.0

Self-evolving rubric with anti-gaming fitness reveals that objective capability scaling fails to transfer to subjective LLM behaviors, with advice-restraint as the universal lowest dimension that can regress.

Beyond Verifiable Rewards: Rubric-Based GRM for Reinforced Fine-Tuning SWE Agents

cs.LG · 2026-03-13 · unverdicted · novelty 7.0

A rubric-based generative reward model improves reinforced fine-tuning of SWE agents by supplying richer behavioral guidance than binary terminal rewards alone.

Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains

cs.LG · 2025-07-23 · unverdicted · novelty 6.0

RaR uses aggregated rubric feedback as rewards in on-policy RL, delivering up to 31% relative gains on HealthBench and 7% on GPQA-Diamond versus direct Likert LLM-as-judge baselines.

citing papers explorer

Showing 3 of 3 citing papers after filters.

Does Capability Transfer to Subjective Behavior -- and Would Our Instruments Tell Us? A Self-Evolving, Trust-by-Construction Evaluation Paradigm

cs.CL · 2026-05-27 · unverdicted · none · ref 51

Self-evolving rubric with anti-gaming fitness reveals that objective capability scaling fails to transfer to subjective LLM behaviors, with advice-restraint as the universal lowest dimension that can regress.

Beyond Verifiable Rewards: Rubric-Based GRM for Reinforced Fine-Tuning SWE Agents

cs.LG · 2026-03-13 · unverdicted · none · ref 11

A rubric-based generative reward model improves reinforced fine-tuning of SWE agents by supplying richer behavioral guidance than binary terminal rewards alone.

Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains

cs.LG · 2025-07-23 · unverdicted · none · ref 13

RaR uses aggregated rubric feedback as rewards in on-policy RL, delivering up to 31% relative gains on HealthBench and 7% on GPQA-Diamond versus direct Likert LLM-as-judge baselines.
