GRASP aggregates stable local LLM interaction judgments into global argument rankings via a convergent attack-defense propagation operator on interaction graphs, yielding higher reproducibility than holistic judging and no correlation with human convincingness.
Explaining length bias in llm-based preference evaluations
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
method 1polarities
use method 1representative citing papers
New benchmark RoleCDE reveals LLMs exhibit role value decoupling under conflicts and demonstrates mitigation via targeted fine-tuning.
A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.
citing papers explorer
-
GRASP: Deterministic argument ranking in interaction graphs
GRASP aggregates stable local LLM interaction judgments into global argument rankings via a convergent attack-defense propagation operator on interaction graphs, yielding higher reproducibility than holistic judging and no correlation with human convincingness.
-
RoleCDE:Benchmarking and Mitigating Role-Alignment Trade-offs in Role-Playing Agents
New benchmark RoleCDE reveals LLMs exhibit role value decoupling under conflicts and demonstrates mitigation via targeted fine-tuning.