arxiv: 2603.20562 · 2 revisions
Permutation-Consensus Listwise Judging for Robust Factuality Evaluation