pith.

Review history

arxiv: 2603.20562 · 2 revisions

Permutation-Consensus Listwise Judging for Robust Factuality Evaluation

  1. 2026-05-21 UNVERDICTED LOW v0.9.0 novelty 5.0
    58592 ms 5712 in 1318 out 2026-05-21T09:43:23.560095+00:00
  2. 2026-05-15 CONDITIONAL LOW v0.9.0 novelty 6.0
    24122 ms 5464 in 1009 out 2026-05-15T07:40:00.308621+00:00