Structural uncertainty from self-preference-induced rankings of LLM reasoning paths complements answer dispersion for identifying unreliable instances on logical tasks while collapsing on factual retrieval.
Rank analysis of incomplete block designs: I. The method of paired comparisons
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years: 2026 (6)
verdicts: UNVERDICTED (6)
roles: method (1)
polarities: use method (1)
representative citing papers
A perceptron model trained on kernel data and run via eBPF in the Linux page cache outperforms FIFO by up to 10% in insertion rate on some workloads with low overhead.
The only aggregation rule satisfying same-scale normalization, recursive consistency, and marginal Elo-strength consistency converts ratings to strengths, takes their weighted arithmetic mean, and converts back.
LMs systematically inflate expressed certainty during rewriting, affecting up to 75% of outputs with a 1.5-2x bias toward increasing rather than decreasing certainty, and the effect compounds over iterations.
CivBench trains models on turn-level states in Civilization V to predict victory probabilities, providing a progress-based evaluation of LLM strategic capabilities across 307 games with 7 models.
AURA is an adaptive uncertainty-aware refinement method for auditing LLM-as-a-judge pairwise decisions that learns human-consistency signals through selective human verification on uncertain cases.
citing papers explorer
- CivBench: Progress-Based Evaluation for LLMs' Strategic Decision-Making in Civilization V