Pith Number

pith:PTMIVYF5

pith:2026:PTMIVYF57TOIQIPUDUGX2MICTO
Status: not attested · not anchored · not stored · refs resolved

Margin-Adaptive Confidence Ranking for Reliable LLM Judgement

Gaojie Jin, Lijia Yu, Tianjin Huang, Yong Tao

A learned margin-adaptive confidence estimator improves LLM-human agreement by strengthening the link between confidence scores and disagreement risk.

arxiv:2605.15416 v1 · 2026-05-14 · cs.LG · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{PTMIVYF57TOIQIPUDUGX2MICTO}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp
2. Internet Archive
3. Author claim: open
4. Citations: open
5. Replications: open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
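The merge algorithm itself is not specified on this page. The following is a purely hypothetical sketch of why a deterministic merge lets any mirror reach the same state: deduplicate events by ID, sort them by a fixed key, and fold them in that order. The `Event` fields, the (timestamp, event_id) ordering, and the last-writer-wins rule are all assumptions for illustration, and signature verification is omitted.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

# HYPOTHETICAL event shape; the real Pith event schema is not shown on this page.
@dataclass(frozen=True)
class Event:
    event_id: str   # assumed to be a content hash, so equal IDs imply equal events
    timestamp: str  # assumed ISO-8601 UTC in one fixed format, so strings sort chronologically
    key: str        # field of the record state this event updates
    value: str      # new value for that field

def merge(*logs: Iterable[Event]) -> Dict[str, str]:
    """Deterministically merge event logs collected from any number of mirrors."""
    # Union by event_id: an event replicated on several mirrors is applied once.
    unique: Dict[str, Event] = {}
    for log in logs:
        for ev in log:
            unique[ev.event_id] = ev
    # A total order that does not depend on arrival order or on which mirror supplied the event.
    ordered = sorted(unique.values(), key=lambda e: (e.timestamp, e.event_id))
    # Fold in order: later events overwrite earlier ones (last-writer-wins, an assumption).
    state: Dict[str, str] = {}
    for ev in ordered:
        state[ev.key] = ev.value
    return state

a = Event("e1", "2026-05-20T00:00:00Z", "author_claim", "open")
b = Event("e2", "2026-05-21T00:00:00Z", "author_claim", "claimed")
print(merge([a, b]) == merge([b], [a, b]))  # True
```

Because the sort key is a total order over deduplicated events, two mirrors holding the same set of events compute the same state regardless of how those events arrived.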

Claims

C1 · strongest claim

When integrated into fixed-sequence testing, the learned confidence estimator yields improved ranking accuracy and empirically strengthens the monotonic relationship between confidence and disagreement risk, leading to higher success rates in satisfying target agreement levels across multiple datasets and judge models.
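Fixed-sequence testing is a standard multiple-testing procedure, but the paper's exact configuration is not given here. The following is a minimal generic sketch. It assumes a calibration set of judge confidence scores paired with binary human-disagreement labels. Candidate thresholds are tested from strictest to most lenient, and each null hypothesis "disagreement risk among kept items exceeds alpha" is checked with an exact binomial tail p-value. Testing stops at the first failure, which controls the family-wise error at level delta. The function names and the choice of binomial test are assumptions, not the authors' code.

```python
import math
from typing import Optional, Sequence

def binom_cdf(k: int, n: int, p: float) -> float:
    """P[X <= k] for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)

def fixed_sequence_threshold(confidence: Sequence[float], disagree: Sequence[int],
                             thresholds: Sequence[float], alpha: float,
                             delta: float) -> Optional[float]:
    """Most lenient threshold certified to have disagreement risk <= alpha.

    Tests thresholds from strictest (highest) to most lenient and stops at the
    first one whose null H0: risk > alpha cannot be rejected at level delta.
    Returns None if even the strictest threshold fails.
    """
    selected = None
    for lam in sorted(thresholds, reverse=True):
        kept = [d for c, d in zip(confidence, disagree) if c >= lam]
        n, k = len(kept), sum(kept)
        # Few disagreements relative to alpha * n give a small p-value,
        # i.e. evidence that risk <= alpha. With n == 0 the p-value is 1.
        p_value = binom_cdf(k, n, alpha)
        if p_value > delta:
            break
        selected = lam
    return selected

# Toy calibration data: 100 high-confidence items with 2 disagreements,
# 100 low-confidence items with 40 disagreements.
confidence = [0.9] * 100 + [0.5] * 100
disagree = [1] * 2 + [0] * 98 + [1] * 40 + [0] * 60
print(fixed_sequence_threshold(confidence, disagree, [0.9, 0.5], alpha=0.1, delta=0.05))  # 0.9
```

The fixed order is what makes the procedure valid without a multiplicity correction. It also explains why monotonicity matters: if risk does not fall as confidence rises, an early failure ends the sequence before more lenient thresholds are reached.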

C2 · weakest assumption

Training on simulated annotator diversity is assumed to produce a confidence estimator whose ranking behavior transfers to real human disagreement distributions. The abstract notes that the original monotonicity assumption is often violated, but it does not quantify how closely the simulation matches actual human variance.
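Whether monotonicity holds can be checked empirically on any labeled calibration set. The following is a minimal sketch. It assumes confidence scores in [0, 1] and binary disagreement labels. Scores are grouped into equal-width bins, each non-empty bin's disagreement rate is computed, and the sketch counts adjacent bins where risk rises with confidence. Both equal-width binning and the violation count are illustrative choices, not the paper's metric.

```python
from typing import List, Sequence, Tuple

def binned_disagreement(confidence: Sequence[float], disagree: Sequence[int],
                        n_bins: int = 10) -> List[Tuple[int, float]]:
    """(bin index, disagreement rate) for each non-empty equal-width bin on [0, 1]."""
    counts = [0] * n_bins
    errors = [0] * n_bins
    for c, d in zip(confidence, disagree):
        if not 0.0 <= c <= 1.0:
            raise ValueError(f"confidence {c} outside [0, 1]")
        b = min(int(c * n_bins), n_bins - 1)  # c == 1.0 falls into the top bin
        counts[b] += 1
        errors[b] += d
    return [(b, errors[b] / counts[b]) for b in range(n_bins) if counts[b] > 0]

def monotonicity_violations(rates: List[Tuple[int, float]]) -> int:
    """Number of adjacent non-empty bins where disagreement risk increases with confidence."""
    return sum(1 for (_, r_lower), (_, r_higher) in zip(rates, rates[1:])
               if r_higher > r_lower)

conf = [0.1, 0.1, 0.5, 0.5, 0.9, 0.9]
print(monotonicity_violations(binned_disagreement(conf, [1, 1, 1, 0, 0, 0], n_bins=3)))  # 0
print(monotonicity_violations(binned_disagreement(conf, [0, 0, 1, 1, 1, 0], n_bins=3)))  # 1
```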

C3 · one-line summary

Introduces a margin-adaptive confidence ranking method that learns an estimator from simulated diversity and derives margin-dependent generalization bounds for use in fixed-sequence testing of LLM-human agreement.

References

300 extracted · 300 resolved · 34 Pith anchors


Formal links

2 machine-checked theorem links

Receipt and verification
First computed: 2026-05-20T00:00:57.468346Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

7cd88ae0bdfcdc8821f41d0d7d31029b9ae32276c77e3cce68d407239f3108b1

Aliases

arxiv: 2605.15416 · arxiv_version: 2605.15416v1 · doi: 10.48550/arxiv.2605.15416 · pith_short_12: PTMIVYF57TOI · pith_short_16: PTMIVYF57TOIQIPU · pith_short_8: PTMIVYF5
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/PTMIVYF57TOIQIPUDUGX2MICTO \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 7cd88ae0bdfcdc8821f41d0d7d31029b9ae32276c77e3cce68d407239f3108b1
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "c15f1938ef9dde2ace8989fb774fd69b3b0fbe6d4a64d90a55ef4877104997bd",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-14T21:01:05Z",
    "title_canon_sha256": "0a97817509d89b0754952f0409660f732f9fb2d7b2b5893010e11dfa0c0ee9db"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.15416",
    "kind": "arxiv",
    "version": 1
  }
}
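The same check can be run without `jq` or network access. This stdlib-only Python sketch applies the canonicalization used by the one-liner above (sorted keys, compact separators, UTF-8 without ASCII escaping) to a local copy of the record. It assumes the JSON shown here matches the API's `canonical_record` exactly after parsing. If that holds, the comparison should print True.

```python
import hashlib
import json

def canonical_sha256(record: dict) -> str:
    """SHA-256 of the record serialized with sorted keys and no insignificant whitespace."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"),
                         ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

# Local copy of the canonical record shown above.
record = {
    "metadata": {
        "abstract_canon_sha256": "c15f1938ef9dde2ace8989fb774fd69b3b0fbe6d4a64d90a55ef4877104997bd",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-14T21:01:05Z",
        "title_canon_sha256": "0a97817509d89b0754952f0409660f732f9fb2d7b2b5893010e11dfa0c0ee9db",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.15416", "kind": "arxiv", "version": 1},
}

expected = "7cd88ae0bdfcdc8821f41d0d7d31029b9ae32276c77e3cce68d407239f3108b1"
print(canonical_sha256(record) == expected)
```

Sorting keys and fixing separators make the byte string independent of how the JSON was originally formatted or ordered, so any party can reproduce the hash.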