Pith Number

pith:PTMIVYF5

pith:2026:PTMIVYF57TOIQIPUDUGX2MICTO

not attested · not anchored · not stored · refs resolved

Margin-Adaptive Confidence Ranking for Reliable LLM Judgement

Gaojie Jin, Lijia Yu, Tianjin Huang, Yong Tao

A learned margin-adaptive confidence estimator improves LLM-human agreement by strengthening the link between confidence scores and disagreement risk.

arxiv:2605.15416 v1 · 2026-05-14 · cs.LG · cs.AI


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{PTMIVYF57TOIQIPUDUGX2MICTO}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1: strongest claim

When integrated into fixed-sequence testing, the learned confidence estimator yields improved ranking accuracy and empirically strengthens the monotonic relationship between confidence and disagreement risk, leading to higher success rates in satisfying target agreement levels across multiple datasets and judge models.

C2: weakest assumption

That training on simulated annotator diversity produces a confidence estimator whose ranking behavior transfers to real human disagreement distributions; the abstract notes the original monotonicity assumption is often violated but does not quantify how well the simulation matches actual human variance.

C3: one-line summary

Introduces a margin-adaptive confidence ranking method that learns an estimator from simulated diversity and derives margin-dependent generalization bounds for use in fixed-sequence testing of LLM-human agreement.
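
The fixed-sequence testing step named in C1 and C3 can be illustrated with a generic sketch. This is not the paper's implementation: the function names, the exact binomial test, and the toy data are assumptions made for illustration, and the confidence scores here are plain inputs rather than the output of the learned margin-adaptive estimator. The sketch walks candidate confidence thresholds from most to least conservative, tests at each one whether judge-human agreement among the retained items exceeds the target, and stops at the first threshold that cannot be certified. The early stop is only sensible if higher confidence really means lower disagreement risk, which is the monotonic relationship C1 refers to.

```python
from math import comb


def binom_tail_pvalue(k: int, n: int, p0: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p0).

    Used as the p-value for H0: true agreement rate <= p0.
    """
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


def fixed_sequence_threshold(conf, agree, thresholds, target, alpha=0.1):
    """Return the lowest certified confidence threshold, or None.

    conf:       confidence score per calibration item (hypothetical input)
    agree:      1 if the LLM judge matched the human label, else 0
    thresholds: candidate thresholds, tested from highest to lowest
    target:     required agreement level among retained items
    alpha:      significance level, spent in full on each test in sequence
    """
    selected = None
    for t in sorted(thresholds, reverse=True):
        kept = [a for c, a in zip(conf, agree) if c >= t]
        n, k = len(kept), sum(kept)
        # Stop at the first threshold whose null hypothesis is not rejected.
        if n == 0 or binom_tail_pvalue(k, n, target) > alpha:
            break
        selected = t
    return selected


# Toy calibration set: 40 high-confidence items that all agree with humans,
# and 40 low-confidence items that agree only half the time.
conf = [0.9] * 40 + [0.5] * 40
agree = [1] * 40 + [1, 0] * 20
print(fixed_sequence_threshold(conf, agree, [0.5, 0.9], target=0.8, alpha=0.05))
```

On this toy data the threshold 0.9 is certified and the procedure stops at 0.5, where the pooled agreement of 60 out of 80 falls below the 0.8 target. A ranking that places risky items at high confidence would make the procedure stop early or not start at all, which is why ranking quality matters for the success rate.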

References

300 extracted · 300 resolved · 34 Pith anchors

Formal links

2 machine-checked theorem links

Receipt and verification

First computed	2026-05-20T00:00:57.468346Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

7cd88ae0bdfcdc8821f41d0d7d31029b9ae32276c77e3cce68d407239f3108b1

Aliases

arxiv: 2605.15416 · arxiv_version: 2605.15416v1 · doi: 10.48550/arxiv.2605.15416 · pith_short_12: PTMIVYF57TOI · pith_short_16: PTMIVYF57TOIQIPU · pith_short_8: PTMIVYF5
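
The identifiers on this page appear to be related in a simple way: the 26-character Pith Number matches the first 26 characters of the RFC 4648 base32 encoding of the canonical SHA-256 hash, and each pith_short_N alias is its first N characters. This is inferred from the values shown here, not from any documented rule on this page, so treat the sketch below as an observation; the helper name is hypothetical.

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "7cd88ae0bdfcdc8821f41d0d7d31029b9ae32276c77e3cce68d407239f3108b1"


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the leading characters.

    Assumption: this mirrors how the identifier on this page relates to its
    canonical hash; the official derivation is not stated here.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


pith = pith_from_hash(CANONICAL_HASH)
aliases = {f"pith_short_{n}": pith[:n] for n in (8, 12, 16)}
print(pith)
print(aliases)
```

For this record the result is PTMIVYF57TOIQIPUDUGX2MICTO, with short forms PTMIVYF5, PTMIVYF57TOI, and PTMIVYF57TOIQIPU, which agree with the aliases listed above.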

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/PTMIVYF57TOIQIPUDUGX2MICTO \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 7cd88ae0bdfcdc8821f41d0d7d31029b9ae32276c77e3cce68d407239f3108b1

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "c15f1938ef9dde2ace8989fb774fd69b3b0fbe6d4a64d90a55ef4877104997bd",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-14T21:01:05Z",
    "title_canon_sha256": "0a97817509d89b0754952f0409660f732f9fb2d7b2b5893010e11dfa0c0ee9db"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.15416",
    "kind": "arxiv",
    "version": 1
  }
}
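
The shell one-liner under "Verify this Pith Number yourself" can also be run offline against the record printed above. The sketch below uses the same serialization as that one-liner (sorted keys, compact separators, non-ASCII kept as-is, UTF-8 bytes) and hashes the result with SHA-256. The function name is hypothetical, and the record is copied by hand from this page, so a mismatch may mean a transcription error rather than a bad record.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using sorted keys and compact separators."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Canonical record as shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "c15f1938ef9dde2ace8989fb774fd69b3b0fbe6d4a64d90a55ef4877104997bd",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-14T21:01:05Z",
        "title_canon_sha256": "0a97817509d89b0754952f0409660f732f9fb2d7b2b5893010e11dfa0c0ee9db",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.15416", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)  # compare with the value under "Canonical hash"
```

Because keys are sorted before hashing, the order in which the fields are written in the dictionary does not affect the digest; only the keys and values do.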