pith:I44TW35N
Reward Auditor: Inference on Reward Modeling Suitability in Real-World Perturbed Scenarios
Reward Auditor uses hypothesis testing to detect if reward models have systematic vulnerabilities under real-world perturbations.
arxiv:2512.00920 v5 · 2025-11-30 · cs.CL
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{I44TW35NPR5PJAAZBXKBHS6KEN}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Under real-world perturbed scenarios, Reward Auditor quantifies statistical significance and effect size by auditing the degradation in the distribution of a reward model's (RM's) preference perception confidence. This enables inference of both the certainty and the severity of RM vulnerabilities across diverse real-world scenarios.
The chosen real-world perturbations and the definition of suitability as conditional reliability under those perturbations accurately capture the vulnerabilities that matter for safe LLM alignment in deployment.
Reward Auditor is a statistical auditing framework that infers systematic vulnerabilities in reward models by quantifying distribution degradation of preference perception confidence under real-world perturbations.
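The idea of pairing a significance test with an effect size can be illustrated with a minimal sketch. This is not the paper's procedure: the record does not say which test or effect-size measure Reward Auditor uses. The sketch assumes a Bradley-Terry style confidence (sigmoid of the reward margin), a one-sided paired sign-flip permutation test, and a paired Cohen's d. The names `preference_confidence` and `audit_degradation` and the sample values are hypothetical.

```python
import math
import random
import statistics


def preference_confidence(r_chosen: float, r_rejected: float) -> float:
    """Assumed definition: sigmoid of the reward margin (Bradley-Terry style)."""
    return 1.0 / (1.0 + math.exp(-(r_chosen - r_rejected)))


def audit_degradation(clean_conf, perturbed_conf, n_resamples=10_000, seed=0):
    """Return (p_value, effect_size) for the drop in confidence under perturbation.

    p_value: one-sided paired sign-flip permutation test of
             H0 "no degradation" against H1 "confidence drops".
    effect_size: paired Cohen's d (mean difference / std of differences).
    Needs at least two pairs so the standard deviation is defined.
    """
    diffs = [c - p for c, p in zip(clean_conf, perturbed_conf)]
    n = len(diffs)
    observed = sum(diffs) / n

    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        # Under H0 the sign of each paired difference is exchangeable.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if flipped >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (hits + 1) / (n_resamples + 1)

    sd = statistics.stdev(diffs)
    # With zero spread Cohen's d is undefined; report NaN instead of dividing.
    effect_size = observed / sd if sd > 0 else float("nan")
    return p_value, effect_size


# Hypothetical confidences for the same preference pairs, before and after perturbation.
clean = [0.90, 0.85, 0.80, 0.95, 0.88, 0.92, 0.87, 0.90]
perturbed = [0.70, 0.60, 0.65, 0.80, 0.70, 0.75, 0.60, 0.72]
p_value, effect_size = audit_degradation(clean, perturbed)
print(f"p-value: {p_value:.4f}, paired Cohen's d: {effect_size:.2f}")
```

The p-value speaks to the certainty that a vulnerability exists and the effect size to its severity, which is why the sketch reports both instead of a single score.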
Receipt and verification
| First computed | 2026-05-20T00:00:29.144435Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
47393b6fad7c7af480190dd413cbca234016c43855b6f69389c34c9d00954930
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/I44TW35NPR5PJAAZBXKBHS6KEN \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 47393b6fad7c7af480190dd413cbca234016c43855b6f69389c34c9d00954930
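The hashing step of the one-liner above can also be written as a small standalone function, which may be easier to reuse or test offline. This is a sketch that mirrors the same `json.dumps` arguments as the command; the function name `canonical_hash` is hypothetical, and the toy record below is illustrative only, not the real canonical record.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """SHA-256 over the record serialized with sorted keys and no extra whitespace."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no spaces after commas or colons
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8, not \u escapes
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Toy records (not the real one): the same content in a different key order
# serializes identically, so the hashes match.
a = {"source": {"id": "example", "kind": "arxiv"}, "schema_version": "1.0"}
b = {"schema_version": "1.0", "source": {"kind": "arxiv", "id": "example"}}
print(canonical_hash(a) == canonical_hash(b))
```

To check a real record, pass the parsed `canonical_record` object from the API response to the function and compare the result with the canonical hash listed on this page.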
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "34c28eddc3a84ec52591111031a61ace34d1728a5b1e5a47f6dd1f16ec7690d0",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-11-30T14:54:12Z",
    "title_canon_sha256": "58a3a4d390f8261f81cba59cb1c194c6484e3151b0a82d6103dee92fb921eba3"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2512.00920",
    "kind": "arxiv",
    "version": 5
  }
}