pith:NUZR3R2G
Not Just RLHF: Why Alignment Alone Won't Fix Multi-Agent Sycophancy
Pretrained base models show equal or higher yield to simulated peer disagreement than their RLHF-tuned counterparts, which places the issue in mid-layer attention rather than in alignment.
arxiv:2605.12991 v1 · 2026-05-13 · cs.LG · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{NUZR3R2GAA26OHQIZEAP2XS4NX}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Pretrained base models exhibit the same substitution pattern as their Instruct variants and, on average, show higher yield than the Instruct models. Using activation patching, we localize the corruption to a narrow mid-layer window in which attention carries the causal weight.
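Activation patching localizes causal weight by running a "clean" and a "corrupted" input, then re-running the corrupted input with one component's output swapped for its clean-run value and measuring how much of the output gap is restored. The toy sketch below illustrates only that logic; it is not the authors' model or code. Everything in it is hypothetical: the two-coordinate residual stream, the `component_out` weights (layer 2 "attention" is deliberately made strong), and the restoration score (fraction of the clean-minus-corrupted gap recovered).

```python
import math

N_LAYERS = 4
COMPONENTS = ("attn", "mlp")


def component_out(kind, layer, x):
    """Hypothetical toy component: the vector it adds to the residual stream."""
    if kind == "attn":
        # Assumption for illustration: only layer 2 "attention" reads the
        # disagreement signal (x[1]) strongly; other layers barely do.
        gain = 1.0 if layer == 2 else 0.05
        return [gain * x[1], 0.0]
    # Toy "MLP": a small nonlinear update of the answer coordinate x[0].
    return [0.1 * math.tanh(x[0]), 0.0]


def run(x, patch=None, cache=None):
    """Run the toy model and return x[0] (a stand-in for a logit difference).

    patch: optional ((kind, layer), vector) that overrides that component's output.
    cache: optional dict filled with every component's output.
    """
    x = list(x)
    for layer in range(N_LAYERS):
        for kind in COMPONENTS:
            key = (kind, layer)
            if patch is not None and patch[0] == key:
                out = patch[1]  # use the cached clean-run output
            else:
                out = component_out(kind, layer, x)  # recompute on current stream
            if cache is not None:
                cache[key] = out
            x = [a + b for a, b in zip(x, out)]
    return x[0]


def patching_scores(clean_input, corrupted_input):
    """Patch each component, one at a time, from the clean run into the corrupted run."""
    clean_cache = {}
    clean = run(clean_input, cache=clean_cache)
    corrupted = run(corrupted_input)
    scores = {}
    for key, clean_out in clean_cache.items():
        patched = run(corrupted_input, patch=(key, clean_out))
        # Fraction of the clean-vs-corrupted gap restored by this single patch.
        scores[key] = (patched - corrupted) / (clean - corrupted)
    return scores


scores = patching_scores([0.0, 1.0], [0.0, -1.0])
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

Because downstream components are recomputed on the patched stream, a single patch can restore most of the gap only if that component carries the causal signal; in this toy, that is the layer 2 attention component by construction.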
The claims rest on two premises: that the simulated peer disagreement in the experimental setup accurately captures the dynamics of real multi-agent LLM pipelines, and that yield directly measures sycophancy rather than other forms of uncertainty or context sensitivity.
Pretrained base models show higher yield to peer disagreement than their RLHF-tuned Instruct variants; the effect is localized to mid-layer attention and is mitigated by structured dissent rather than by prompt defenses.
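These claims lean on "yield", which this page does not define. As a hypothetical illustration only, assume yield is the fraction of contested trials (the simulated peer's answer differs from the model's initial answer) in which the model's final answer switches to the peer's answer. The `Trial` record and `yield_rate` function are illustrative names, and the four trials are toy inputs, not data from the paper.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    initial: str  # model's answer before seeing the peer
    peer: str     # answer asserted by the simulated peer
    final: str    # model's answer after seeing the peer


def yield_rate(trials):
    """Hypothetical yield: among trials where the peer disagreed with the
    model's initial answer, the fraction whose final answer adopts the peer's."""
    contested = [t for t in trials if t.peer != t.initial]
    if not contested:
        return float("nan")  # undefined when the peer never disagrees
    flipped = sum(1 for t in contested if t.final == t.peer)
    return flipped / len(contested)


toy = [
    Trial("A", "B", "B"),  # yields to the peer
    Trial("A", "B", "A"),  # holds its answer
    Trial("C", "D", "D"),  # yields to the peer
    Trial("A", "A", "A"),  # peer agrees: excluded from the denominator
]
print(yield_rate(toy))  # 2 of 3 contested trials yield
```

Under this definition, a switch to a third answer (neither the initial nor the peer's) does not count as yielding, which is one way to separate substitution toward the peer from general answer instability, the concern raised by the second premise above.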
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-18T03:09:00.533726Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
6d331dc7460035e71e08c900fd5e5c6dd30c429247b06b189df0eac05061c649
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/NUZR3R2GAA26OHQIZEAP2XS4NX \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 6d331dc7460035e71e08c900fd5e5c6dd30c429247b06b189df0eac05061c649
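The same check can be sketched with Python's standard library alone, without `jq`. The function names `canonical_sha256`, `fetch_document`, and `verify` are hypothetical, and the sketch assumes, as the one-liner above does, that the JSON-LD response carries the record under `canonical_record`. The demo hashes a small fragment of the record twice, with different key order and whitespace, to show why the serialization is canonical; it does not reproduce the full expected hash.

```python
import hashlib
import json
import urllib.request


def canonical_sha256(record):
    """Hash a record like the one-liner above: sorted keys, no whitespace,
    non-ASCII characters kept and encoded as UTF-8."""
    blob = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def fetch_document(url):
    """Fetch the JSON-LD document (network call; not executed in the demo)."""
    req = urllib.request.Request(url, headers={"Accept": "application/ld+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def verify(document, expected_hex):
    """Compare the hash of document["canonical_record"] with the published hash."""
    return canonical_sha256(document["canonical_record"]) == expected_hex


# Illustrative fragment only: same content, different key order and whitespace.
a = {"schema_version": "1.0", "source": {"id": "2605.12991", "kind": "arxiv", "version": 1}}
b = json.loads('{ "source": {"version": 1, "kind": "arxiv", "id": "2605.12991"},\n "schema_version": "1.0" }')
print(canonical_sha256(a) == canonical_sha256(b))  # True
```

With network access, `verify(fetch_document("https://pith.science/pith/NUZR3R2GAA26OHQIZEAP2XS4NX"), "6d331dc7460035e71e08c900fd5e5c6dd30c429247b06b189df0eac05061c649")` should behave like the curl pipeline. Sorting keys and stripping whitespace before hashing is what makes the digest independent of how the server happens to format its JSON.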
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "fd6bf71118822b5767fbb0c2cfba5854e4f9bcc37b7883da0aa25e25dbf3c215",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-13T04:45:08Z",
    "title_canon_sha256": "7eed6309f5d7ee2b84ef9f6e1749195760118488965057c30381d575a094d572"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.12991",
    "kind": "arxiv",
    "version": 1
  }
}