Integrity report for Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases

A machine-verified record of the checks Pith has run against this paper: detector runs, findings, signed bundle events, and canonical identifiers.

arXiv:2605.27355 · pith:2026:CIVC5APGZ5JW37FC2DUAY5F5RU

Critical: 0

Advisory: 0

Detectors run: 8

Last checked: 2026-06-05


Detector runs

ai_meta_artifact skipped v1.0.0 · findings 0 · 2026-06-05 20:35:58.443533+00:00

claim_evidence completed v1.0.0 · findings 0 · 2026-06-02 23:48:04.766962+00:00

claim_evidence completed v1.0.0 · findings 0 · 2026-05-31 05:46:38.156242+00:00

external_links completed v1.0.0 · findings 0 · 2026-05-27 17:31:37.525592+00:00

shingle_duplication skipped v0.1.0 · findings 0 · 2026-05-27 05:49:58.230368+00:00

citation_quote_validity skipped v0.1.0 · findings 0 · 2026-05-27 03:50:18.586989+00:00

ai_meta_artifact skipped v1.0.0 · findings 0 · 2026-05-27 02:34:12.004665+00:00

cited_work_retraction completed v1.0.0 · findings 0 · 2026-05-27 02:23:35.799303+00:00
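
Each run line above follows the same visible pattern: detector name, status, version, finding count, and a UTC timestamp. The sketch below parses that pattern in Python, assuming the layout holds for every line. The names `RUN_LINE` and `parse_run` are illustrative only and are not part of any Pith tooling; the schema of integrity.json is not shown here, so the sketch works from the text of this report instead.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical pattern inferred from the run lines in this report:
# "<detector> <status> v<version> · findings <n> · <timestamp>"
RUN_LINE = re.compile(
    r"^(?P<detector>\w+)\s+(?P<status>\w+)\s+v(?P<version>[\d.]+)"
    r"\s*·\s*findings\s+(?P<findings>\d+)\s*·\s*(?P<timestamp>.+)$"
)

def parse_run(line):
    """Parse one detector-run line into a dict (illustrative helper)."""
    match = RUN_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized run line: {line!r}")
    record = match.groupdict()
    record["findings"] = int(record["findings"])
    # The timestamps are ISO 8601 with a space separator and a UTC offset.
    record["timestamp"] = datetime.fromisoformat(record["timestamp"])
    return record

# Two of the eight run lines, copied from the list above.
runs = [
    parse_run("ai_meta_artifact skipped v1.0.0 · findings 0 · 2026-06-05 20:35:58.443533+00:00"),
    parse_run("claim_evidence completed v1.0.0 · findings 0 · 2026-06-02 23:48:04.766962+00:00"),
]

# Tally runs by status and sum the findings across them.
status_counts = Counter(run["status"] for run in runs)
total_findings = sum(run["findings"] for run in runs)
```

Parsed this way, a reader can tell runs that completed from runs that were skipped, which the "findings 0" count alone does not show.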

Findings

No public integrity findings for this paper.

Signed record

The machine-readable record for this paper lives at /pith/CIVC5APG/integrity.json. Pith Number bundles also include signed pith.integrity.v1 events where a Pith Number exists.