Integrity report for Self-Jailbreaking: Language Models Can Reason Themselves Out of Safety Alignment After Benign Reasoning Training

A machine-verified record of the checks Pith has run against this paper: detector runs, findings, signed bundle events, and canonical identifiers.

arXiv:2510.20956

Critical: 0

Advisory: 0

Detectors run: 0

Last checked: n/a


Detector runs

Findings

No public integrity findings for this paper.

Signed record

The machine-readable record for this paper lives at /pith/2510.20956/integrity.json. Where a paper has a Pith Number, its bundle also includes signed pith.integrity.v1 events.
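The record path above appears to follow the pattern /pith/<arXiv ID>/integrity.json. As a minimal sketch, assuming that pattern holds for other papers, the path can be derived from an arXiv identifier. The function name integrity_record_path is hypothetical, the host serving the path is not stated in this report, and the JSON schema of the record is not documented here, so the sketch builds the path only and does not fetch or parse anything.

```python
import re

# Assumption: only new-style arXiv IDs without a version suffix
# (e.g. "2510.20956") are handled; the report shows just this one form.
_ARXIV_ID = re.compile(r"\d{4}\.\d{4,5}")


def integrity_record_path(arxiv_id: str) -> str:
    """Return the site-relative path of a paper's integrity record.

    Hypothetical helper: the pattern is inferred from the single
    example path shown in this report.
    """
    if not _ARXIV_ID.fullmatch(arxiv_id):
        raise ValueError(f"not a new-style arXiv ID: {arxiv_id!r}")
    return f"/pith/{arxiv_id}/integrity.json"


if __name__ == "__main__":
    print(integrity_record_path("2510.20956"))
```

For this paper, the helper reproduces the path given above. Validating the identifier before formatting keeps malformed input out of the path.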