
Recoverable Identifier

arXiv:2605.04135 · detector doi_compliance · incontrovertible · 2026-05-19 14:49:23.071985+00:00

advisory doi_compliance recoverable_identifier

DOI in the printed bibliography is fragmented by whitespace or line breaks. A longer candidate (10.3348/kjr.2024.1161.Closest) was visible in the surrounding text but could not be confirmed against doi.org as printed.


Evidence text

Competitive-comparison table corroborating the Opus 4.6 Thinking Max SWE-Bench-Verified baseline; accessed 2026-04-23.

David Gringras. frontierlag: A python package for auditing the capability gap of published AI evaluations. Python Package Index (PyPI), 2026a. URL https://pypi.org/project/frontierlag/0.1.0/. Released 2026-04-16 under MIT license; live web tool at https://frontierlag.org; frozen-dataset snapshots refreshed quarterly. Persistent identifier (Zenodo DOI) added at companion paper’s arXiv launch.

David Gringras. Iatrobench: Pre-registered evidence of iatrogenic harm from AI safety measures. Preprint, target venue NeurIPS 2026 Datasets & Benchmarks Track, 2026b. 60 clinically validated scenarios scored on dual axes of commission harm (CH, 0–3) and omission harm (OH, 0–4), with a matched-framing Decoupling Eval; six frontier models across 3,600 responses; elicitation-adequate multi-model reporting (exact versions, dates, reasoning-mode status, scoring protocol) as the companion exemplar of proximate-frontier medical evaluation.

David Gringras. Pre-registration: Frontier lag — a bibliometric audit of capability misrepresentation in academic ai evaluation. Open Science Framework, 2026c. URL https://osf.io/7xm3d/. Registered 2026-04-17 (OSF timestamp 2026-04-18T00:38:52Z UTC); CC-BY 4.0; Internet Archive: https://archive.org/details/osf-registrations-7xm3d-v1.

David Gringras. Safety under scaffolding: How evaluation conditions shape measured safety. Preprint, 2026d.

Evidence payload

{
  "printed_excerpt": "Competitive-comparison table corroborating the Opus 4.6 Thinking Max SWE-Bench-Verified baseline; accessed 2026-04-23. David Gringras. frontierlag: A python package for auditing the capability gap of published AI evaluations. Python Package",
  "reconstructed_doi": "10.3348/kjr.2024.1161.Closest",
  "ref_index": 3,
  "resolved_title": null,
  "verdict_class": "incontrovertible"
}