
Integrity report for Measuring Security Without Fooling Ourselves: Why Benchmarking Agents Is Hard

A machine-verified record of the checks Pith has run against this paper: detector runs, findings, signed bundle events, and canonical identifiers.

arXiv:2605.22568 · pith:2026:HVN2PGCTYEELDM2M2KJEFRZO2J

Critical: 0
Advisory: 0
Detectors run: 7
Last checked: 2026-05-23


Detector runs

doi_title_agreement completed v1.0.0 · findings 0 · 2026-05-23 05:32:05.949652+00:00
doi_compliance completed v1.0.0 · findings 0 · 2026-05-23 05:12:55.127165+00:00
citation_quote_validity completed v0.1.0 · findings 0 · 2026-05-23 03:50:52.449200+00:00
claim_evidence completed v1.0.0 · findings 0 · 2026-05-23 01:22:59.517766+00:00
shingle_duplication completed v0.1.0 · findings 0 · 2026-05-22 21:49:56.270541+00:00
cited_work_retraction completed v1.0.0 · findings 0 · 2026-05-22 01:52:33.510950+00:00
ai_meta_artifact skipped v1.0.0 · findings 0 · 2026-05-22 01:33:41.057969+00:00
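
The run lines above follow a regular layout (detector name, status, version, findings count, timestamp), so they can be turned into structured records for auditing. The following is a minimal sketch that assumes this displayed layout is stable; `parse_run` and `summarize` are hypothetical helper names, and the schema of the actual integrity.json record is not shown in this report and may differ.

```python
import re
from datetime import datetime

# Layout of one detector-run line as displayed in this report (an assumption):
#   <name> <status> v<version> · findings <count> · <timestamp>
RUN_LINE = re.compile(
    r"^(?P<name>\S+) (?P<status>\S+) v(?P<version>\S+)"
    r" · findings (?P<findings>\d+) · (?P<timestamp>.+)$"
)


def parse_run(line):
    """Parse one displayed detector-run line into a dict (hypothetical helper)."""
    match = RUN_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized detector-run line: {line!r}")
    run = match.groupdict()
    run["findings"] = int(run["findings"])
    # Timestamps in the report are ISO 8601 with a space separator and UTC offset.
    run["timestamp"] = datetime.fromisoformat(run["timestamp"])
    return run


def summarize(lines):
    """Recompute the header counts from the individual run lines."""
    runs = [parse_run(line) for line in lines]
    return {
        "detectors_run": len(runs),
        "completed": sum(r["status"] == "completed" for r in runs),
        "skipped": sum(r["status"] == "skipped" for r in runs),
        "total_findings": sum(r["findings"] for r in runs),
        "last_checked": max(r["timestamp"] for r in runs).date().isoformat(),
    }


# The seven run lines copied from this report.
RUNS = [
    "doi_title_agreement completed v1.0.0 · findings 0 · 2026-05-23 05:32:05.949652+00:00",
    "doi_compliance completed v1.0.0 · findings 0 · 2026-05-23 05:12:55.127165+00:00",
    "citation_quote_validity completed v0.1.0 · findings 0 · 2026-05-23 03:50:52.449200+00:00",
    "claim_evidence completed v1.0.0 · findings 0 · 2026-05-23 01:22:59.517766+00:00",
    "shingle_duplication completed v0.1.0 · findings 0 · 2026-05-22 21:49:56.270541+00:00",
    "cited_work_retraction completed v1.0.0 · findings 0 · 2026-05-22 01:52:33.510950+00:00",
    "ai_meta_artifact skipped v1.0.0 · findings 0 · 2026-05-22 01:33:41.057969+00:00",
]

if __name__ == "__main__":
    print(summarize(RUNS))
```

Run against the seven lines listed here, the summary reproduces the header figures: seven detector runs, zero findings, and a last-checked date of 2026-05-23. It also shows that the count of seven includes the one skipped run (ai_meta_artifact), so six detectors actually completed.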

Findings

No public integrity findings for this paper.

Signed record

The machine-readable record for this paper lives at /pith/HVN2PGCT/integrity.json. Where a Pith Number exists, its bundle also includes signed pith.integrity.v1 events.