Pith Integrity
The author's signature on a scientific paper used to be a one-time promise checked by a few reviewers. Pith makes that promise a continuous, signed, machine-checked property that follows the paper forever and lets anyone challenge any claim on the record.
§1 The contract
Every scientific paper rests on an implicit contract from its author: each reference exists; each cited work says what the paper says it said; each data point is real; each paragraph is the author's; each theorem the paper claims to prove is proved. The contract was historically checked once, at peer review, by a few humans. It is now systematically violated at scale and there is no infrastructure to verify it.
Pith makes the contract explicit and continuously verifiable. Five properties must hold for a paper to deserve trust from anyone who did not write it:
- Explicit. Every factual claim has machine-readable provenance: cited works, evidence type, location, proof artifact. Surfaced at /pith/<id>/claims.json. (live)
- Machine-checkable. Verification does not require a human to read the paper. Detectors run automatically, deterministically, and reproducibly. (live)
- Continuous. The integrity record changes when the world changes. URL availability, DOI status, and Crossref/OpenAlex retraction flags are re-checked on a schedule; a retracted cited work flips the record to critical. (live)
- Signed. Every finding and every challenge is signed with the Pith Ed25519 key and emitted as a bundle event. Replayable by anyone with the cited paper. (live)
- Challengeable. Any Pith user can file a signed challenge against a specific claim, reference, attribution, data point, or figure. The challenge is a first-class bundle event. The author may respond. The disagreement is the receipt. (live)
§2 What is running
§3 Position
Pith is a support layer, not a replacement. The existing publication infrastructure stays. Pith sits beneath it.
"Verification and trust infrastructure could become complementary to the existing publication system." (Milan Zlatanovic, May 2026)
§4 What we check
| Detector | Verdict class | What it does |
|---|---|---|
| doi_compliance | incontrovertible | Resolves every DOI and arXiv ID in a paper's bibliography against Crossref, OpenAlex, internal corpus, and arXiv. Flags only identifiers that cannot resolve anywhere. |
| doi_title_agreement | cross source | Compares the title that a paper claims for each cited reference against the title that the reference's DOI or arXiv ID actually resolves to. |
| ai_meta_artifact | incontrovertible | Scans paper body text for verbatim AI assistant artifacts (refusal templates, placeholder cites, training-cutoff disclaimers). |
| external_links | incontrovertible | Extracts external URLs from paper text and re-verifies them with HTTP HEAD/GET. Flags dead repos and 404 URLs with the status code at check time. |
| citation_quote_validity | threshold with margin | When a citing paper attributes a specific factual claim to a referenced work, verifies the claim against the cited paper's text. Publishes only when the cited text is in the Pith corpus and definitively contradicts the attribution. |
| shingle_duplication | incontrovertible | Hashes 40-token n-grams of paper body text and flags identical n-grams shared with another paper that has no shared authors and no citation relationship in either direction. |
| claim_evidence | incontrovertible | For every recorded claim in a paper, verifies the asserted evidence artifact (Lean module, cited work, formal proof) actually exists. |
| cited_work_retraction | cross source | Continuously monitors every cited reference for retraction or expression-of-concern flags from Crossref and OpenAlex. Flags when at least two sources agree (retracted) or any one source surfaces an editorial concern (advisory). |
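As an illustration of one row in the table, the shingle_duplication check can be sketched roughly as follows. This is a minimal sketch, not the Pith implementation: the lowercase whitespace tokenizer, the SHA-256 digest, and the function names `shingle_hashes`, `shared_shingles`, and `is_candidate` are all assumptions made for the example. Only the 40-token window and the "no shared authors, no citation in either direction" condition come from the table.

```python
import hashlib

SHINGLE_SIZE = 40  # tokens per n-gram, as stated in the detector table


def shingle_hashes(text: str, size: int = SHINGLE_SIZE) -> set[str]:
    """Return SHA-256 digests of every `size`-token window in `text`."""
    # Assumption: lowercase whitespace tokenization; the real tokenizer is not specified.
    tokens = text.lower().split()
    hashes = set()
    # A text shorter than `size` tokens yields an empty range and no shingles.
    for start in range(len(tokens) - size + 1):
        window = " ".join(tokens[start:start + size])
        hashes.add(hashlib.sha256(window.encode("utf-8")).hexdigest())
    return hashes


def shared_shingles(text_a: str, text_b: str, size: int = SHINGLE_SIZE) -> set[str]:
    """Return the digests of n-grams that appear verbatim in both texts."""
    return shingle_hashes(text_a, size) & shingle_hashes(text_b, size)


def is_candidate(shared: set[str], authors_a: list[str], authors_b: list[str],
                 cites_either_way: bool) -> bool:
    """Flag only when text overlaps, no author is shared, and neither paper cites the other."""
    no_shared_authors = not (set(authors_a) & set(authors_b))
    return bool(shared) and no_shared_authors and not cites_either_way
```

Hashing fixed-size windows means two papers only collide when 40 consecutive tokens match exactly, which is what makes the check deterministic and easy to reproduce from the two texts alone.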
§5 Surfaces
- Public feeds and protocol
  - /findings: Severity-banded, detector-filterable feed of every finding the layer emits.
  - /challenges: Signed challenges filed by readers against specific claims or references.
  - /pith-integrity-protocol: Detector contracts, verdict classes, evidence schemas, framing rules, rescission.
  - /number: Pith Number, a citable, content-addressed identifier complementary to DOI/arXiv.
- Per-paper records
  - /pith/2605.12611/integrity.json: Detector summary, findings, and signed events for arXiv:2605.12611.
  - /pith/2605.12611/claims.json: Machine-readable claim ledger with evidence anchors.
  - /pith/KBA77APKBK425RMJCW6FVVBP6Y/bundle.json: Full signed bundle including integrity events and challenges.
- Schemas, signing, audit
  - /schemas/pith-integrity-event/v1.json: JSON Schema for the pith.integrity.v1 events emitted with each finding.
  - /schemas/pith-open-graph-bundle/v1.json: JSON Schema for the bundle envelope.
  - /schemas/pith-open-graph-event/v1.json: JSON Schema for events inside a bundle.
  - /pith-signing-key.json: Ed25519 public key used to sign every integrity event and canonical record.
  - /pith-mirrors.json: Endpoints that mirror Pith bundles. Integrity survives if Pith goes down.
§6 How a finding is produced
- A timer wakes one detector. The detector pulls a batch of papers due for a fresh check.
- For each paper, the detector inspects extracted references, body text, claims, or external URLs and emits zero or more candidates. Each carries an `evidence_hash` over the canonicalized evidence payload.
- Findings are upserted into `integrity_findings` keyed by `(detector, evidence_hash)`. Re-detections are idempotent.
- The emitter drains pending findings, signs each one with the Pith Ed25519 key, and writes a `pith.integrity.v1` event to `integrity_event_log`.
- The paper's Open Graph Bundle now carries those events alongside any signed challenges. External verifiers can re-run the detector code and reproduce the finding.
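The hashing, upsert, and drain steps above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not Pith's code: the canonicalization (sorted-key compact JSON), the SHA-256 digest, the in-memory `FindingStore` class, and the event field names are all hypothetical. Ed25519 signing is left out because it needs a third-party library; the drained events here are unsigned.

```python
import hashlib
import json


def evidence_hash(payload: dict) -> str:
    """Hash a canonical JSON form of the evidence payload."""
    # Assumption: canonical form = sorted keys, compact separators, UTF-8.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class FindingStore:
    """In-memory stand-in for the integrity_findings table (hypothetical)."""

    def __init__(self) -> None:
        self.findings: dict[tuple[str, str], dict] = {}
        self.pending: list[tuple[str, str]] = []

    def upsert(self, detector: str, paper_id: str, payload: dict) -> bool:
        """Store a finding keyed by (detector, evidence_hash); True only if new."""
        key = (detector, evidence_hash(payload))
        if key in self.findings:
            return False  # re-detection of the same evidence is a no-op
        self.findings[key] = {"paper_id": paper_id, "evidence": payload}
        self.pending.append(key)
        return True

    def drain(self) -> list[dict]:
        """Return unsigned event dicts for pending findings and clear the queue."""
        events = [
            {
                "type": "pith.integrity.v1",  # event name from the text above
                "detector": detector,         # remaining field names are assumptions
                "evidence_hash": digest,
                **self.findings[(detector, digest)],
            }
            for detector, digest in self.pending
        ]
        self.pending.clear()
        return events
```

Because the key is derived from the canonicalized payload rather than from a timestamp or row id, running the same detector twice over unchanged evidence produces the same key and therefore no second event.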
§7 For journals, repositories, and partners
If you run a journal, a preprint server, a discovery engine, or an institutional repository, Pith is built to be consumed. Fetch /pith/<id>/integrity.json for any paper, embed the summary inline beside a paper, subscribe to the integrity and challenge event streams via the Open Graph Bundle, or mirror the bundles. The protocol is open, the implementation runs every minute, and the findings are reproducible by anyone with the cited paper.
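A consumer might fetch a per-paper record along these lines. This is a minimal sketch using only the Python standard library. The host `https://pith.example` is a placeholder, not the real Pith base URL, and `record_url` and `fetch_record` are hypothetical helper names. The path shape `/pith/<id>/<record>.json` follows the per-paper records listed under Surfaces; the structure of the returned JSON is not assumed here.

```python
import json
import urllib.parse
import urllib.request

# Assumption: placeholder host; substitute the real Pith base URL.
BASE_URL = "https://pith.example"

# Record names taken from the per-paper surfaces listed in this document.
RECORDS = {"integrity", "claims", "bundle"}


def record_url(paper_id: str, record: str = "integrity",
               base_url: str = BASE_URL) -> str:
    """Build the URL of a per-paper record such as /pith/<id>/integrity.json."""
    if record not in RECORDS:
        raise ValueError(f"unknown record type: {record!r}")
    safe_id = urllib.parse.quote(paper_id, safe="")
    return f"{base_url.rstrip('/')}/pith/{safe_id}/{record}.json"


def fetch_record(paper_id: str, record: str = "integrity",
                 base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """Download and parse one record; raises on network or HTTP errors."""
    url = record_url(paper_id, record, base_url)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, `record_url("2605.12611")` yields the integrity record path for arXiv:2605.12611 under the placeholder host; a partner would then validate the parsed document against the published JSON Schemas before embedding it.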