pith. machine review for the scientific record.

The Pith Number

A content-addressed, machine-readable identifier for every paper in the scientific record. Free to compute. Free to read. Author-attestable. Future-resolvable.

1.53M+ papers indexed (claims, references, reviews extracted)
53k+ machine reviews shipped
2.27M+ references resolved to Pith anchors

Pith Numbers are rolling out to those indexed papers in batches. The first 100 are live now; the sample record below is one of them.

What a Pith Number is

A DOI is a pointer to a document. A Pith Number is a pointer to a document plus its claims, its citation graph, its formal-canon links, its corrections, its replications, and its current status. The identifier is a cryptographic hash, computable by anyone, verifiable without trusting Pith.

Pith does not host papers. Pith does not gatekeep numbers. Pith computes and resolves them. Authors can claim and co-sign them. Agents can read, write, and verify against them. Independent mirrors can serve them. The Pith Number outlives Pith.

Pith Number = content hash + canonical metadata + version anchor. Anyone can recompute it. Pith is the canonical resolver, not the issuing authority.

Pith Number vs DOI

DOI is a name service. It tells you where a paper lives. A Pith Number adds a knowledge layer on top of that pointer.

DOI | Capability | Pith Number

Identifier integrity

no | Content-addressed identifier (verifiable via hash) | yes
no | Permissionless to compute (no registry to ask for an ID) | yes
via Crossref | Permissionless to read | yes

Knowledge layer

no | Machine-readable claims | yes
indirect | Bidirectional citation graph | yes
no | Claim-level citations (not just paper-level) | yes
no | Cascading citation health when a cited claim is corrected | yes
ORCID only | Author-signed attestation of every load-bearing claim | yes

Verifiability

no | Formal-canon links (Lean, Coq, machine-checked proofs) | yes
no | Signed replication or falsification records | yes
no | Independent timestamp anchor (e.g. OpenTimestamps) | yes
DOI suffix | Signed version lineage and corrections | yes

Sovereignty

no | Portable proof bundle (verify without the registry) | yes
Crossref API | Agent-first JSON-LD surface at the same address | yes
no | Identifier remains verifiable if the central authority goes offline (anyone can recompute the hash) | yes

DOI remains the canonical name service for journal publication. Pith Number does not replace it; the Pith Number record always links back to the canonical source where one exists.

See it on a real paper

Mixed-State Long-Range Entanglement from Dimensional Constraints

This live Pith page shows the current surface that Pith Numbers build on: a core claim, load-bearing premise, falsifier, figures, referee report, simulated rebuttal, circularity audit, theorem links, and a resolved reference graph.

arXiv:2605.15201 · quant-ph · 62 extracted references · 2 theorem links

What it means to be "computed"

A traditional identifier (like a DOI) is an assigned serial number handed out by a central registry. If the registry disappears, the number is a dead string.

A computed identifier is a content hash. It is a deterministic mathematical result derived from the paper itself. Pith doesn't assign the number; Pith calculates it. Because the math is public, anyone with the paper can calculate the exact same Pith Number independently.

This creates three massive structural advantages:

  • Zero-permission scale: We don't have to ask a central authority to mint IDs. Pith can backfill the entire 1.53M+ arXiv corpus in an afternoon.
  • Authority independence: The identifier outlives Pith. If pith.science goes offline in ten years, the Pith Number still fundamentally proves the paper hasn't been altered, because the hash still matches the text.
  • Decoupling identity from state: The Pith Number identifies the immutable paper. Things that change over time (author claims, peer reviews, citation graphs, replication notes) are stored as separate layers that point to the Pith Number.

How does this relate to Blockchain and IPFS?

It integrates them. Pith separates the identity of a paper from its storage and its timestamp. Pith acts as the knowledge graph that binds these layers together.

Storage (IPFS, Arweave)

Computing the ID doesn't store the file. Pith's stance is "no custody." The paper itself lives wherever the author chooses: arXiv, institutional servers, or decentralized networks like IPFS. The Pith Number record acts as a secure routing table: it records those storage locations and verifies that any file they serve perfectly matches the canonical hash.

Time and Proof (Blockchain)

A hash provides tamper evidence, but it does not prove when the paper was written. Blockchains provide an append-only, publicly verifiable timestamp that can support priority claims. Because Pith Numbers are computed hashes, we don't need to burden authors with crypto wallets or per-paper fees. Pith can batch thousands of Pith Numbers into a single Merkle tree and anchor the root to the Bitcoin blockchain. Every indexed paper then gets a cryptographic proof that its hash existed no later than the anchoring block, with no action required from the author.
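The batching step can be illustrated with a minimal Merkle-tree sketch. It is not Pith's or OpenTimestamps' actual construction: the function names, the leaf encoding, and the convention of duplicating the last node on odd-sized levels are all assumptions made for illustration, and a production design would normally add domain separation between leaves and inner nodes.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root_and_proof(leaves: list[bytes], index: int):
    """Return (root, proof) where proof shows that leaves[index] is in the tree.

    Assumed convention: an odd-sized level duplicates its last node.
    """
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [sha256(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        sibling = index ^ 1  # the neighbour that shares a parent with `index`
        side = "left" if sibling < index else "right"
        proof.append((side, level[sibling]))
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify_inclusion(leaf: bytes, proof, root: bytes) -> bool:
    """Recompute the path from one leaf to the root using only the proof."""
    node = sha256(leaf)
    for side, sibling in proof:
        node = sha256(sibling + node) if side == "left" else sha256(node + sibling)
    return node == root
```

Only the root needs to be anchored on chain. Each paper keeps its own short proof, so a verifier needs the paper's identifier, the proof, and the anchored root, but not the rest of the batch.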

If Pith shuts down, what survives?

The Pith Number itself survives because it is computed from the paper. The richer question is the graph: claims, references, theorem links, author attestations, corrections, replication records, signatures, and timestamp proofs. That graph should not depend on one website or one database.

The answer is the Pith Open Graph Bundle. Each paper gets a portable, signed bundle that any service can host, mirror, verify, and merge. Pith is one resolver for that bundle, not the permanent owner of it.

Protocol claim: a Pith Number record is reconstructible from signed events. If pith.science disappears, a mirror can serve the same bundle and compute the same current state.

What the bundle contains

  • Canonical record: source id, version, metadata hashes, Pith Number, canonical SHA-256.
  • Claim ledger: extracted claims, load-bearing premises, falsifiers, author responses.
  • Reference graph: resolved works, internal Pith anchors, citation health.
  • Formal links: Lean / Coq / proof-script anchors tied to specific paper claims.
  • Signed events: author attestations, citation signatures, corrections, replications, storage proofs.
  • Timestamp proofs: Merkle inclusion proofs and OpenTimestamps receipts.

Where it can live

  • Pith resolver: /pith/<id>/bundle.json.
  • Paper-hosted copies: arXiv ancillary files, journal supplements, institutional repositories.
  • Decentralized storage: IPFS, Arweave, Filecoin, or any content-addressed mirror.
  • Public archives: Internet Archive, Zenodo, GitHub Releases, Hugging Face datasets.
  • Independent Pith-compatible resolvers run by libraries, labs, journals, or agents.

Why this is better than a registry dump

A public dump is only a snapshot. It goes stale as soon as an author attests a claim, a citation is signed, or a correction is issued. The bundle protocol is event-based: every update is a signed event, and every mirror can merge those events deterministically.

Fetch bundles: Pith, arXiv ancillary, IPFS, university mirror
Verify events: hashes, signatures, timestamps, signer authority
Merge state: dedupe by event id, sort, apply supersession rules

The merge rule

Mirrors do not vote on truth and they do not rewrite records. They collect signed events and apply the same deterministic merge algorithm. Invalid signatures are ignored. Duplicate events collapse to one event. Later correction events supersede older claim events only when the signer has authority over that claim. Conflicting author events remain visible as an equivocation record rather than being silently hidden.

{
  "bundle_type": "pith_open_graph_bundle",
  "bundle_version": "1.0",
  "pith_number": "pith:2026:MGWAO3HY...",
  "canonical_record": { "...": "..." },
  "events": [
    {"event_type": "claim_extracted", "event_id": "sha256:...", "signature": "..."},
    {"event_type": "author_attestation", "event_id": "sha256:...", "signature": "..."},
    {"event_type": "citation_signature", "event_id": "sha256:...", "signature": "..."}
  ],
  "timestamp_proofs": ["..."],
  "resolver_hints": ["..."]
}
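The merge rule described above can be sketched as a small pure function. This is an illustration under stated assumptions, not the reference merge library: the fields `claim_id`, `text`, `signer`, and `created_at`, the `correction` event type, and the `is_valid` and `has_authority` callbacks are hypothetical stand-ins for signature verification and signer-authority checks. Equivocation records are left out for brevity.

```python
from typing import Callable, Iterable


def merge_bundles(
    bundles: Iterable[dict],
    is_valid: Callable[[dict], bool],
    has_authority: Callable[[str, str], bool],
) -> dict:
    """Merge events from several mirrors into one deterministic state."""
    # Step 1: ignore events that fail verification, collapse duplicates by event_id.
    unique: dict[str, dict] = {}
    for bundle in bundles:
        for event in bundle.get("events", []):
            if is_valid(event):
                unique.setdefault(event["event_id"], event)

    # Step 2: sort so the result does not depend on fetch order.
    # Assumes created_at is a uniform UTC ISO-8601 string, which sorts lexically.
    ordered = sorted(unique.values(), key=lambda e: (e["created_at"], e["event_id"]))

    # Step 3: apply supersession. A correction only wins if the signer has authority.
    claims: dict[str, dict] = {}
    rejected: list[str] = []
    for event in ordered:
        claim_id = event["claim_id"]
        if event["event_type"] == "claim_extracted":
            claims[claim_id] = {"text": event["text"], "history": [event["event_id"]]}
        elif event["event_type"] == "correction":
            if claim_id in claims and has_authority(event["signer"], claim_id):
                claims[claim_id]["text"] = event["text"]
                claims[claim_id]["history"].append(event["event_id"])
            else:
                rejected.append(event["event_id"])
    return {"claims": claims, "rejected": rejected}
```

Because every step is a function of the set of valid events only, two mirrors that hold the same events in a different order arrive at the same state.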

Build target: a bundle endpoint, a public schema, a tiny reference merge library, and mirror support. The product promise is not "trust Pith forever." The promise is: verify the graph without Pith.

Live now on the sample record

Open the resolver page for the pilot Pith Number to see every item below in production. Recompute the hash from the JSON and verify the Ed25519 signature against the published key.

Live

  • Computed Pith Number pith:MGWAO3HY, deterministic SHA-256 of the canonical record.
  • Pith Ed25519 signature on the canonical hash. Public key at /pith-signing-key.json.
  • Resolver page and JSON endpoint at the same URL. Content negotiation via Accept: application/ld+json.
  • OpenTimestamps Bitcoin anchor (3 calendars, Bitcoin block 949860).
  • Internet Archive Wayback capture of the paper URL.
  • Identity-backed author claim flow shared with the paper page.
  • Signed citation submissions and replication/falsification submissions.
  • Paper-page banner on 2605.15201.

Rolling out

  • Pith Numbers across the rest of the 1.53M indexed papers. The first 100 are live; backfill runs daily.
  • Claim-level and version-level citation targeting (.v2#C14).
  • Cascading citation health when a cited claim is corrected.
  • Author-direct uploads of new work (PDF, TeX archive, DOCX).
  • Independent mirror reference implementation and public protocol spec.

Verify the sample Pith Number yourself

Three shell commands. If the recomputed hash matches the published one, the record is intact end-to-end.

curl -sH 'Accept: application/ld+json' https://pith.science/pith/MGWAO3HY5DMZWHOMNAZYZQRCMD \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 61ac076cf8e8d99b1dcc68338cc22260df2550aef08c8069b0f61fdb7aa2371d
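For readers who prefer a script to a one-liner, the hashing step of the pipeline above can be written as a small Python function. It uses the same `json.dumps` settings as the shell command; the function name `canonical_sha256` is ours, and the record passed in is assumed to be the already-parsed `canonical_record` object.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record the same way the shell pipeline does.

    Keys are sorted, separators are compact, and non-ASCII characters are
    kept as UTF-8 rather than escaped.
    """
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Sorting keys and fixing the separators is what makes the hash independent of how a particular server or client happened to serialize the JSON.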

Two faces of every paper

Every paper in the scientific corpus has two possible states inside the Pith Registry. The distinction is the most important user-facing concept.

Observed

Pith Number, computed

Pith ingests the paper from arXiv, Crossref, OpenAlex, PubMed, Zenodo, or a direct upload. Pith computes the content hash and extracts the claims, references, and figures. The Pith Number exists. The author has not co-signed.

Default state for retroactively indexed papers. Read-only graph node. Trustworthy as a machine-extracted summary, not as an authoritative author statement.

Claimed

Pith Number, attested

An author verifies identity (ORCID, institutional SSO, GitHub, signed message from a registered email) and takes ownership. They accept the extracted claims, the load-bearing premises, the axioms, and the figures, or they dispute specific items.

Verified record. Machine-extracted scaffold, author-attested response. Eligible for signed corrections, supersession declarations, and formal-bridge attachments.

The visual and trust distinction is enforced everywhere: footer disclaimer, API responses, citation receipts, badge filters. A computed Pith Number can never masquerade as an attested one.

Anatomy of a Pith Number

A Pith Number is the SHA-256 of a canonical record. The record is a deterministic JSON object that any third party can reconstruct from the same inputs.

Identifier format:

pith:2026:Q5K3F8...M2          long form (full hash)
pith:MGWAO3HY                       short form (first 8 chars)
https://pith.science/pith/MGWAO3HY  resolver URL (short)
https://pith.science/pith/MGWAO3HY5DMZWHOMNAZYZQRCMD  resolver URL (full)
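The sample values on this page are consistent with the identifier being a prefix of the RFC 4648 Base32 encoding of the SHA-256 digest: the first 26 characters for the full resolver form, the first 8 for the short form. The sketch below reproduces that observation. It is inferred from the published sample hash and is not a statement of the protocol spec; the function name and the `length` parameter are illustrative.

```python
import base64


def pith_id_from_digest(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a SHA-256 hex digest and keep the first `length` characters."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded.rstrip("=")[:length]


sample = "61ac076cf8e8d99b1dcc68338cc22260df2550aef08c8069b0f61fdb7aa2371d"
print(pith_id_from_digest(sample))     # MGWAO3HY5DMZWHOMNAZYZQRCMD
print(pith_id_from_digest(sample, 8))  # MGWAO3HY
```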

The hashed record commits to six surfaces:

Artifacts: PDF, source, figures, data, code, and supplementary files, all hashed into one Merkle root.
Metadata: Title, authors, abstract, categories, version, and original source such as arXiv or DOI.
Graph: Claims, references, formal proof links, figures, corrections, and downstream citation health.
Versioning: Parent versions, superseding records, corrections, retractions, and signed update history.
Anchors: Pith receipt plus optional OpenTimestamps, Bitcoin, Arweave, or other independent proofs.
Attestation: Empty for computed records; author-signed claim acceptance for attested records.
Technical record sketch:

pith:2026:Q5K3F8...M2
│
├─ artifact_root        sha256:f2a9...b1
│  ├─ paper.pdf         sha256:...
│  ├─ paper.tex         sha256:...
│  ├─ figures/          merkle:...
│  └─ data/             merkle:...
│
├─ metadata             {title, authors, arxiv:2605.15201, ...}
├─ version_anchor       parent=pith:2026:Q5K3F8...M1 (v1)
├─ claim_ledger         [C1..Cn, signed by extractor]
├─ references           [62 resolved works -> pith node IDs]
├─ formal_canon_links   [Lean: alexander_duality_circle_linking]
├─ external_anchors     [opentimestamps, bitcoin, arweave]
├─ storage_pointers     [arxiv, archive.org, ipfs]
├─ pith_receipt         Pith signature + UTC timestamp
└─ author_attestation   [empty | signed claim acceptance]

Claiming a Pith Number

The author-claim flow is where the registry adds the most value. Pith does not ask the author to upload anything. Pith asks the author to respond to what Pith already extracted.

Identity

Author signs in with ORCID, institutional SSO, GitHub, or a signed message from a registered email. Pith records the verification grade as one of four states.

  • orcid_verified for ORCID-backed author matches.
  • admin_verified for editor-verified author matches.
  • name_matched for weaker metadata matches.
  • self_claimed for unverified author assertions.

Reviewing claims

The author sees each machine-extracted load-bearing claim and marks it:

  • accept: the claim is faithfully extracted.
  • refine: the extracted text needs minor wording correction; author supplies revised text.
  • dispute: the extraction misrepresents the paper; author supplies the correct statement.
  • not load-bearing: demote a claim Pith over-weighted.

Premise and axiom acceptance

The same review applies to the load-bearing premises and the axioms. The author goes on record about what their paper actually depends on.

Signed attestation

The author signs the resulting object. The signature commits to the typed responses, not just the abstract identity claim. The Pith Number becomes attested.

Only ORCID-verified or admin-verified claims qualify for the attestation badge and the signed-annotation channel.
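As a small illustration of the eligibility rule, the four verification grades can be modelled as an enumeration with a single predicate. The grade strings come from the list above; the class and function names are hypothetical.

```python
from enum import Enum


class VerificationGrade(str, Enum):
    ORCID_VERIFIED = "orcid_verified"
    ADMIN_VERIFIED = "admin_verified"
    NAME_MATCHED = "name_matched"
    SELF_CLAIMED = "self_claimed"


# Grades that qualify for the attestation badge and signed-annotation channel.
BADGE_ELIGIBLE = frozenset(
    {VerificationGrade.ORCID_VERIFIED, VerificationGrade.ADMIN_VERIFIED}
)


def qualifies_for_attestation_badge(grade: str) -> bool:
    """Return True only for the two strong grades; unknown grades raise ValueError."""
    return VerificationGrade(grade) in BADGE_ELIGIBLE
```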

Citation signatures

Live (basic). Claim and version targeting are design intent.

A DOI citation says "this document was referenced." A Pith citation can say "this specific claim was used in this specific way, signed by the citing author." Today the Pith Number resolver accepts signed citation submissions at the paper level with a typed relationship. Claim-level (#C14) and version-level (.v2) targeting are scaffolded in the data model and roll out next.

Citing paper: pith:B8K0A1
Citation intent: supports, extends, disputes
Cited claim: pith:Q5K3F8.v2#C14

Granularity

A Pith citation can target any of the following inside the cited paper:

  • Paper-level citation uses pith:Q5K3F8.
  • Version-level citation uses pith:Q5K3F8.v2.
  • Claim-level citation uses pith:Q5K3F8.v2#C14.
  • Evidence-level citation uses pith:Q5K3F8.v2#C14@proof:L3.
  • Figure-level citation uses pith:Q5K3F8.v2#fig:3.
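The five target forms above follow a regular shape, so an agent can split them with one regular expression. The grammar below is inferred from these examples only and may not match the eventual protocol spec; the `CitationTarget` type and its field names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class CitationTarget(NamedTuple):
    paper: str
    version: Optional[int]
    claim: Optional[str]
    evidence: Optional[str]
    figure: Optional[str]


# pith:<paper>[.v<version>][#fig:<figure> | #<claim>[@<evidence>]]
_TARGET = re.compile(
    r"^pith:(?P<paper>[A-Z0-9]+)"
    r"(?:\.v(?P<version>\d+))?"
    r"(?:#(?:fig:(?P<figure>\w+)|(?P<claim>C\d+)(?:@(?P<evidence>[\w:]+))?))?$"
)


def parse_target(text: str) -> CitationTarget:
    """Split a citation target into paper, version, claim, evidence, and figure."""
    match = _TARGET.match(text)
    if match is None:
        raise ValueError(f"not a recognised Pith citation target: {text!r}")
    version = match.group("version")
    return CitationTarget(
        paper=match.group("paper"),
        version=int(version) if version is not None else None,
        claim=match.group("claim"),
        evidence=match.group("evidence"),
        figure=match.group("figure"),
    )
```

For example, `parse_target("pith:Q5K3F8.v2#C14@proof:L3")` yields paper `Q5K3F8`, version `2`, claim `C14`, and evidence `proof:L3`.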

Intent

Every citation carries a typed intent:

  • The supports intent means the citing author relies on the cited claim.
  • The extends intent means the citing paper goes beyond the cited result.
  • The uses-method intent marks a methodological dependency.
  • The reproduces intent claims an independent replication.
  • The contrasts intent disagrees in detail.
  • The disputes intent challenges correctness.
  • The corrects intent supersedes the cited claim.
  • The prior-art intent marks historical precedence.
  • The background intent marks contextual reference only.

Citation receipt

Each citation becomes a signed registry object:

{
  "citation_id": "pcs:2026:7F3A...9C",
  "citing_pith": "pith:2026:B8K0...A1",
  "cited_pith":  "pith:2026:Q5K3F8...M2",
  "cited_version": "v2",
  "cited_claim":   "C14",
  "intent": "supports",
  "quote": "We use the dimensional counting argument from §2 of...",
  "citing_location": "Section 3, paragraph 2",
  "signed_by": ["orcid:0000-0001-2345-6789"],
  "created_at": "2026-05-17T19:32:00Z",
  "status": "active"
}
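A client could run a basic sanity check on a receipt like the one above before signing or submitting it. The sketch below is illustrative: the intent strings follow the list in the Intent section, but which fields are mandatory, and the rule that a claim-level target needs a version, are assumptions rather than published validation rules.

```python
ALLOWED_INTENTS = frozenset({
    "supports", "extends", "uses-method", "reproduces", "contrasts",
    "disputes", "corrects", "prior-art", "background",
})

# Assumed minimum set of fields; the real schema may differ.
REQUIRED_FIELDS = (
    "citation_id", "citing_pith", "cited_pith",
    "intent", "signed_by", "created_at", "status",
)


def receipt_problems(receipt: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means it looks sane."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in receipt]
    intent = receipt.get("intent")
    if intent is not None and intent not in ALLOWED_INTENTS:
        problems.append(f"unknown intent: {intent}")
    if "cited_claim" in receipt and "cited_version" not in receipt:
        problems.append("claim-level target requires cited_version")
    if "signed_by" in receipt and not receipt["signed_by"]:
        problems.append("receipt must carry at least one signer")
    return problems
```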

Cascading citation health (design intent)

When a cited claim is corrected, retracted, or superseded, Pith will automatically mark every downstream citation that targeted it. A reader will see, in line:

This paper cites the claim pith:Q5K3F8.v2#C14 with supporting intent. That claim was corrected in version v3#C14 on 2026-08-04. Review recommended.

The data model supports this. The render path on citing-paper resolvers is rolling out alongside claim-level citation targeting.
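One way to picture the cascade is as a scan over citation receipts that produces review notices without touching the receipts themselves. The sketch assumes the receipt fields shown in the citation receipt example; the function name, the `review_recommended` label, and the notice shape are hypothetical, since this feature is still design intent.

```python
def flag_downstream_citations(citations, corrected_claim, corrected_in, corrected_on):
    """Return review notices for active citations that target a corrected claim.

    corrected_claim is a (cited_pith, cited_version, cited_claim) tuple.
    The input receipts are not modified.
    """
    notices = []
    for c in citations:
        target = (c["cited_pith"], c["cited_version"], c["cited_claim"])
        if c["status"] == "active" and target == corrected_claim:
            notices.append({
                "citation_id": c["citation_id"],
                "health": "review_recommended",
                "message": (
                    f"This paper cites the claim "
                    f"{c['cited_pith']}.{c['cited_version']}#{c['cited_claim']} "
                    f"with {c['intent']} intent. That claim was corrected in "
                    f"{corrected_in} on {corrected_on}. Review recommended."
                ),
            })
    return notices
```

Emitting notices as separate objects, rather than rewriting receipts, keeps the original signed citation intact while still surfacing the correction to readers.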

Living annotations

Open commentary often creates noise. Pith relies on a small set of signed, structured annotations instead. Free-form discussion belongs elsewhere.

Replication / falsification

A third party signs a statement asserting that an independent run reproduces or falsifies a specific claim. Includes data hash, code hash, methodology pointer, and confidence.

Supersession / correction

An author signs a statement that one Pith Number now carries the work of another, or that a specific claim is corrected. Cascades to downstream citation health.

Formal bridge

A mathematician attaches a Lean module, Coq script, or other machine-checked proof to a specific claim or theorem in the paper. The proof itself is content-addressed and verified.

Metadata correction

Authors can flag typos, mis-extracted text, or mis-resolved references. Pith updates the rendered surface; the original record stays immutable. The history of corrections is preserved.

Every annotation is a separate signed registry object with its own Pith Number. The resolver composes the paper's current state from the annotation graph at query time. Nothing is silently mutated.

No custody, only verification

Pith does not store papers. Pith verifies that other parties store the exact paper identified by the Pith Number.

Centralized hosting creates liability and single points of failure. Pith's promise is verifiability, not preservation. Any future party with the artifact can independently verify it matches the Pith Number. Pith does not promise the artifact will be available; it promises that if it is found, it can be checked.

Third-party custody partners

  • arXiv (already hosts most physics, math, CS preprints).
  • Internet Archive.
  • IPFS pinning providers.
  • Arweave or Filecoin.
  • Institutional repositories.
  • Author-controlled storage (own website, S3, R2, GCS).
  • Zenodo and other CERN-backed archives.

Pith records the storage pointers and periodically re-checks that the hashes still resolve. If a mirror dies, the record is unchanged; the resolver simply shows that location as unavailable.
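The periodic re-check can be sketched as a loop that fetches each storage pointer and compares the SHA-256 of the returned bytes with the expected hash. The `fetch` callback, the three status labels, and the function name are assumptions for illustration; the sketch also treats the artifact as a single file, whereas the record commits to a Merkle root over several artifacts.

```python
import hashlib
from typing import Callable, Optional


def check_storage_pointers(
    expected_sha256: str,
    pointers: list[str],
    fetch: Callable[[str], Optional[bytes]],
) -> dict[str, str]:
    """Classify each location as verified, mismatch, or unavailable."""
    status: dict[str, str] = {}
    for url in pointers:
        try:
            data = fetch(url)
        except OSError:
            data = None  # network or storage failure: location is down, record is unchanged
        if data is None:
            status[url] = "unavailable"
        elif hashlib.sha256(data).hexdigest() == expected_sha256:
            status[url] = "verified"
        else:
            status[url] = "mismatch"
    return status
```

A dead mirror only changes the status of that one location. The expected hash, and therefore the Pith Number, never depends on which mirrors are reachable.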

Built for AI agents and humans alike

AI agents are first-class citizens. The address space and the data model reflect that.

Content negotiation at the resolver

The same address serves humans and agents. A browser opens pith.science/pith/MGWAO3HY and gets the HTML review page. An agent sets Accept: application/ld+json on the same URL and gets the JSON record. The .json suffix is also available for clients that cannot set headers.
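An agent-side request might look like the sketch below, which uses only the Python standard library. The URL pattern and the `Accept` header come from this page; the function names are ours, and the shape of the returned JSON beyond being a JSON object is not assumed.

```python
import json
import urllib.request

RESOLVER_BASE = "https://pith.science/pith/"


def build_record_request(pith_id: str) -> urllib.request.Request:
    """Build a request for the JSON-LD view of a Pith Number."""
    return urllib.request.Request(
        RESOLVER_BASE + pith_id,
        headers={"Accept": "application/ld+json"},
    )


def json_suffix_url(pith_id: str) -> str:
    """Fallback address for clients that cannot set request headers."""
    return RESOLVER_BASE + pith_id + ".json"


def fetch_record(pith_id: str) -> dict:
    """Fetch and parse the record (performs a network call)."""
    with urllib.request.urlopen(build_record_request(pith_id)) as response:
        return json.load(response)
```

Calling `fetch_record("MGWAO3HY5DMZWHOMNAZYZQRCMD")` would request the sample record from the same address a browser uses for the HTML page.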

Single payload, rich graph

One agent call returns: the paper text, the claims, the load-bearing premises, the formal-canon links, the resolved references (with their Pith Numbers), the inbound citation list, the citation health summary, the annotation graph, and the version lineage. No scraping, parsing, or chained calls.

Writeable, not just readable

The Pith Registry is the canonical place where agent-generated scientific output gets cryptographically grounded. An agent that runs a Lean proof, derives a result, completes a replication, or audits a citation has a signed home for that work.

How authors register today

Most papers covered by Pith Numbers are pulled in automatically from arXiv. Authors do not upload anything. The author-facing flow is one button on the paper page.

  1. Pith pulls your paper. arXiv ingestion runs daily. Pith computes the canonical hash, extracts claims, and resolves references.
  2. Sign in. ORCID, Apple, X, or email magic-link. Whichever path you pick, your identity grade follows.
  3. Click "This is my paper". If your ORCID is on the arXiv author list, the attestation is one-click and shows ORCID verified. Otherwise it lands as name match or self-claimed and can be strengthened later.
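  4. Anchor and mirror. One click each. OpenTimestamps stamps the canonical hash; Internet Archive Wayback captures the paper URL. Both are free. The OpenTimestamps receipt is issued within seconds and becomes a full Bitcoin-anchored proof once the anchoring transaction confirms.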
  5. Receipt. Pith Number, content hash, Pith Ed25519 signature, OpenTimestamps proof, Wayback URL. All resolvable from one URL.

The promise:

Anyone in the future can recompute the Pith Number from the canonical record, verify the Pith signature against the published public key, and verify the OpenTimestamps proof against the Bitcoin chain. Even if Pith disappears.

Direct upload of new work (PDF, TeX archive, DOCX) is on the roadmap; today, the registration entry points are arXiv ingestion and the paper-page claim button.

Coverage

Pith aims to index the entire scientific corpus, not only papers submitted to Pith. arXiv, Crossref, OpenAlex, PubMed, and Zenodo are all sources. Every paper added refines the graph for every other paper.

Each Pith review resolves its reference list to other Pith nodes, so the bidirectional citation graph emerges as a side effect of normal indexing. An agent or researcher can ask:

  • Every paper citing a specific Lean theorem.
  • Every paper in a domain whose load-bearing premise is a specific argument.
  • All retractions and corrections in a given subfield, by date.

These queries are answered directly from the registry. No scraping required.

Pith Journal

The Pith Registry and the Pith Journal are separate products. The Journal is a customer of the Registry, not its parent.

What the Journal adds on top of a Pith Number

  • Journal signature on the publication object (accepted version, publication date, editorial decision).
  • Deep-review binding: full referee reports, claim ledger, AI-artifact audit, readiness score.
  • Storage and anchoring as part of the journal fee.
  • Public candidate window, endorsements, and objections.
  • Auto-publish of the linked peer-review ticket on acceptance.

A Pith Journal paper has all the same fields as any other Pith Number, plus a block named pith_journal_signature. A reader can verify Journal status without trusting Pith's UI; the signature is in the portable proof bundle.

For authors, the Journal promise is: Pith Journal publication includes Pith Number registration with storage, anchoring, and editorial signature, all verifiable without trusting Pith.

Open by design

The Pith Number protocol is open. The canonical resolver is Pith's. Other resolvers can exist. If pith.science disappears, the records remain independently verifiable.

Pith controls schema evolution, the resolver software, validation rules, Pith-signed receipts, badge policy at pith.science, and Pith Journal editorial decisions. Pith does not control whether a signed record exists, third-party mirrors, public-chain anchors once placed, or what another resolver says about the same record.

This balance keeps the canonical resolver authoritative without making the system a single point of failure.

Status. The Pith Number scheme and registry are in active development. Coverage of indexed papers, machine-extracted claims, and resolved citation graphs is growing daily. Author claiming, signed annotations, and external anchoring are rolling out in stages.