Pith Number

pith:7W7UPEX4

pith:2026:7W7UPEX4EHDN6XLVP4EDJQ3TL4
Status: not attested · not anchored · not stored · refs resolved

Copyright Laundering Through the AI Ouroboros: Adapting the 'Fruit of the Poisonous Tree' Doctrine to Recursive AI Training

Anirban Mukherjee, Hannah Hanwen Chang

If a foundational AI model's training is infringing, later models derived from its outputs carry a rebuttable presumption of taint.

arxiv:2601.02631 v2 · 2026-01-06 · cs.CY

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{7W7UPEX4EHDN6XLVP4EDJQ3TL4}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp
2. Internet Archive
3. Author claim: open
4. Citations: open
5. Replications: open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
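A minimal sketch of what a deterministic merge can look like, in Python. The event format and merge algorithm Pith actually uses are not described here, so everything below is an assumption: each event is a dict with a content-derived `id`, an integer logical `clock`, a `field` and a `value`; signature checks are omitted; and the hypothetical `merge` and `current_state` helpers order events by `(clock, id)` before replaying them over the canonical record. Because that order is total and does not depend on arrival order, two mirrors holding the same events compute the same state.

```python
from typing import Any, Dict, Iterable, List

Event = Dict[str, Any]  # hypothetical event shape: id, clock, field, value


def merge(*event_sets: Iterable[Event]) -> List[Event]:
    """Union event collections by id and return them in a total order.

    Assumes ids are content hashes, so two events with the same id are
    identical and keeping either copy is safe.
    """
    by_id: Dict[str, Event] = {}
    for events in event_sets:
        for event in events:
            by_id.setdefault(event["id"], event)
    # Logical clock first, id as tie-breaker: the resulting order never
    # depends on which mirror saw which event first.
    return sorted(by_id.values(), key=lambda e: (e["clock"], e["id"]))


def current_state(canonical_record: Dict[str, Any], events: Iterable[Event]) -> Dict[str, Any]:
    """Replay ordered events over a shallow copy of the record; later events win."""
    state = dict(canonical_record)
    for event in merge(events):
        state[event["field"]] = event["value"]
    return state


if __name__ == "__main__":
    record = {"source": "arxiv:2601.02631"}
    # Hypothetical events held by two mirrors, received in different orders.
    mirror_a = [{"id": "e1", "clock": 1, "field": "author_claim", "value": "open"}]
    mirror_b = [
        {"id": "e2", "clock": 2, "field": "author_claim", "value": "claimed"},
        {"id": "e1", "clock": 1, "field": "author_claim", "value": "open"},
    ]
    # Both print {'source': 'arxiv:2601.02631', 'author_claim': 'claimed'}
    print(current_state(record, merge(mirror_a, mirror_b)))
    print(current_state(record, merge(mirror_b, mirror_a)))
```

Sorting by a total key is the simplest way to make replay order-independent; a real system would also verify each event's signature before admitting it to the union.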

Claims

C1 · strongest claim

If a foundational AI model's training is adjudged infringing, then subsequent AI models principally derived from the foundational model's outputs or distilled weights carry a rebuttable presumption of taint, shifting the burden to downstream developers to demonstrate independent lawful lineage or curative rebuild.
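One way to make the structure of C1 concrete is a lineage check, sketched here in Python. It is an illustrative encoding, not the paper's legal test: the hypothetical `carries_taint` function takes "principally derived" findings as given edges, assumes the presumption also runs through successor models (the recursive training the title refers to), and treats a model whose developer has shown independent lawful lineage or a curative rebuild as no longer passing taint to its own descendants. That last point is a simplifying assumption.

```python
from typing import Dict, List, Set


def carries_taint(
    model: str,
    parents: Dict[str, List[str]],   # model -> models it is principally derived from
    adjudged_infringing: Set[str],   # foundational models whose training was held infringing
    rebutted: Set[str],              # models whose developers carried the shifted burden
) -> bool:
    """Return True if `model` is adjudged infringing or presumptively tainted.

    Walks the lineage graph upward from `model`. A rebutted model stops the
    walk along that branch (modeling assumption); a `seen` set guards
    against cycles in the recorded lineage.
    """
    stack, seen = [model], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in adjudged_infringing:
            return True  # reached the poisonous tree
        if current in rebutted:
            continue  # independent lineage or curative rebuild shown
        stack.extend(parents.get(current, []))
    return False
```

For instance, with hypothetical models `parents = {"B": ["A"], "C": ["B"]}` and `A` adjudged infringing, `carries_taint("C", parents, {"A"}, set())` is `True`; it becomes `False` once `B` is in `rebutted`.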

C2 · weakest assumption

That courts can reliably determine when a model is 'principally derived' from tainted outputs and that technical mechanisms like verifiable unlearning can be implemented and audited at scale without excessive cost or false negatives.

C3 · one-line summary

The paper introduces an AI-FOPT standard that presumes copyright infringement taint in models derived from an infringing foundational model unless developers prove independent lawful sourcing.

Receipt and verification
First computed: 2026-05-18T02:45:12.104042Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

fdbf4792fc21c6df5d757f0834c3735f1d87832c6114c3bf2bc49a606867aedb

Aliases

arxiv: 2601.02631 · arxiv_version: 2601.02631v2 · doi: 10.48550/arxiv.2601.02631 · pith_short_12: 7W7UPEX4EHDN · pith_short_16: 7W7UPEX4EHDN6XLV · pith_short_8: 7W7UPEX4
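The identifiers above appear to be derived from the canonical hash: for the values shown on this page, the Pith Number is the first 26 characters of the RFC 4648 base32 encoding of the 32-byte SHA-256 digest, and each `pith_short_N` alias is its first N characters. This relationship is inferred from these values, not taken from Pith documentation, so treat the hypothetical `pith_number` helper as a consistency check rather than the specification.

```python
import base64

CANONICAL_HASH = "fdbf4792fc21c6df5d757f0834c3735f1d87832c6114c3bf2bc49a606867aedb"


def pith_number(canonical_hash_hex: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters.

    Inferred relationship, not a documented Pith API.
    """
    digest = bytes.fromhex(canonical_hash_hex)
    return base64.b32encode(digest).decode("ascii")[:length]


if __name__ == "__main__":
    full = pith_number(CANONICAL_HASH)
    print(full)  # 7W7UPEX4EHDN6XLVP4EDJQ3TL4
    for n in (8, 12, 16):
        # Short aliases are prefixes of the full number.
        print(f"pith_short_{n}: {full[:n]}")
```

Twenty-six base32 characters carry 130 bits, enough to cover the first 128 bits of the digest.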
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/7W7UPEX4EHDN6XLVP4EDJQ3TL4 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: fdbf4792fc21c6df5d757f0834c3735f1d87832c6114c3bf2bc49a606867aedb
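The same check can be run without `curl` or `jq`. The sketch below is a stdlib-only Python version; it reuses the URL, the `Accept: application/ld+json` header and the `canonical_record` key from the pipeline above and assumes nothing else about the API. `canonical_sha256` reproduces the one-liner's canonical form: sorted keys, compact `,`/`:` separators, and UTF-8 output without ASCII escaping.

```python
import hashlib
import json
import urllib.request

URL = "https://pith.science/pith/7W7UPEX4EHDN6XLVP4EDJQ3TL4"
EXPECTED = "fdbf4792fc21c6df5d757f0834c3735f1d87832c6114c3bf2bc49a606867aedb"


def canonical_sha256(record: dict) -> str:
    """SHA-256 hex digest of `record` in the same canonical form as the one-liner."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def fetch_canonical_record(url: str = URL) -> dict:
    """Fetch the JSON-LD document and return its canonical_record field."""
    request = urllib.request.Request(url, headers={"Accept": "application/ld+json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["canonical_record"]
```

If the served record matches the one on this page, `canonical_sha256(fetch_canonical_record()) == EXPECTED` should hold. One difference from the shell version: `urllib` follows HTTP redirects, whereas `curl` without `-L` does not.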
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "d1ab77419d20c79d1c0b48d8c177129cc808b3462fadcc3eeac7712c8263c2ff",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CY",
    "submitted_at": "2026-01-06T01:02:50Z",
    "title_canon_sha256": "982932ab0ebf66aa3734526bb42a71d9f65228ab62d4238760f8eacbcaa8fbab"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2601.02631",
    "kind": "arxiv",
    "version": 2
  }
}