{"paper":{"title":"Copyright Laundering Through the AI Ouroboros: Adapting the 'Fruit of the Poisonous Tree' Doctrine to Recursive AI Training","license":"http://creativecommons.org/licenses/by/4.0/","headline":"If a foundational AI model's training is infringing, later models derived from its outputs carry a rebuttable presumption of taint.","cross_cats":[],"primary_cat":"cs.CY","authors_text":"Anirban Mukherjee, Hannah Hanwen Chang","submitted_at":"2026-01-06T01:02:50Z","abstract_excerpt":"Copyright enforcement rests on an evidentiary bargain: a plaintiff must show both the defendant's access to the work and substantial similarity in the challenged output. That bargain comes under strain when AI systems are trained through multi-generational pipelines with recursive synthetic data. As successive models are tuned on the outputs of its predecessors, any copyrighted material absorbed by an early model is diffused into deeper statistical abstractions. The result is an evidentiary blind spot where overlaps that emerge look coincidental, while the chain of provenance is too attenuated"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"If a foundational AI model's training is adjudged infringing, then subsequent AI models principally derived from the foundational model's outputs or distilled weights carry a rebuttable presumption of taint, shifting the burden to downstream developers to demonstrate independent lawful lineage or curative rebuild.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That courts can reliably determine when a model is 'principally derived' from tainted outputs and that technical mechanisms like verifiable unlearning can be implemented and audited at scale without excessive cost or false negatives.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"The paper introduces an AI-FOPT standard that presumes copyright infringement taint in models derived from an infringing foundational model unless developers prove independent lawful sourcing.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"If a foundational AI model's training is infringing, later models derived from its outputs carry a rebuttable presumption of taint.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"d70c4d6e8363be00020886da49a769063e8c4c606c913da21f211a24a8b36dcf"},"source":{"id":"2601.02631","kind":"arxiv","version":2},"verdict":{"id":"72a079c1-f96c-4f22-be9b-4725fb22602d","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T17:51:08.670938Z","strongest_claim":"If a foundational AI model's training is adjudged infringing, then subsequent AI models principally derived from the foundational model's outputs or distilled weights carry a rebuttable presumption of taint, shifting the burden to downstream developers to demonstrate independent lawful lineage or curative rebuild.","one_line_summary":"The paper introduces an AI-FOPT standard that presumes copyright infringement taint in models derived from an infringing foundational model unless developers prove independent lawful sourcing.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That courts can reliably determine when a model is 'principally derived' from tainted outputs and that technical mechanisms like verifiable unlearning can be implemented and audited at scale without excessive cost or false negatives.","pith_extraction_headline":"If a foundational AI model's training is infringing, later models derived from its outputs carry a rebuttable presumption of taint."},"references":{"count":13,"sample":[{"doi":"","year":2025,"title":"TOFU: A Task of Fictitious Unlearning for LLMs","work_id":"b05148d7-e817-49f0-be96-3e297fddec9b","ref_index":1,"cited_arxiv_id":"2401.06121","is_internal_anchor":true},{"doi":"","year":1918,"title":"(Order Granting Partial Summary Judgment). 56 Lemley, supra note 15, at 264–65. especially at the point of creation and deployment when the vast range of downstream applications is unknown. The AI Our","work_id":"bb5cfec1-aaeb-40ce-932d-f193e8ee1bba","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"destruction under 17 U.S.C. § 503(b) of all GPT or other LLM models and training sets that incorporate Times Works","work_id":"4548d67a-1db5-4381-9013-6819deef35bf","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Derivation (Principally Derived): The plaintiff makes a prima facie showing that a challenged model is principally derived from the poisonous tree’s (or its successor models’) outputs or distilled wei","work_id":"b4e8d234-5a92-4b6a-bac5-a9fe329ca355","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"The burden of production shifts under Fed","work_id":"d7686b2b-69d4-4ac8-b473-abb02fc65baf","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":13,"snapshot_sha256":"7cbc3c3d9bf9585d9ab4a64b0bf3951e2c8e3338b86c0f09dd068f1326d7c462","internal_anchors":2},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}