{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2024:OUY3FJZZ64YXYLASJSH3UPIBMZ","short_pith_number":"pith:OUY3FJZZ","schema_version":"1.0","canonical_sha256":"7531b2a739f7317c2c124c8fba3d01664e0a38bb3b28428c4c26f40666a89ed5","source":{"kind":"arxiv","id":"2404.05868","version":2},"attestation_state":"computed","paper":{"title":"Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Negative Preference Optimization unlearns large portions of LLM training data without catastrophic collapse.","cross_cats":["cs.AI","cs.CL","stat.ML"],"primary_cat":"cs.LG","authors_text":"Licong Lin, Ruiqi Zhang, Song Mei, Yu Bai","submitted_at":"2024-04-08T21:05:42Z","abstract_excerpt":"Large Language Models (LLMs) often memorize sensitive, private, or copyrighted data during pre-training. LLM unlearning aims to eliminate the influence of undesirable data from the pre-trained model while preserving the model's utilities on other tasks. Several practical methods have recently been proposed for LLM unlearning, mostly based on gradient ascent (GA) on the loss of undesirable data. However, on certain unlearning tasks, these methods either fail to effectively unlearn the target data or suffer from catastrophic collapse -- a drastic degradation of the model's utilities.\n  In this p"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":false},"canonical_record":{"source":{"id":"2404.05868","kind":"arxiv","version":2},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.LG","submitted_at":"2024-04-08T21:05:42Z","cross_cats_sorted":["cs.AI","cs.CL","stat.ML"],"title_canon_sha256":"3b37a71945c268e2b22937067c4ebb45934b5b6fca9614682c71e35e450d1684","abstract_canon_sha256":"33132e33fc41748dd9cbaae0d98204eb9a4368ecb3197d78ea82a656e5eaa285"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:46.389295Z","signature_b64":"7lEYEJL6O4pXtbfGQvZnUCxFmB1i9yIKKfIkq/6zpFAVrv38SiXFkx34oxW3w0wsiGf3CkT8SdNogYeSpIaQBw==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"7531b2a739f7317c2c124c8fba3d01664e0a38bb3b28428c4c26f40666a89ed5","last_reissued_at":"2026-05-17T23:38:46.388851Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:46.388851Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Negative Preference Optimization unlearns large portions of LLM training data without catastrophic collapse.","cross_cats":["cs.AI","cs.CL","stat.ML"],"primary_cat":"cs.LG","authors_text":"Licong Lin, Ruiqi Zhang, Song Mei, Yu Bai","submitted_at":"2024-04-08T21:05:42Z","abstract_excerpt":"Large Language Models (LLMs) often memorize sensitive, private, or copyrighted data during pre-training. LLM unlearning aims to eliminate the influence of undesirable data from the pre-trained model while preserving the model's utilities on other tasks. Several practical methods have recently been proposed for LLM unlearning, mostly based on gradient ascent (GA) on the loss of undesirable data. However, on certain unlearning tasks, these methods either fail to effectively unlearn the target data or suffer from catastrophic collapse -- a drastic degradation of the model's utilities.\n  In this p"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Remarkably, on TOFU, NPO-based methods are the first to achieve reasonable unlearning results in forgetting 50% (or more) of the training data, whereas existing methods already struggle with forgetting 10% of training data.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The assumption that results on the synthetic data and TOFU benchmark will generalize to real-world unlearning of sensitive data in large production LLMs without introducing new failure modes.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"NPO enables stable unlearning of 50%+ training data in LLMs on TOFU by making collapse exponentially slower than gradient ascent, preserving sensible outputs where prior methods fail.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Negative Preference Optimization unlearns large portions of LLM training data without catastrophic collapse.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"6bb5a326d3b80ea65e0cabcdf28a1990f28c4262620658dc3808c7b37085c68c"},"source":{"id":"2404.05868","kind":"arxiv","version":2},"verdict":{"id":"c2db78a8-25e3-4edc-8ea2-860620f78f4a","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T22:23:06.857645Z","strongest_claim":"Remarkably, on TOFU, NPO-based methods are the first to achieve reasonable unlearning results in forgetting 50% (or more) of the training data, whereas existing methods already struggle with forgetting 10% of training data.","one_line_summary":"NPO enables stable unlearning of 50%+ training data in LLMs on TOFU by making collapse exponentially slower than gradient ascent, preserving sensible outputs where prior methods fail.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The assumption that results on the synthetic data and TOFU benchmark will generalize to real-world unlearning of sensitive data in large production LLMs without introducing new failure modes.","pith_extraction_headline":"Negative Preference Optimization unlearns large portions of LLM training data without catastrophic collapse."},"references":{"count":34,"sample":[{"doi":"","year":null,"title":"Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback","work_id":"a1f2574b-a899-4713-be60-c87ba332656c","ref_index":1,"cited_arxiv_id":"2204.05862","is_internal_anchor":true},{"doi":"","year":2021,"title":"Machine unlearning","work_id":"ec065f1e-75e5-43a4-a873-1a134cd387db","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2015,"title":"Towards making systems forget with machine unlearning","work_id":"f94eb5f4-9bee-4d94-a99f-86164ad870fc","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Quantifying Memorization Across Neural Language Models","work_id":"35487ec1-b90b-4ace-95bd-1bce30064b2e","ref_index":4,"cited_arxiv_id":"2202.07646","is_internal_anchor":true},{"doi":"","year":null,"title":"Unlearn what you want to forget: Efficient unlearning for llms","work_id":"903757fe-c1e7-41c0-9760-28e13aec8628","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":34,"snapshot_sha256":"bdf084fcd3e8d0b818522ea3136779ef94393f926f2c06111194d39a27bfb3d4","internal_anchors":7},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2404.05868","created_at":"2026-05-17T23:38:46.388922+00:00"},{"alias_kind":"arxiv_version","alias_value":"2404.05868v2","created_at":"2026-05-17T23:38:46.388922+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2404.05868","created_at":"2026-05-17T23:38:46.388922+00:00"},{"alias_kind":"pith_short_12","alias_value":"OUY3FJZZ64YX","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"OUY3FJZZ64YXYLAS","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"OUY3FJZZ","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":27,"internal_anchor_count":27,"sample":[{"citing_arxiv_id":"2505.16831","citing_title":"Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs","ref_index":43,"is_internal_anchor":true},{"citing_arxiv_id":"2512.00778","citing_title":"What Is Preference Optimization Doing, and Why?","ref_index":25,"is_internal_anchor":true},{"citing_arxiv_id":"2605.20915","citing_title":"Calibration vs Decision Making: Revisiting the Reliability Paradox in Unlearned Language Models","ref_index":69,"is_internal_anchor":true},{"citing_arxiv_id":"2605.15687","citing_title":"ASRU: Activation Steering Meets Reinforcement Unlearning for Multimodal Large Language Models","ref_index":26,"is_internal_anchor":true},{"citing_arxiv_id":"2605.18253","citing_title":"Machine Unlearning for Masked Diffusion Language Models","ref_index":7,"is_internal_anchor":true},{"citing_arxiv_id":"2506.20941","citing_title":"Revisiting the Past: Data Unlearning with Model State History","ref_index":37,"is_internal_anchor":true},{"citing_arxiv_id":"2510.00761","citing_title":"Downgrade to Upgrade: Optimizer Simplification Enhances Robustness in LLM Unlearning","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2409.12917","citing_title":"Training Language Models to Self-Correct via Reinforcement Learning","ref_index":89,"is_internal_anchor":true},{"citing_arxiv_id":"2512.12469","citing_title":"Sparse Concept Anchoring for Interpretable and Controllable Neural Representations","ref_index":34,"is_internal_anchor":true},{"citing_arxiv_id":"2512.19728","citing_title":"Hard Negative Sample-Augmented DPO Post-Training for Small Language Models","ref_index":11,"is_internal_anchor":true},{"citing_arxiv_id":"2602.23798","citing_title":"MPU: Towards Secure and Privacy-Preserving Knowledge Unlearning for Large Language Models","ref_index":15,"is_internal_anchor":true},{"citing_arxiv_id":"2605.14404","citing_title":"Knowledge Beyond Language: Bridging the Gap in Multilingual Machine Unlearning Evaluation","ref_index":11,"is_internal_anchor":true},{"citing_arxiv_id":"2605.14514","citing_title":"Defenses at Odds: Measuring and Explaining Defense Conflicts in Large Language Models","ref_index":54,"is_internal_anchor":true},{"citing_arxiv_id":"2605.13595","citing_title":"Inducing Artificial Uncertainty in Language Models","ref_index":41,"is_internal_anchor":true},{"citing_arxiv_id":"2605.08800","citing_title":"PPU-Bench:Real World Benchmark for Personalized Partial Unlearning in Vision Language Models","ref_index":26,"is_internal_anchor":true},{"citing_arxiv_id":"2605.08765","citing_title":"Unlearners Can Lie: Evaluating and Improving Honesty in LLM Unlearning","ref_index":8,"is_internal_anchor":true},{"citing_arxiv_id":"2605.05938","citing_title":"ICU-Bench:Benchmarking Continual Unlearning in Multimodal Large Language Models","ref_index":20,"is_internal_anchor":true},{"citing_arxiv_id":"2605.05909","citing_title":"Null Space Constrained Contrastive Visual Forgetting for MLLM Unlearning","ref_index":15,"is_internal_anchor":true},{"citing_arxiv_id":"2605.04653","citing_title":"Threshold-Guided Optimization for Visual Generative Models","ref_index":28,"is_internal_anchor":true},{"citing_arxiv_id":"2605.01129","citing_title":"Revisiting Privacy Leakage in Machine Unlearning: Membership Inference Beyond the Forgotten Set","ref_index":54,"is_internal_anchor":true},{"citing_arxiv_id":"2604.22076","citing_title":"PrivUn: Unveiling Latent Ripple Effects and Shallow Forgetting in Privacy Unlearning","ref_index":21,"is_internal_anchor":true},{"citing_arxiv_id":"2604.18966","citing_title":"Self-Improving Tabular Language Models via Iterative Reward-Guided Post-Training","ref_index":8,"is_internal_anchor":true},{"citing_arxiv_id":"2604.07962","citing_title":"Is your algorithm unlearning or untraining?","ref_index":35,"is_internal_anchor":true},{"citing_arxiv_id":"2605.02196","citing_title":"DurableUn: Quantization-Induced Recovery Attacks in Machine Unlearning","ref_index":8,"is_internal_anchor":true},{"citing_arxiv_id":"2605.07242","citing_title":"MEMOREPAIR: Barrier-First Cascade Repair in Agentic Memory","ref_index":20,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":0,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/OUY3FJZZ64YXYLASJSH3UPIBMZ","json":"https://pith.science/pith/OUY3FJZZ64YXYLASJSH3UPIBMZ.json","graph_json":"https://pith.science/api/pith-number/OUY3FJZZ64YXYLASJSH3UPIBMZ/graph.json","events_json":"https://pith.science/api/pith-number/OUY3FJZZ64YXYLASJSH3UPIBMZ/events.json","paper":"https://pith.science/paper/OUY3FJZZ"},"agent_actions":{"view_html":"https://pith.science/pith/OUY3FJZZ64YXYLASJSH3UPIBMZ","download_json":"https://pith.science/pith/OUY3FJZZ64YXYLASJSH3UPIBMZ.json","view_paper":"https://pith.science/paper/OUY3FJZZ","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2404.05868&json=true","fetch_graph":"https://pith.science/api/pith-number/OUY3FJZZ64YXYLASJSH3UPIBMZ/graph.json","fetch_events":"https://pith.science/api/pith-number/OUY3FJZZ64YXYLASJSH3UPIBMZ/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/OUY3FJZZ64YXYLASJSH3UPIBMZ/action/timestamp_anchor","attest_storage":"https://pith.science/pith/OUY3FJZZ64YXYLASJSH3UPIBMZ/action/storage_attestation","attest_author":"https://pith.science/pith/OUY3FJZZ64YXYLASJSH3UPIBMZ/action/author_attestation","sign_citation":"https://pith.science/pith/OUY3FJZZ64YXYLASJSH3UPIBMZ/action/citation_signature","submit_replication":"https://pith.science/pith/OUY3FJZZ64YXYLASJSH3UPIBMZ/action/replication_record"}},"created_at":"2026-05-17T23:38:46.388922+00:00","updated_at":"2026-05-17T23:38:46.388922+00:00"}