{"paper":{"title":"Collaborative Parameter Learning: Mitigating Forgetting via Parameter-Level Gradient Analysis","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Collaborative Parameter Learning freezes conflicting parameters to let large language models acquire new knowledge while retaining old capabilities.","cross_cats":[],"primary_cat":"cs.LG","authors_text":"Haolin Li, Jiandong Gao, Ji Wu, Kaili Zheng, Kaiwen Wang, Mutian Yang, Qi Wang, Yuguang Wang, Yutong Chen, Zisen Zhan","submitted_at":"2026-01-29T11:42:30Z","abstract_excerpt":"Catastrophic forgetting during knowledge injection impairs the ability of large language models to acquire new knowledge without overwriting previously mastered knowledge. Recent studies analyze forgetting from a gradient similarity perspective and mitigate forgetting through vector projection. However, these methods primarily characterize gradient similarity at the aggregate direction level, leaving the parameter wise contributions to forgetting underexplored. 
In this paper, we decompose gradient similarity into parameter wise contributions and identify two types of parameters during forgetti"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Experiments comparing CPL with seven baseline methods show that CPL learns 20.2% to 48.2% more questions with negligible forgetting, while reducing peak VRAM by approximately 3 GB per billion model parameters and computation time by 16.5 percent.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the parameter-wise gradient contributions observed during a single training run reliably classify parameters as conflicting or collaborative in a way that generalizes across models, tasks, and future updates without requiring per-run reclassification.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Collaborative Parameter Learning freezes 50-75% of parameters whose updates cause forgetting and updates only the 25-50% that mitigate it, allowing LLMs to learn 20-48% more new questions with negligible forgetting and lower compute cost.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Collaborative Parameter Learning freezes conflicting parameters to let large language models acquire new knowledge while retaining old capabilities.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"340bad8e379841dd1bc1fcd15144fbffb04ac0471f6f977ebf763afae3851ab9"},"source":{"id":"2601.21577","kind":"arxiv","version":2},"verdict":{"id":"0af9760b-685b-4a6d-b3ba-6fff5f4f9580","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T09:31:25.940756Z","strongest_claim":"Experiments comparing CPL with seven baseline methods show that CPL 
learns 20.2% to 48.2% more questions with negligible forgetting, while reducing peak VRAM by approximately 3 GB per billion model parameters and computation time by 16.5 percent.","one_line_summary":"Collaborative Parameter Learning freezes 50-75% of parameters whose updates cause forgetting and updates only the 25-50% that mitigate it, allowing LLMs to learn 20-48% more new questions with negligible forgetting and lower compute cost.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the parameter-wise gradient contributions observed during a single training run reliably classify parameters as conflicting or collaborative in a way that generalizes across models, tasks, and future updates without requiring per-run reclassification.","pith_extraction_headline":"Collaborative Parameter Learning freezes conflicting parameters to let large language models acquire new knowledge while retaining old capabilities."},"references":{"count":23,"sample":[{"doi":"","year":null,"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","ref_index":1,"cited_arxiv_id":"2303.08774","is_internal_anchor":true},{"doi":"","year":null,"title":"Think you have Solved Question Answering? 
Try ARC, the AI2 Reasoning Challenge","work_id":"28ea1282-d657-4c61-a83c-f1249be6d6b1","ref_index":2,"cited_arxiv_id":"1803.05457","is_internal_anchor":true},{"doi":"","year":null,"title":"Time sensitive knowledge editing through efficient finetuning. arXiv preprint arXiv:2406.04496","work_id":"0f44370a-c8cc-46a5-9746-386137b2fc54","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"The Llama 3 Herd of Models","work_id":"1549a635-88af-4ac1-acfe-51ae7bb53345","ref_index":4,"cited_arxiv_id":"2407.21783","is_internal_anchor":true},{"doi":"","year":null,"title":"Adams, Jens-Michalis Papaioannou, Paul Grundmann, Tom Oberhauser, Alexei Figueroa, Alexander Löser, Daniel Truhn, and Keno K","work_id":"e04d7f5e-abf5-46d9-8a8a-1707106f84bd","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":23,"snapshot_sha256":"c50336c5b84511fe4e9134aa77475265ffae3d58bf18e123e2148078e8bb9f6a","internal_anchors":8},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}