Forget What's Sensitive, Remember What Matters: Token-Level Differential Privacy in Memory Sculpting for Continual Learning

Bihao Zhan; Hang Yan; Jie Zhou; Junsong Li; Liang He; Qianjun Pan; Qin Chen; Shilian Chen; Wen Wu; Xingjiao Wu

arxiv: 2509.12958 · v2 · pith:SL447NJTnew · submitted 2025-09-16 · 💻 cs.AI

Forget What's Sensitive, Remember What Matters: Token-Level Differential Privacy in Memory Sculpting for Continual Learning

Bihao Zhan , Jie Zhou , Junsong Li , Yutao Yang , Shilian Chen , Qianjun Pan , Xin Li , Wen Wu

show 4 more authors

Xingjiao Wu Qin Chen Hang Yan Liang He

This is my paper

Pith reviewed 2026-05-25 08:26 UTC · model grok-4.3

classification 💻 cs.AI

keywords continual learningdifferential privacytoken-level privacymemory sculptingprivacy preservationcatastrophic forgettingmodel utilitysemantic sensitivity

0 comments

The pith

PeCL allocates privacy budgets by token sensitivity and sculpts memory to forget sensitive details while retaining task-invariant knowledge in continual learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Continual learning models gather knowledge over sequential tasks but risk exposing private information from accumulated data. The paper introduces PeCL, which first applies a token-level dynamic differential privacy method to assign smaller noise to non-sensitive tokens and stronger protection to sensitive ones based on semantic analysis. It then uses the same sensitivity signals in a memory sculpting module to remove private entities from model parameters while preserving general historical knowledge that prevents forgetting earlier tasks. This dual mechanism seeks to reduce the utility loss that uniform privacy methods cause. Experiments indicate the approach maintains higher accuracy on prior tasks alongside stronger privacy guarantees than standard baselines.

Core claim

The central claim is that a token-level dynamic differential privacy strategy, which adaptively allocates budgets according to each token's semantic sensitivity, can be integrated with a privacy-guided memory sculpting module that selectively forgets sensitive information from memory and parameters while explicitly retaining task-invariant historical knowledge, thereby achieving robust privacy without the severe accuracy drops seen in uniform differential privacy approaches for continual learning.

What carries the argument

The token-level dynamic Differential Privacy strategy that assigns privacy budgets by semantic sensitivity of tokens, paired with the privacy-guided memory sculpting module that removes sensitive content while preserving general knowledge.

If this is right

Models maintain high accuracy on previous tasks while applying stronger protection to sensitive tokens.
Noise injection is minimized on non-sensitive general knowledge, reducing overall utility degradation.
The same sensitivity analysis serves dual purposes in privacy allocation and memory management.
Outperforms uniform differential privacy baselines in balancing privacy and continual learning performance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method could be tested in domains where token sensitivity varies sharply, such as medical or legal text sequences, to measure real-world privacy-utility trade-offs.
If sensitivity detection proves reliable, similar adaptive mechanisms might apply to other privacy techniques beyond differential privacy in sequential models.
The sculpting step suggests a general principle that privacy mechanisms can double as selective forgetting tools, potentially linking to broader research on controlled memory in neural networks.

Load-bearing premise

Semantic sensitivity of individual tokens can be accurately and reliably determined to guide both adaptive privacy budget allocation and selective memory sculpting without harming overall task performance.

What would settle it

A controlled test in which token sensitivity labels are replaced by random assignments, after which either accuracy on previous tasks falls below baseline levels or private token information leaks at rates comparable to non-private models.

Figures

Figures reproduced from arXiv: 2509.12958 by Bihao Zhan, Hang Yan, Jie Zhou, Junsong Li, Liang He, Qianjun Pan, Qin Chen, Shilian Chen, Wen Wu, Xingjiao Wu, Xin Li, Yutao Yang.

**Figure 1.** Figure 1: The framework of our PeCL, which is designed to balance privacy and utility by dynamically applying token-level differential privacy and intelligently sculpting model memory to retain crucial knowledge while forgetting sensitive information. The framework comprises two core modules: (a) Token-level Dynamic Differential Privacy, which adaptively injects noise into token embeddings based on their calculated… view at source ↗

**Figure 2.** Figure 2: Results of different task order. Further Analysis Influence of Task Order. We consider three task orders: a natural progression (order1: 1 → 2 → ... → 6), the reversed sequence (order2: 6 → 5 →... → 1), and a deliberately shuffled permutation (order3: 4 → 5 → 1 → 3 → 6 → 2) [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: The performance of α. 0 1 3 5 7 unlearn 0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 Performance -0.133 -0.093 -0.127 -0.101 -0.093 0.544 0.573 0.534 0.501 0.549 0.524 0.535 0.511 0.454 0.513 BWT Last Avg [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: The Impact of λunlearn. trade-off between model uncertainty Score1(ti) and contextual informativeness Score2(ti) in the privacy sensitivity score. As shown in [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

read the original abstract

Continual Learning (CL) models, while adept at sequential knowledge acquisition, face significant and often overlooked privacy challenges due to accumulating diverse information. Traditional privacy methods, like a uniform Differential Privacy (DP) budget, indiscriminately protect all data, leading to substantial model utility degradation and hindering CL deployment in privacy-sensitive areas. To overcome this, we propose a privacy-enhanced continual learning (PeCL) framework that forgets what's sensitive and remembers what matters. Our approach first introduces a token-level dynamic Differential Privacy strategy that adaptively allocates privacy budgets based on the semantic sensitivity of individual tokens. This ensures robust protection for private entities while minimizing noise injection for non-sensitive, general knowledge. Second, we integrate a privacy-guided memory sculpting module. This module leverages the sensitivity analysis from our dynamic DP mechanism to intelligently forget sensitive information from the model's memory and parameters, while explicitly preserving the task-invariant historical knowledge crucial for mitigating catastrophic forgetting. Extensive experiments show that PeCL achieves a superior balance between privacy preserving and model utility, outperforming baseline models by maintaining high accuracy on previous tasks while ensuring robust privacy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Token-level dynamic DP paired with sensitivity-guided memory sculpting is the core new idea, but the unvalidated sensitivity scorer is a load-bearing assumption that needs checking.

read the letter

The main things to know are that the paper introduces a token-level dynamic differential privacy mechanism that assigns per-token privacy budgets according to semantic sensitivity, then feeds that analysis into a memory sculpting step that removes sensitive content while trying to retain task-invariant knowledge for continual learning. The combination is presented as new relative to uniform DP baselines in the CL literature. The motivation is handled cleanly: uniform noise hurts utility across the board, and CL's sequential nature makes blanket protection especially costly. That part of the framing is reasonable and directly addresses a practical deployment barrier. The approach also tries to be explicit about preserving what matters for avoiding catastrophic forgetting, which is a sensible goal. The soft spots sit mainly with the sensitivity component. The framework depends on reliable per-token sensitivity labels to set both the local epsilon and the sculpting decisions. The abstract supplies no description of how those labels are produced, no ablation on the classifier, and no check against human annotations or attack models. If that step is noisy, the claimed privacy guarantee and the utility numbers both become conditional. The stress-test note on this point holds up from what is visible. Experiments are asserted to show better accuracy-privacy balance than baselines, but without the actual numbers, baselines, or privacy measurement details, it is not possible to judge whether the gains are real or how sensitive the results are to the sensitivity oracle. The paper is aimed at people working on privacy mechanisms inside lifelong learning. A reader already thinking about targeted DP or memory management in sequential settings might pick up the framework as a starting point, but would need the full methods and results to decide whether to build on it. It deserves peer review so the sensitivity validation and the experimental evidence can be examined properly.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes PeCL, a continual learning framework that applies token-level dynamic differential privacy to adaptively allocate per-token privacy budgets according to semantic sensitivity, combined with a privacy-guided memory sculpting module that selectively removes sensitive information from memory and parameters while retaining task-invariant knowledge to mitigate catastrophic forgetting. It claims that extensive experiments demonstrate a superior privacy-utility tradeoff, with higher accuracy on prior tasks and robust privacy compared to baselines.

Significance. If the core mechanisms hold, the work could advance privacy-aware continual learning by avoiding the utility penalty of uniform DP budgets. The token-level adaptivity and sculpting approach target a genuine tension between privacy and retention in sequential learning settings. However, the significance is constrained by the absence of any validation for the sensitivity determination step that underpins both the DP allocation and sculpting decisions.

major comments (2)

[Abstract] Abstract: the central mechanism requires an accurate per-token semantic sensitivity label to set local privacy budgets ε_i and to select memory entries for sculpting, yet the manuscript provides no ablation on the sensitivity classifier, no comparison to human-annotated sensitivity, and no evaluation of how false negatives on private entities or false positives on task-critical tokens affect the reported accuracy/privacy tradeoff.
[Abstract] Abstract: the claim that 'extensive experiments show that PeCL achieves a superior balance' cannot be assessed because the manuscript supplies no quantitative results, no baseline descriptions, no tables of accuracy or privacy metrics, and no experimental protocol, leaving the performance assertions unverifiable.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We agree that the current abstract is too high-level and will revise it to improve verifiability while preserving its summary nature. We address each major comment below.

read point-by-point responses

Referee: [Abstract] Abstract: the central mechanism requires an accurate per-token semantic sensitivity label to set local privacy budgets ε_i and to select memory entries for sculpting, yet the manuscript provides no ablation on the sensitivity classifier, no comparison to human-annotated sensitivity, and no evaluation of how false negatives on private entities or false positives on task-critical tokens affect the reported accuracy/privacy tradeoff.

Authors: We acknowledge the abstract provides no such details or ablations. The full manuscript describes the sensitivity classifier in Section 3.2 but indeed lacks dedicated ablations on its accuracy or error impact. We will revise the abstract to briefly note the classifier approach and add a new ablation subsection (and associated discussion of false positive/negative effects) in the experiments. revision: yes
Referee: [Abstract] Abstract: the claim that 'extensive experiments show that PeCL achieves a superior balance' cannot be assessed because the manuscript supplies no quantitative results, no baseline descriptions, no tables of accuracy or privacy metrics, and no experimental protocol, leaving the performance assertions unverifiable.

Authors: The abstract is a concise summary and therefore omits numbers, tables, and protocols; these appear in Section 4 of the full manuscript. We agree the abstract claim is currently unverifiable on its own and will revise it to include one or two key quantitative results (e.g., accuracy gains and privacy metrics versus baselines) along with a pointer to the experimental protocol. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The provided abstract and description introduce PeCL as a proposed framework that adds token-level dynamic DP (allocating budgets by semantic sensitivity) and a privacy-guided memory sculpting module. No equations, self-definitions, or fitted-input predictions are exhibited that reduce outputs to inputs by construction. No self-citation load-bearing steps, uniqueness theorems imported from the authors, or ansatzes smuggled via citation appear in the text. The sensitivity analysis is presented as an input mechanism to the method rather than derived from the method itself, and experimental claims are offered as external validation. The derivation chain is therefore self-contained against the stated assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no information on free parameters, axioms, or invented entities; all fields left empty due to lack of technical detail.

pith-pipeline@v0.9.0 · 5757 in / 1030 out tokens · 38982 ms · 2026-05-25T08:26:15.004670+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages · 5 internal anchors

[1]

, " * write output.state after.block = add.period write newline

ENTRY address archivePrefix author booktitle chapter edition editor eid eprint howpublished institution isbn journal key month note number organization pages publisher school series title type volume year label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block FUNCTION init.state.consts #0 'before.a...

work page
[2]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...

work page
[3]

B.; Mironov, I.; Talwar, K.; and Zhang, L

Abadi, M.; Chu, A.; Goodfellow, I.; McMahan, H. B.; Mironov, I.; Talwar, K.; and Zhang, L. 2016. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC conference on computer and communications security, 308--318

work page 2016
[4]

Asghar, N. 2016. Yelp dataset challenge: Review rating prediction. arXiv preprint arXiv:1605.05362

work page internal anchor Pith review Pith/arXiv arXiv 2016
[5]

A.; Jia, H.; Travers, A.; Zhang, B.; Lie, D.; and Papernot, N

Bourtoule, L.; Chandrasekaran, V.; Choquette-Choo, C. A.; Jia, H.; Travers, A.; Zhang, B.; Lie, D.; and Papernot, N. 2021. Machine unlearning. In 2021 IEEE symposium on security and privacy (SP), 141--159. IEEE

work page 2021
[6]

Carlini, N.; Tramer, F.; Wallace, E.; Jagielski, M.; Herbert-Voss, A.; Lee, K.; Roberts, A.; Brown, T.; Song, D.; Erlingsson, U.; et al. 2021. Extracting training data from large language models. In 30th USENIX security symposium (USENIX Security 21), 2633--2650

work page 2021
[7]

B.; Mitchell, N.; Pillutla, K.; and Rush, K

Charles, Z.; Ganesh, A.; McKenna, R.; McMahan, H. B.; Mitchell, N.; Pillutla, K.; and Rush, K. 2024. Fine-tuning large language models with user-level differential privacy. arXiv preprint arXiv:2407.07737

work page arXiv 2024
[8]

K.; Ajanthan, T.; and Torr, P

Chaudhry, A.; Dokania, P. K.; Ajanthan, T.; and Torr, P. H. 2018. Riemannian walk for incremental learning: Understanding forgetting and intransigence. In Proceedings of the European conference on computer vision (ECCV), 532--547

work page 2018
[9]

Chourasia, R.; and Shah, N. 2023. Forget unlearning: Towards true data-deletion in machine learning. In International conference on machine learning, 6028--6073. PMLR

work page 2023
[10]

Desai, P.; Lai, P.; Phan, N.; and Thai, M. T. 2021. Continual learning with differential privacy. In International Conference on Neural Information Processing, 334--343. Springer

work page 2021
[11]

Dwork, C. 2006. Differential privacy. In International colloquium on automata, languages, and programming, 1--12. Springer

work page 2006
[12]

Feldman, V. 2020. Does learning require memorization? a short tale about a long tail. In Proceedings of the 52nd annual ACM SIGACT symposium on theory of computing, 954--959

work page 2020
[13]

Flemings, J.; Razaviyayn, M.; and Annavaram, M. 2024. Differentially private next-token prediction of large language models. arXiv preprint arXiv:2403.15638

work page arXiv 2024
[14]

D.; and Van de Weijer, J

Gomez-Villa, A.; Twardowski, B.; Yu, L.; Bagdanov, A. D.; and Van de Weijer, J. 2022. Continually learning self-supervised representations with projected functional regularization. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 3867--3877

work page 2022
[15]

J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W.; et al

Hu, E. J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W.; et al. 2022. Lora: Low-rank adaptation of large language models. ICLR, 1(2): 3

work page 2022
[16]

Huai, T.; Zhou, J.; Wu, X.; Chen, Q.; Bai, Q.; Zhou, Z.; and He, L. 2025. CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering. In Proceedings of the Computer Vision and Pattern Recognition Conference, 19608--19617

work page 2025
[17]

Huang, Q.; Lian, Z.; and Li, Q. 2022. Attention based adversarial attacks with low perturbations. In 2022 IEEE International Conference on Multimedia and Expo (ICME), 1--6. IEEE

work page 2022
[18]

J.; and Kim, J

Jung, M. J.; and Kim, J. 2024. Pmoe: Progressive mixture of experts with asymmetric transformer for continual learning. arXiv preprint arXiv:2407.21571

work page arXiv 2024
[19]

Kim, S.; Noci, L.; Orvieto, A.; and Hofmann, T. 2023. Achieving a Better Stability-Plasticity Trade-Off via Auxiliary Networks in Continual Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 11930--11939

work page 2023
[20]

A.; Milan, K.; Quan, J.; Ramalho, T.; Grabska-Barwinska, A.; et al

Kirkpatrick, J.; Pascanu, R.; Rabinowitz, N.; Veness, J.; Desjardins, G.; Rusu, A. A.; Milan, K.; Quan, J.; Ramalho, T.; Grabska-Barwinska, A.; et al. 2017. Overcoming catastrophic forgetting in neural networks. Proceedings of the national academy of sciences, 114(13): 3521--3526

work page 2017
[21]

Kuo, M.; Zhang, J.; Zhang, J.; Tang, M.; DiValentin, L.; Ding, A.; Sun, J.; Chen, W.; Hass, A.; Chen, T.; et al. 2025. Proactive privacy amnesia for large language models: Safeguarding PII with negligible impact on model utility. arXiv preprint arXiv:2502.17591

work page arXiv 2025
[22]

Y.; Xu, X.; Li, H.; et al

Liu, S.; Yao, Y.; Jia, J.; Casper, S.; Baracaldo, N.; Hase, P.; Yao, Y.; Liu, C. Y.; Xu, X.; Li, H.; et al. 2025. Rethinking machine unlearning for large language models. Nature Machine Intelligence, 1--14

work page 2025
[23]

Lopez-Paz, D.; and Ranzato, M. 2017. Gradient episodic memory for continual learning. Advances in neural information processing systems, 30

work page 2017
[24]

Mai, P.; Yan, R.; Huang, Z.; Yang, Y.; and Pang, Y. 2023. Split-and-denoise: Protect large language model inference with local differential privacy. arXiv preprint arXiv:2310.09130

work page arXiv 2023
[25]

Meisenbacher, S.; Chevli, M.; Vladika, J.; and Matthes, F. 2024. DP-MLM: Differentially private text rewriting using masked language models. arXiv preprint arXiv:2407.00637

work page arXiv 2024
[26]

Meng, W.; Guo, Z.; Wu, L.; Gong, C.; Liu, W.; Li, W.; Wei, C.; and Chen, W. 2025. Rr: Unveiling llm training privacy through recollection and ranking. arXiv preprint arXiv:2502.12658

work page arXiv 2025
[27]

Mulrooney, A.; Gupta, D.; Flemings, J.; Zhang, H.; Annavaram, M.; Razaviyayn, M.; and Zhang, X. 2025. Memory-Efficient Differentially Private Training with Gradient Random Projection. arXiv preprint arXiv:2506.15588

work page internal anchor Pith review Pith/arXiv arXiv 2025
[28]

F.; Murtaza, G.; Zafar, S.; and Bano, A

Murtaza, H.; Ahmed, M.; Khan, N. F.; Murtaza, G.; Zafar, S.; and Bano, A. 2023. Synthetic data generation: State of the art in health care domain. Computer Science Review, 48: 100546

work page 2023
[29]

\"O zdenizci, O.; Rueckert, E.; and Legenstein, R. 2025. Privacy-aware lifelong learning. arXiv preprint arXiv:2505.10941

work page arXiv 2025
[30]

Rolnick, D.; Ahuja, A.; Schwarz, J.; Lillicrap, T.; and Wayne, G. 2019. Experience replay for continual learning. Advances in neural information processing systems, 32

work page 2019
[31]

Shah, A.; Paturi, S.; and Chava, S. 2023. Trillion dollar words: A new financial dataset, task & market analysis. arXiv preprint arXiv:2305.07972

work page arXiv 2023
[32]

Shi, H.; Xu, Z.; Wang, H.; Qin, W.; Wang, W.; Wang, Y.; Wang, Z.; Ebrahimi, S.; and Wang, H. 2024. Continual learning of large language models: A comprehensive survey. ACM Computing Surveys

work page 2024
[33]

Sun, M.; Liu, Z.; Bair, A.; and Kolter, J. Z. 2023. A simple and effective pruning approach for large language models. arXiv preprint arXiv:2306.11695

work page internal anchor Pith review Pith/arXiv arXiv 2023
[34]

P.; and Hasegawa, S

Takakura, S.; Liew, S. P.; and Hasegawa, S. 2025. Accelerating Differentially Private Federated Learning via Adaptive Extrapolation. arXiv preprint arXiv:2504.09850

work page arXiv 2025
[35]

Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.-A.; Lacroix, T.; Rozi \`e re, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. 2023. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971

work page internal anchor Pith review Pith/arXiv arXiv 2023
[36]

Wang, W.; Tian, Z.; Zhang, C.; and Yu, S. 2024. Machine unlearning: A comprehensive survey. arXiv preprint arXiv:2405.07406

work page internal anchor Pith review Pith/arXiv arXiv 2024
[37]

Wang, X.; Chen, T.; Ge, Q.; Xia, H.; Bao, R.; Zheng, R.; Zhang, Q.; Gui, T.; and Huang, X. 2023. Orthogonal subspace learning for language model continual learning. arXiv preprint arXiv:2310.14152

work page arXiv 2023
[38]

A.; and Bansal, M

Yadav, P.; Tam, D.; Choshen, L.; Raffel, C. A.; and Bansal, M. 2023. Ties-merging: Resolving interference when merging models. Advances in Neural Information Processing Systems, 36: 7093--7115

work page 2023
[39]

Yang, Y.; Zhou, J.; Ding, X.; Huai, T.; Liu, S.; Chen, Q.; Xie, Y.; and He, L. 2025. Recent advances of foundation language models-based continual learning: A survey. ACM Computing Surveys, 57(5): 1--38

work page 2025
[40]

A.; Kamath, G.; Kulkarni, J.; Lee, Y

Yu, D.; Naik, S.; Backurs, A.; Gopi, S.; Inan, H. A.; Kamath, G.; Kulkarni, J.; Lee, Y. T.; Manoel, A.; Wutschitz, L.; et al. 2021. Differentially private fine-tuning of language models. arXiv preprint arXiv:2110.06500

work page arXiv 2021
[41]

Zhang, X.; Zhao, J.; and LeCun, Y. 2015. Character-level convolutional networks for text classification. Advances in neural information processing systems, 28

work page 2015

[1] [1]

, " * write output.state after.block = add.period write newline

ENTRY address archivePrefix author booktitle chapter edition editor eid eprint howpublished institution isbn journal key month note number organization pages publisher school series title type volume year label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block FUNCTION init.state.consts #0 'before.a...

work page

[2] [2]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...

work page

[3] [3]

B.; Mironov, I.; Talwar, K.; and Zhang, L

Abadi, M.; Chu, A.; Goodfellow, I.; McMahan, H. B.; Mironov, I.; Talwar, K.; and Zhang, L. 2016. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC conference on computer and communications security, 308--318

work page 2016

[4] [4]

Asghar, N. 2016. Yelp dataset challenge: Review rating prediction. arXiv preprint arXiv:1605.05362

work page internal anchor Pith review Pith/arXiv arXiv 2016

[5] [5]

A.; Jia, H.; Travers, A.; Zhang, B.; Lie, D.; and Papernot, N

Bourtoule, L.; Chandrasekaran, V.; Choquette-Choo, C. A.; Jia, H.; Travers, A.; Zhang, B.; Lie, D.; and Papernot, N. 2021. Machine unlearning. In 2021 IEEE symposium on security and privacy (SP), 141--159. IEEE

work page 2021

[6] [6]

Carlini, N.; Tramer, F.; Wallace, E.; Jagielski, M.; Herbert-Voss, A.; Lee, K.; Roberts, A.; Brown, T.; Song, D.; Erlingsson, U.; et al. 2021. Extracting training data from large language models. In 30th USENIX security symposium (USENIX Security 21), 2633--2650

work page 2021

[7] [7]

B.; Mitchell, N.; Pillutla, K.; and Rush, K

Charles, Z.; Ganesh, A.; McKenna, R.; McMahan, H. B.; Mitchell, N.; Pillutla, K.; and Rush, K. 2024. Fine-tuning large language models with user-level differential privacy. arXiv preprint arXiv:2407.07737

work page arXiv 2024

[8] [8]

K.; Ajanthan, T.; and Torr, P

Chaudhry, A.; Dokania, P. K.; Ajanthan, T.; and Torr, P. H. 2018. Riemannian walk for incremental learning: Understanding forgetting and intransigence. In Proceedings of the European conference on computer vision (ECCV), 532--547

work page 2018

[9] [9]

Chourasia, R.; and Shah, N. 2023. Forget unlearning: Towards true data-deletion in machine learning. In International conference on machine learning, 6028--6073. PMLR

work page 2023

[10] [10]

Desai, P.; Lai, P.; Phan, N.; and Thai, M. T. 2021. Continual learning with differential privacy. In International Conference on Neural Information Processing, 334--343. Springer

work page 2021

[11] [11]

Dwork, C. 2006. Differential privacy. In International colloquium on automata, languages, and programming, 1--12. Springer

work page 2006

[12] [12]

Feldman, V. 2020. Does learning require memorization? a short tale about a long tail. In Proceedings of the 52nd annual ACM SIGACT symposium on theory of computing, 954--959

work page 2020

[13] [13]

Flemings, J.; Razaviyayn, M.; and Annavaram, M. 2024. Differentially private next-token prediction of large language models. arXiv preprint arXiv:2403.15638

work page arXiv 2024

[14] [14]

D.; and Van de Weijer, J

Gomez-Villa, A.; Twardowski, B.; Yu, L.; Bagdanov, A. D.; and Van de Weijer, J. 2022. Continually learning self-supervised representations with projected functional regularization. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 3867--3877

work page 2022

[15] [15]

J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W.; et al

Hu, E. J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W.; et al. 2022. Lora: Low-rank adaptation of large language models. ICLR, 1(2): 3

work page 2022

[16] [16]

Huai, T.; Zhou, J.; Wu, X.; Chen, Q.; Bai, Q.; Zhou, Z.; and He, L. 2025. CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering. In Proceedings of the Computer Vision and Pattern Recognition Conference, 19608--19617

work page 2025

[17] [17]

Huang, Q.; Lian, Z.; and Li, Q. 2022. Attention based adversarial attacks with low perturbations. In 2022 IEEE International Conference on Multimedia and Expo (ICME), 1--6. IEEE

work page 2022

[18] [18]

J.; and Kim, J

Jung, M. J.; and Kim, J. 2024. Pmoe: Progressive mixture of experts with asymmetric transformer for continual learning. arXiv preprint arXiv:2407.21571

work page arXiv 2024

[19] [19]

Kim, S.; Noci, L.; Orvieto, A.; and Hofmann, T. 2023. Achieving a Better Stability-Plasticity Trade-Off via Auxiliary Networks in Continual Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 11930--11939

work page 2023

[20] [20]

A.; Milan, K.; Quan, J.; Ramalho, T.; Grabska-Barwinska, A.; et al

Kirkpatrick, J.; Pascanu, R.; Rabinowitz, N.; Veness, J.; Desjardins, G.; Rusu, A. A.; Milan, K.; Quan, J.; Ramalho, T.; Grabska-Barwinska, A.; et al. 2017. Overcoming catastrophic forgetting in neural networks. Proceedings of the national academy of sciences, 114(13): 3521--3526

work page 2017

[21] [21]

Kuo, M.; Zhang, J.; Zhang, J.; Tang, M.; DiValentin, L.; Ding, A.; Sun, J.; Chen, W.; Hass, A.; Chen, T.; et al. 2025. Proactive privacy amnesia for large language models: Safeguarding PII with negligible impact on model utility. arXiv preprint arXiv:2502.17591

work page arXiv 2025

[22] [22]

Y.; Xu, X.; Li, H.; et al

Liu, S.; Yao, Y.; Jia, J.; Casper, S.; Baracaldo, N.; Hase, P.; Yao, Y.; Liu, C. Y.; Xu, X.; Li, H.; et al. 2025. Rethinking machine unlearning for large language models. Nature Machine Intelligence, 1--14

work page 2025

[23] [23]

Lopez-Paz, D.; and Ranzato, M. 2017. Gradient episodic memory for continual learning. Advances in neural information processing systems, 30

work page 2017

[24] [24]

Mai, P.; Yan, R.; Huang, Z.; Yang, Y.; and Pang, Y. 2023. Split-and-denoise: Protect large language model inference with local differential privacy. arXiv preprint arXiv:2310.09130

work page arXiv 2023

[25] [25]

Meisenbacher, S.; Chevli, M.; Vladika, J.; and Matthes, F. 2024. DP-MLM: Differentially private text rewriting using masked language models. arXiv preprint arXiv:2407.00637

work page arXiv 2024

[26] [26]

Meng, W.; Guo, Z.; Wu, L.; Gong, C.; Liu, W.; Li, W.; Wei, C.; and Chen, W. 2025. Rr: Unveiling llm training privacy through recollection and ranking. arXiv preprint arXiv:2502.12658

work page arXiv 2025

[27] [27]

Mulrooney, A.; Gupta, D.; Flemings, J.; Zhang, H.; Annavaram, M.; Razaviyayn, M.; and Zhang, X. 2025. Memory-Efficient Differentially Private Training with Gradient Random Projection. arXiv preprint arXiv:2506.15588

work page internal anchor Pith review Pith/arXiv arXiv 2025

[28] [28]

F.; Murtaza, G.; Zafar, S.; and Bano, A

Murtaza, H.; Ahmed, M.; Khan, N. F.; Murtaza, G.; Zafar, S.; and Bano, A. 2023. Synthetic data generation: State of the art in health care domain. Computer Science Review, 48: 100546

work page 2023

[29] [29]

\"O zdenizci, O.; Rueckert, E.; and Legenstein, R. 2025. Privacy-aware lifelong learning. arXiv preprint arXiv:2505.10941

work page arXiv 2025

[30] [30]

Rolnick, D.; Ahuja, A.; Schwarz, J.; Lillicrap, T.; and Wayne, G. 2019. Experience replay for continual learning. Advances in neural information processing systems, 32

work page 2019

[31] [31]

Shah, A.; Paturi, S.; and Chava, S. 2023. Trillion dollar words: A new financial dataset, task & market analysis. arXiv preprint arXiv:2305.07972

work page arXiv 2023

[32] [32]

Shi, H.; Xu, Z.; Wang, H.; Qin, W.; Wang, W.; Wang, Y.; Wang, Z.; Ebrahimi, S.; and Wang, H. 2024. Continual learning of large language models: A comprehensive survey. ACM Computing Surveys

work page 2024

[33] [33]

Sun, M.; Liu, Z.; Bair, A.; and Kolter, J. Z. 2023. A simple and effective pruning approach for large language models. arXiv preprint arXiv:2306.11695

work page internal anchor Pith review Pith/arXiv arXiv 2023

[34] [34]

P.; and Hasegawa, S

Takakura, S.; Liew, S. P.; and Hasegawa, S. 2025. Accelerating Differentially Private Federated Learning via Adaptive Extrapolation. arXiv preprint arXiv:2504.09850

work page arXiv 2025

[35] [35]

Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.-A.; Lacroix, T.; Rozi \`e re, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. 2023. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971

work page internal anchor Pith review Pith/arXiv arXiv 2023

[36] [36]

Wang, W.; Tian, Z.; Zhang, C.; and Yu, S. 2024. Machine unlearning: A comprehensive survey. arXiv preprint arXiv:2405.07406

work page internal anchor Pith review Pith/arXiv arXiv 2024

[37] [37]

Wang, X.; Chen, T.; Ge, Q.; Xia, H.; Bao, R.; Zheng, R.; Zhang, Q.; Gui, T.; and Huang, X. 2023. Orthogonal subspace learning for language model continual learning. arXiv preprint arXiv:2310.14152

work page arXiv 2023

[38] [38]

A.; and Bansal, M

Yadav, P.; Tam, D.; Choshen, L.; Raffel, C. A.; and Bansal, M. 2023. Ties-merging: Resolving interference when merging models. Advances in Neural Information Processing Systems, 36: 7093--7115

work page 2023

[39] [39]

Yang, Y.; Zhou, J.; Ding, X.; Huai, T.; Liu, S.; Chen, Q.; Xie, Y.; and He, L. 2025. Recent advances of foundation language models-based continual learning: A survey. ACM Computing Surveys, 57(5): 1--38

work page 2025

[40] [40]

A.; Kamath, G.; Kulkarni, J.; Lee, Y

Yu, D.; Naik, S.; Backurs, A.; Gopi, S.; Inan, H. A.; Kamath, G.; Kulkarni, J.; Lee, Y. T.; Manoel, A.; Wutschitz, L.; et al. 2021. Differentially private fine-tuning of language models. arXiv preprint arXiv:2110.06500

work page arXiv 2021

[41] [41]

Zhang, X.; Zhao, J.; and LeCun, Y. 2015. Character-level convolutional networks for text classification. Advances in neural information processing systems, 28

work page 2015