Downgrade to Upgrade: Optimizer Simplification Enhances Robustness in LLM Unlearning
Pith reviewed 2026-05-18 10:42 UTC · model grok-4.3
The pith
Downgrading the optimizer during LLM unlearning produces forgetting that resists later fine-tuning and quantization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The grade of an optimizer, defined by how much information it uses (zeroth-order, gradient-free; first-order, gradient-based; second-order, Hessian-based), controls how resilient the unlearned model becomes. Lower-grade optimizers generate less precise updates yet drive the model to loss basins that are harder to escape, giving natural resistance to perturbations such as quantization or fine-tuning. The paper further links this advantage to randomized smoothing and shows that a hybrid optimizer mixing first-order and zeroth-order steps delivers both effective forgetting and improved stability.
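To make the notion of grade concrete, here is a minimal sketch of the information each kind of update consumes. The toy quadratic loss, the function names, and the default values for `mu` and `num_queries` are assumptions made for illustration; none of them come from the paper.

```python
import random

def loss(theta):
    # Toy quadratic standing in for an unlearning loss (illustrative assumption).
    return sum((t - 1.0) ** 2 for t in theta)

def first_order_grad(theta):
    # Exact gradient of the toy loss: full first-order information.
    return [2.0 * (t - 1.0) for t in theta]

def zeroth_order_grad(theta, mu=1e-3, num_queries=16, rng=None):
    # Two-point randomized estimate: uses only loss evaluations, no gradients.
    rng = rng or random.Random(0)
    est = [0.0] * len(theta)
    for _ in range(num_queries):
        u = [rng.gauss(0.0, 1.0) for _ in theta]
        plus = loss([t + mu * ui for t, ui in zip(theta, u)])
        minus = loss([t - mu * ui for t, ui in zip(theta, u)])
        scale = (plus - minus) / (2.0 * mu)
        est = [e + scale * ui / num_queries for e, ui in zip(est, u)]
    return est

def sign_grad(theta):
    # Compressed first-order information: keep only the sign of each coordinate.
    return [(g > 0) - (g < 0) for g in first_order_grad(theta)]
```

On this toy loss the zeroth-order estimate is noisy but stays positively aligned with the true gradient, while the sign variant discards magnitudes entirely. Both are "downgrades" in the sense used above.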
What carries the argument
Optimizer grade, defined by the level of information exploited (zeroth-order gradient-free through first-order gradient-based to second-order Hessian-based), which shapes update precision and steers convergence to harder-to-disturb basins in the loss landscape.
If this is right
- Zeroth-order and sign-based optimizers produce unlearning that survives weight quantization and additional fine-tuning steps.
- The hybrid first-order plus zeroth-order optimizer maintains high unlearning quality while increasing resistance to later perturbations.
- The robustness improvement holds across multiple unlearning algorithms when evaluated on MUSE and WMDP benchmarks.
- The link between zeroth-order methods and randomized smoothing supplies an inherent defense against small post-unlearning changes.
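The randomized-smoothing link in the last bullet can be stated with a standard identity from the zeroth-order optimization literature. The notation here (loss \ell, parameters \theta, smoothing radius \mu, Gaussian direction u) is ours and is not taken from the paper.

```latex
\ell_\mu(\theta) = \mathbb{E}_{u \sim \mathcal{N}(0, I)}\big[\ell(\theta + \mu u)\big],
\qquad
\hat{g}(\theta) = \frac{\ell(\theta + \mu u) - \ell(\theta - \mu u)}{2\mu}\, u,
\qquad
\mathbb{E}_{u}\big[\hat{g}(\theta)\big] = \nabla \ell_\mu(\theta).
```

Read this way, a two-point zeroth-order step is an unbiased stochastic gradient step on the smoothed loss \ell_\mu rather than on \ell itself. A minimizer of \ell_\mu must have low loss on average over a neighborhood of radius about \mu, which is one plausible reading of why such solutions might tolerate small weight changes.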
Where Pith is reading between the lines
- The same optimizer-downgrade approach could be tested in other settings where models must selectively forget information without full retraining.
- Examining the curvature or sharpness of the loss basins reached by different optimizer grades might reveal measurable predictors of robustness.
- Downgrading optimizers may offer a practical lever for improving stability in continual or incremental learning tasks.
Load-bearing premise
That noisier updates from lower-grade optimizers specifically cause convergence to more stable loss basins and that this robustness effect does not depend on the particular unlearning objective chosen.
What would settle it
Apply the same unlearning procedure to an LLM once with a standard first-order optimizer and once with a zeroth-order optimizer, then perform fine-tuning on data related to the forgotten content and measure whether the forgotten knowledge recovers substantially less in the zeroth-order case.
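One way to score that comparison is the fraction of forgotten accuracy that returns after fine-tuning. The metric below is a hypothetical sketch, not a measure defined in the paper, and its name and arguments are our own.

```python
def recovery_ratio(acc_original, acc_unlearned, acc_relearned):
    """Fraction of the forgotten accuracy that returns after fine-tuning.

    0.0 means the forgetting held; 1.0 means fine-tuning restored everything.
    All three arguments are forget-set accuracies of the same model at
    different stages: before unlearning, after unlearning, after fine-tuning.
    """
    forgotten = acc_original - acc_unlearned
    if forgotten <= 0:
        raise ValueError("unlearning did not reduce forget-set accuracy")
    return (acc_relearned - acc_unlearned) / forgotten
```

Under this sketch, the claim would be supported if the zeroth-order run shows a clearly lower ratio than the first-order run at a matched level of initial forgetting.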
Original abstract
Large language model (LLM) unlearning aims to surgically remove the influence of undesired data or knowledge from an existing model while preserving its utility on unrelated tasks. This paradigm has shown promise in addressing privacy and safety concerns. However, recent findings reveal that unlearning effects are often fragile: post-unlearning manipulations such as weight quantization or fine-tuning can quickly neutralize the intended forgetting. Prior efforts to improve robustness primarily reformulate unlearning objectives by explicitly assuming the role of vulnerability sources. In this work, we take a different perspective by investigating the role of the optimizer, independent of unlearning objectives and formulations, in shaping unlearning robustness. We show that the 'grade' of the optimizer, defined by the level of information it exploits, ranging from zeroth-order (gradient-free) to first-order (gradient-based) to second-order (Hessian-based), is tightly linked to the resilience of unlearning. Surprisingly, we find that downgrading the optimizer, such as using zeroth-order methods or compressed-gradient variants (e.g., gradient sign-based optimizers), often leads to stronger robustness. While these optimizers produce noisier and less precise updates, they encourage convergence to harder-to-disturb basins in the loss landscape, thereby resisting post-training perturbations. By connecting zeroth-order methods with randomized smoothing, we further highlight their natural advantage for robust unlearning. Motivated by these insights, we propose a hybrid optimizer that combines first-order and zeroth-order updates, preserving unlearning efficacy while enhancing robustness. Extensive experiments on the MUSE and WMDP benchmarks, across multiple LLM unlearning algorithms, validate that our approach achieves more resilient forgetting without sacrificing unlearning quality.
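The abstract does not spell out how the hybrid optimizer combines its two kinds of update. The sketch below shows one possible scheme, a fixed interleaving of first-order and zeroth-order steps on a toy quadratic. The schedule, the `zo_every` parameter, and every function name are assumptions for illustration, and the paper's actual mixing rule may differ.

```python
import random

def loss(theta):
    # Toy quadratic standing in for an unlearning objective (illustrative assumption).
    return sum((t - 1.0) ** 2 for t in theta)

def fo_grad(theta):
    # Exact gradient of the toy loss.
    return [2.0 * (t - 1.0) for t in theta]

def zo_grad(theta, rng, mu=1e-3):
    # Single-query two-point estimate built from loss values only.
    u = [rng.gauss(0.0, 1.0) for _ in theta]
    plus = loss([t + mu * ui for t, ui in zip(theta, u)])
    minus = loss([t - mu * ui for t, ui in zip(theta, u)])
    scale = (plus - minus) / (2.0 * mu)
    return [scale * ui for ui in u]

def hybrid_descent(theta, steps=200, lr=0.05, zo_every=2, seed=0):
    # Hypothetical schedule: every `zo_every`-th step is zeroth-order,
    # the rest are first-order.
    rng = random.Random(seed)
    for step in range(steps):
        g = zo_grad(theta, rng) if step % zo_every == 0 else fo_grad(theta)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta
```

Calling `hybrid_descent([0.0, 3.0])` drives the toy loss close to zero. The first-order steps supply precise progress while the zeroth-order steps inject the perturbation-averaged signal that the paper credits for robustness.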
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that in LLM unlearning, downgrading the optimizer from first-order gradient methods to zeroth-order or compressed-gradient variants (e.g., sign-based) produces more robust forgetting that resists post-training perturbations such as quantization or fine-tuning. The proposed mechanism is that noisier updates drive convergence to harder-to-disturb basins in the loss landscape, an account the authors connect to randomized smoothing. A hybrid first-order/zeroth-order optimizer is proposed, and the approach is validated through experiments on the MUSE and WMDP benchmarks across multiple unlearning algorithms.
Significance. If the empirical robustness gains hold under tighter controls, the work offers a simple, objective-independent lever for improving unlearning reliability, an important practical advance given the fragility of current unlearning methods. The randomized-smoothing connection supplies a plausible theoretical framing, and the hybrid optimizer is a constructive proposal. However, the absence of direct landscape analysis leaves the causal account conjectural.
major comments (2)
- Abstract: the central claim that 'noisier and less precise updates... encourage convergence to harder-to-disturb basins' is asserted without any supporting analysis (Hessian spectra, sharpness metrics, basin-width measurements, or parameter-space perturbation tests). This makes the proposed mechanism load-bearing yet unverified, even if robustness gains are observed empirically.
- Experiments (as described in the abstract): the reported 'extensive experiments across multiple algorithms and benchmarks' lack explicit controls for optimizer hyperparameters, baseline optimizer choices, and statistical significance testing. Without these, it is difficult to isolate the optimizer-grade effect from confounding factors such as convergence speed or implicit regularization.
minor comments (2)
- Clarify the precise definition and quantification of 'optimizer grade' (zeroth-order vs. first-order vs. second-order) in the methods section, including any implementation details for the sign-based and zeroth-order variants.
- Add a short discussion of alternative explanations for the observed robustness (e.g., slower convergence or reduced sensitivity to the unlearning loss) and how they were ruled out or could be tested.
Simulated Author's Rebuttal
We are grateful to the referee for their detailed review and insightful comments on our paper. Below we provide point-by-point responses to the major comments and indicate the changes we plan to incorporate in the revised manuscript.
- Referee: Abstract: the central claim that 'noisier and less precise updates... encourage convergence to harder-to-disturb basins' is asserted without any supporting analysis (Hessian spectra, sharpness metrics, basin-width measurements, or parameter-space perturbation tests). This makes the proposed mechanism load-bearing yet unverified, even if robustness gains are observed empirically.
Authors: We agree that direct landscape analysis would provide stronger causal support for the basin-convergence hypothesis. The randomized-smoothing connection supplies theoretical motivation, and the observed robustness gains are empirical, but we acknowledge the mechanistic account remains partly conjectural without explicit verification. In revision we will add parameter-space perturbation experiments (injecting controlled noise into converged parameters and measuring retention of unlearning) together with sharpness estimates on a subset of models to directly test basin stability.
Revision: partial
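The perturbation experiment promised in this response could take roughly the following shape. This is a minimal sketch under our own assumptions: `loss_fn` stands in for whatever forget-set metric the authors would use, and the two quadratic "minima" are toy stand-ins rather than models from the paper.

```python
import random

def perturbation_sensitivity(loss_fn, theta, sigma=0.05, trials=100, seed=0):
    # Mean increase in loss after adding Gaussian noise of scale `sigma`
    # to every parameter; a smaller value suggests a harder-to-disturb solution.
    rng = random.Random(seed)
    base = loss_fn(theta)
    total = 0.0
    for _ in range(trials):
        noisy = [t + sigma * rng.gauss(0.0, 1.0) for t in theta]
        total += loss_fn(noisy) - base
    return total / trials

# Two toy minima at the origin with different curvature (illustrative only).
def sharp(theta):
    return 10.0 * sum(t * t for t in theta)

def flat(theta):
    return 1.0 * sum(t * t for t in theta)
```

With the same seed, `perturbation_sensitivity(sharp, [0.0, 0.0])` comes out ten times larger than the value for `flat`, matching the curvature ratio. In the real experiment the comparison would be between checkpoints produced by different optimizer grades, evaluated at several values of `sigma`.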
- Referee: Experiments (as described in the abstract): the reported 'extensive experiments across multiple algorithms and benchmarks' lack explicit controls for optimizer hyperparameters, baseline optimizer choices, and statistical significance testing. Without these, it is difficult to isolate the optimizer-grade effect from confounding factors such as convergence speed or implicit regularization.
Authors: We conducted hyperparameter sweeps for each optimizer variant and chose baselines following prior unlearning literature, yet we accept that fuller documentation is needed to isolate the optimizer-grade effect. In the revised manuscript we will add an appendix detailing the full hyperparameter search ranges, report all results with mean and standard deviation over three random seeds, and include paired statistical significance tests (t-tests with Bonferroni correction) comparing robustness metrics across optimizer grades.
Revision: yes
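For reference, the testing procedure named in this response can be sketched without external libraries. With three seeds a paired t-test has two degrees of freedom, and in that case the Student-t distribution has a closed-form CDF. The function names are ours, and the closed form below is valid only for exactly three paired runs.

```python
import math

def paired_t_test_three_seeds(a, b):
    # Paired t-test for exactly three matched seeds (2 degrees of freedom).
    # For 2 d.o.f. the Student-t CDF is 0.5 + t / (2 * sqrt(t^2 + 2)),
    # so the two-sided p-value is 1 - |t| / sqrt(t^2 + 2).
    if len(a) != 3 or len(b) != 3:
        raise ValueError("closed form is only valid for three paired runs")
    diffs = [x - y for x, y in zip(a, b)]
    mean = sum(diffs) / 3
    var = sum((d - mean) ** 2 for d in diffs) / 2  # sample variance, n - 1 = 2
    if var == 0:
        raise ValueError("differences are constant; t statistic is undefined")
    t = mean / math.sqrt(var / 3)
    p = 1.0 - abs(t) / math.sqrt(t * t + 2.0)
    return t, p

def bonferroni(p_values, alpha=0.05):
    # Reject a hypothesis only if its p-value is below alpha / (number of tests).
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]
```

Note that three seeds leave the test with little power: the two-sided 5% critical value at two degrees of freedom is about 4.30, and a Bonferroni correction over several comparisons raises the bar further.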
Circularity Check
No significant circularity: empirical observations on optimizer downgrading for LLM unlearning robustness
The paper advances its claims through experimental results on the MUSE and WMDP benchmarks across multiple unlearning algorithms, rather than any mathematical derivation chain. The observation that downgrading optimizers (zeroth-order or sign-based) yields stronger robustness is presented as a finding from direct comparisons of unlearning efficacy and post-perturbation resilience, not as a quantity predicted by or defined in terms of fitted parameters within the paper's own equations. The interpretive link to 'harder-to-disturb basins' and randomized smoothing is offered as a post-hoc explanation of the empirical pattern, without reducing to self-definition, self-citation load-bearing, or renaming of known results. No load-bearing step collapses to its inputs by construction; the work remains self-contained as an empirical investigation.
Appendix A: Limitations
While we conduct comprehensive experiments and in-depth analysis to show the role of optimizers in robust LLM unlearning, certain limitations persist in our study. There are other optimizers we did not include in our study, e.g., the Muo...