arXiv preprint arXiv:2502.03304

Harmony in divergence: Towards fast, accurate, memory-efficient zeroth-order LLM fine-tuning · 2025 · arXiv 2502.03304

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Downgrade to Upgrade: Optimizer Simplification Enhances Robustness in LLM Unlearning

cs.LG · 2025-10-01 · conditional · novelty 6.0

Downgrading optimizers to lower-information variants during LLM unlearning yields more robust forgetting on MUSE and WMDP benchmarks by converging to harder-to-perturb loss basins.

Position: Zeroth-Order Optimization in Deep Learning Is Underexplored, Not Underpowered

cs.LG · 2026-05-15 · unverdicted · novelty 5.0

Zeroth-order optimization is underexplored rather than underpowered in deep learning, with limitations stemming from full-space designs that can be addressed via subspace, spectral, and systems-aware approaches.

Why Zeroth-Order Adaptation May Forget Less: A Randomized Shaping Theory

cs.LG · 2026-05-11 · unverdicted · novelty 5.0

Norm-matched zeroth-order adaptation preserves the isotropic retention floor while contracting only the anisotropic component, producing a quadratic forgetting gap that favors ZO precisely when the first-order direction has above-average retention curvature.

AdaMeZO: Adam-style Zeroth-Order Optimizer for LLM Fine-tuning Without Maintaining the Moments

cs.LG · 2026-05-01 · unverdicted · novelty 5.0

AdaMeZO adapts Adam moment estimates to zeroth-order LLM fine-tuning without extra memory storage, outperforming MeZO with up to 70% fewer forward passes.

citing papers explorer

Downgrade to Upgrade: Optimizer Simplification Enhances Robustness in LLM Unlearning · cs.LG · 2025-10-01 · conditional · none · ref 40
Position: Zeroth-Order Optimization in Deep Learning Is Underexplored, Not Underpowered · cs.LG · 2026-05-15 · unverdicted · none · ref 10
Why Zeroth-Order Adaptation May Forget Less: A Randomized Shaping Theory · cs.LG · 2026-05-11 · unverdicted · none · ref 26
AdaMeZO: Adam-style Zeroth-Order Optimizer for LLM Fine-tuning Without Maintaining the Moments · cs.LG · 2026-05-01 · unverdicted · none · ref 57
