GRZO: Group-Relative Zeroth-Order Optimization for Large Language Model Fine-Tuning

Liyan Tan; Ruijie Zhang; Xinling Yu; Yequan Zhao; Yifan Yang; Zheng Zhang

arXiv: 2606.02857 · v1 · pith:FANGPAH4new · submitted 2026-06-01 · cs.LG · cs.AI



keywords: grzo, group-relative, mezo, zeroth-order, average, batch, fine-tuning, language


Zeroth-order (ZO) optimization is a memory-efficient alternative to backpropagation for fine-tuning large language models, but its deployment is limited by the high variance of gradient estimation. We propose GRZO, a Group-Relative Zeroth-Order optimizer that draws one pseudo-independent perturbation per mini-batch example and aggregates the per-example losses through group-relative normalization, raising the effective gradient-direction count from one to the batch size at no additional forward cost while preserving inference-level memory. We prove that GRZO is directionally unbiased with variance shrinking proportionally to the batch size, yielding a tighter nonconvex convergence bound than MeZO. Across RoBERTa-large, Llama3-8B, and OPT-13B over multiple tasks, GRZO improves average accuracy on Llama3-8B by $+3.0$ over MeZO at $23\%$ lower peak GPU memory; as a drop-in replacement for the MeZO core, it lifts sparse, low-rank, and quantized ZO variants by $+6.0$ on average.
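The mechanism described in the abstract (one perturbation per mini-batch example, with per-example losses combined through group-relative normalization) can be illustrated with a small NumPy sketch. This is not the authors' implementation. The toy linear model, the function names `per_example_loss` and `grzo_like_step`, the Gaussian perturbations, the mean/std form of the normalization, and the values of `eps` and `lr` are all assumptions made for illustration; the abstract does not specify the estimator, and no convergence behaviour is claimed for this sketch.

```python
import numpy as np


def per_example_loss(theta, x, y):
    """Squared error of a toy linear model on one example (hypothetical stand-in for an LLM loss)."""
    return 0.5 * (x @ theta - y) ** 2


def grzo_like_step(theta, X, Y, rng, eps=1e-3, lr=1e-2):
    """One illustrative update: per-example perturbations plus group-normalized losses.

    Assumption: 'group-relative normalization' is modeled here as subtracting the
    batch mean and dividing by the batch standard deviation of the perturbed losses.
    """
    batch_size, dim = X.shape

    # One perturbation direction per example (Gaussian is an assumption).
    Z = rng.standard_normal((batch_size, dim))

    # One forward evaluation per example, each at its own perturbed parameters.
    losses = np.array(
        [per_example_loss(theta + eps * Z[i], X[i], Y[i]) for i in range(batch_size)]
    )

    # Group-relative weights: zero mean and (approximately) unit spread over the batch.
    adv = (losses - losses.mean()) / (losses.std() + 1e-8)

    # Combine the batch_size directions into a single update direction.
    direction = (adv[:, None] * Z).mean(axis=0)

    # Step against the weighted direction; no gradients or activations are stored.
    new_theta = theta - lr * direction
    return new_theta, adv, direction


rng = np.random.default_rng(0)
X = rng.standard_normal((8, 4))
Y = rng.standard_normal(8)
theta = np.zeros(4)
new_theta, adv, direction = grzo_like_step(theta, X, Y, rng)
```

In this sketch the batch contributes eight distinct perturbation directions from eight forward evaluations, which mirrors the abstract's point that the direction count grows with the batch size without backpropagation.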
