Three Birds, One Stone: Solving the Communication-Memory-Privacy Trilemma in LLM Fine-tuning Over Wireless Networks with Zeroth-Order Optimization
Pith reviewed 2026-05-10 14:41 UTC · model grok-4.3
The pith
pAirZero uses zeroth-order optimization and over-the-air computation to solve the communication-memory-privacy trilemma in wireless federated LLM fine-tuning with low overhead and consistent privacy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
pAirZero enables resource-constrained devices to submit their local gradient with only bit-level communication loads while participating in federated fine-tuning of LLMs with inference-level memory costs. This approach not only eliminates the high memory requirements needed for LLM fine-tuning but also alleviates the strict synchronization requirements that plague conventional OTA methods.
Load-bearing premise
That zeroth-order optimization can achieve acceptable fine-tuning performance for LLMs without access to first-order gradients, and that the formulated optimization model for transmit power and noise can guarantee consistent privacy protection across varying channel conditions.
Original abstract
Federated Learning (FL) offers a promising pathway for collaboratively fine-tuning Large Language Models (LLMs) at the edge; however, this paradigm faces a critical bottleneck: the prohibitive communication and memory overheads incurred by exchanging high-dimensional gradients. Furthermore, recent studies reveal that user training data can still be recovered from these local gradients, undermining the core privacy promise of FL. In this paper, we address this trilemma of communication, memory, and privacy by proposing pAirZero, a novel framework that synergizes Zeroth-Order (ZO) optimization with Over-the-Air (OTA) computation. Uniquely, pAirZero enables resource-constrained devices to submit their local gradient with only bit-level communication loads while participating in federated fine-tuning of LLMs with inference-level memory costs. This approach not only eliminates the high memory requirements needed for LLM fine-tuning but also alleviates the strict synchronization requirements that plague conventional OTA methods. We further formulate a rigorous optimization model to adaptively determine the optimal transmit power and noise levels, ensuring consistent privacy protection regardless of channel conditions. Numerical experiments demonstrate the superiority of pAirZero in enabling secure, efficient LLM fine-tuning over wireless networks, with only 25% peak memory cost on OPT-125M and communication load orders of magnitude lower than conventional methods.
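The link between zeroth-order optimization, small uploads, and inference-level memory can be illustrated with a minimal sketch. It assumes a generic two-point ZO estimator with a shared random seed; the abstract does not spell out pAirZero's actual estimator or its bit-level encoding, so the function names, the toy quadratic loss, and the hyperparameters below are all hypothetical. The point is only that a device needs two forward passes and uploads one scalar, and that any node holding the seed can rebuild the perturbation.

```python
import random


def perturb(params, seed, scale):
    """Return params + scale * z, where z ~ N(0, I) is regenerated from seed."""
    rng = random.Random(seed)
    return [p + scale * rng.gauss(0.0, 1.0) for p in params]


def projected_gradient(loss_fn, params, seed, mu=1e-3):
    """Device side: two forward passes, one scalar out (no backpropagation)."""
    loss_plus = loss_fn(perturb(params, seed, +mu))
    loss_minus = loss_fn(perturb(params, seed, -mu))
    return (loss_plus - loss_minus) / (2.0 * mu)


def apply_update(params, seed, scalar, lr):
    """Rebuild z from the seed and apply params -= lr * scalar * z."""
    return perturb(params, seed, -lr * scalar)


def toy_loss(params):
    # Hypothetical stand-in for an LLM loss: a simple quadratic bowl.
    return sum((p - 1.0) ** 2 for p in params)


params = [0.0] * 8
for step in range(2000):
    # The scalar g is the only value a device would need to upload.
    g = projected_gradient(toy_loss, params, seed=step)
    params = apply_update(params, seed=step, scalar=g, lr=0.01)
```

In this sketch no gradient vector is ever stored or sent, which is why memory stays close to that of inference. How the scalar is quantized and carried over the air is left open here.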
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes pAirZero, a framework that combines zeroth-order (ZO) optimization with over-the-air (OTA) computation for federated fine-tuning of LLMs over wireless networks. It claims to resolve the communication-memory-privacy trilemma by enabling bit-level communication loads for gradient submission, inference-level memory costs on devices, and consistent privacy via an optimization model that adaptively sets transmit power and noise levels independent of channel conditions. Numerical experiments on OPT-125M reportedly achieve 25% peak memory usage and orders-of-magnitude lower communication than conventional methods.
Significance. If the ZO-based updates deliver competitive fine-tuning performance and the privacy optimization holds under realistic conditions, this would enable practical federated LLM adaptation on resource-constrained edge devices, reducing both the memory barrier of backpropagation and the synchronization/privacy vulnerabilities of standard OTA FL. The explicit power/noise scheduler for channel-independent privacy is a potentially valuable technical contribution if the derivation is complete.
major comments (2)
- [Numerical Experiments] Numerical Experiments section: the reported results emphasize memory (25% peak) and communication reductions but provide no perplexity, accuracy, or iteration-count comparisons against first-order baselines; without these, it is impossible to determine whether the linear growth in ZO gradient variance with model dimension negates the claimed resource gains for LLM fine-tuning.
- [Optimization model] Optimization model (presumably §4 or equivalent): the claim that the formulated transmit-power and noise scheduler guarantees consistent privacy 'regardless of channel conditions' requires explicit statement of the threat model, channel statistics, and any assumptions on adversarial knowledge; without these, the privacy guarantee cannot be verified as load-bearing for the trilemma solution.
minor comments (1)
- [Abstract] The abstract and introduction should explicitly state the largest model scale tested and the number of local ZO queries per update, as these directly affect the practicality claims.
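The variance concern in the first major comment and the query-count question in the minor comment can be made concrete with a standard calculation. As an assumption for illustration (the source does not specify pAirZero's estimator), take the small-smoothing limit of a Gaussian two-point estimator averaged over q independent queries, where d is the number of trainable parameters; the symbols q, d, z_i, and \hat g are introduced here and do not come from the paper.

```latex
\begin{align}
\hat g &= \frac{1}{q}\sum_{i=1}^{q} \bigl(z_i^\top \nabla f\bigr)\, z_i,
  \qquad z_i \sim \mathcal{N}(0, I_d)\ \text{i.i.d.}, \\
\mathbb{E}[\hat g] &= \nabla f, \\
\mathbb{E}\,\bigl\|\hat g - \nabla f\bigr\|^2 &= \frac{d+1}{q}\,\|\nabla f\|^2 .
\end{align}
```

Under this idealized model the estimator is unbiased, its variance grows linearly in d, and it shrinks as 1/q. That is why the referee asks for accuracy and iteration-count comparisons alongside the memory and communication figures.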
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify the contributions and limitations of our work. We respond to each major comment below and indicate the revisions we will make to strengthen the manuscript.
- Referee: Numerical Experiments section: the reported results emphasize memory (25% peak) and communication reductions but provide no perplexity, accuracy, or iteration-count comparisons against first-order baselines; without these, it is impossible to determine whether the linear growth in ZO gradient variance with model dimension negates the claimed resource gains for LLM fine-tuning.
  Authors: We agree that direct comparisons of fine-tuning performance are necessary to evaluate whether ZO variance growth undermines the resource advantages. Our experiments prioritize demonstrating the memory and communication reductions achievable with inference-level costs and bit-level uploads. In the revised manuscript, we will add perplexity, accuracy, and iteration-count results for OPT-125M against first-order baselines under identical tasks and wireless settings. This will allow readers to assess whether the trilemma solution preserves competitive convergence despite the known dimension-dependent variance of ZO estimators.
  Revision: yes
- Referee: Optimization model (presumably §4 or equivalent): the claim that the formulated transmit-power and noise scheduler guarantees consistent privacy 'regardless of channel conditions' requires explicit statement of the threat model, channel statistics, and any assumptions on adversarial knowledge; without these, the privacy guarantee cannot be verified as load-bearing for the trilemma solution.
  Authors: The optimization in Section 4 is formulated to achieve channel-independent privacy by solving for transmit power and artificial noise under a worst-case channel realization drawn from a known distribution. We will revise the manuscript to explicitly state the threat model (passive eavesdropper observing only the aggregated OTA signal), the channel statistics (i.i.d. Rayleigh fading with known distribution but unknown instantaneous realizations at the scheduler), and the assumption that the adversary knows the optimization parameters but not per-device channels. These clarifications will make the privacy guarantee verifiable while preserving the claim that privacy holds independently of instantaneous channel conditions.
  Revision: yes
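To see why a power and noise scheduler is needed for channel-independent noise levels, consider a rough sketch of one over-the-air aggregation round. This is not the paper's optimization model: it assumes a real-valued channel, Rayleigh gains known at the devices, and simple channel-inversion power control, and every name in it (`ota_aggregate`, `p_max`, `rx_noise_std`, `target_noise_std`) is hypothetical. It shows that the noise seen after aggregation depends on the weakest channel, and computes how much extra noise variance would be required to reach a fixed target level.

```python
import math
import random


def ota_aggregate(scalars, p_max, rx_noise_std, target_noise_std, rng):
    """One over-the-air sum of per-device scalars (illustrative model only).

    Assumed real-valued channel: y = sum_k h_k * b_k * s_k + n, with Rayleigh
    gains h_k known at the devices and n ~ N(0, rx_noise_std^2).
    """
    # Rayleigh gains: magnitude of a unit-variance complex Gaussian.
    gains = [math.hypot(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2)
             for _ in scalars]
    s_max = max(abs(s) for s in scalars) or 1.0
    # Device k pre-scales by b_k = c / h_k. The weakest channel bounds c,
    # because (c / h_k)^2 * s_max^2 must not exceed p_max for every k.
    c = min(gains) * math.sqrt(p_max) / s_max
    # Superposition: channel gains cancel the pre-scaling, leaving c * sum(s).
    received = sum(h * (c / h) * s for h, s in zip(gains, scalars))
    received += rng.gauss(0, rx_noise_std)
    estimate = received / c
    # Noise seen after rescaling depends on the channel through c ...
    natural_std = rx_noise_std / c
    # ... so a fixed target level needs this much extra noise variance.
    extra_var = max(target_noise_std ** 2 - natural_std ** 2, 0.0)
    return estimate, natural_std, extra_var


rng = random.Random(0)
est, natural_std, extra_var = ota_aggregate(
    scalars=[0.5, -1.0, 2.0], p_max=1.0,
    rx_noise_std=0.1, target_noise_std=1.0, rng=rng)
```

The sketch leaves out who injects the extra noise and how that noise counts against each device's power budget. Those are the choices the threat model and channel assumptions requested by the referee would have to pin down.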
Circularity Check
No circularity detected; claims rest on proposed ZO+OTA framework without self-referential reductions
The abstract and available claims introduce pAirZero as a novel combination of zeroth-order optimization and over-the-air computation to address the trilemma, with a formulated optimization model for transmit power and noise. No equations, derivations, or self-citations are exhibited that reduce any prediction or result to its own inputs by construction, such as fitting a parameter and renaming it a prediction or smuggling in an ansatz via prior self-work. The memory/communication reductions and privacy guarantees are asserted as outcomes of the framework rather than as tautologies. Absent specific quoted reductions in the provided text, the derivation chain is treated as self-contained.