pith. machine review for the scientific record.

arxiv: 2605.09813 · v1 · submitted 2026-05-10 · 💻 cs.NI · cs.DC · cs.LG · cs.SY · eess.SY

Recognition: 2 theorem links


Optimizing Server Placement for Vertical Federated Learning in Dynamic Edge/Fog Networks

H. Vincent Poor, Mung Chiang, Su Wang


Pith reviewed 2026-05-12 02:38 UTC · model grok-4.3

classification 💻 cs.NI · cs.DC · cs.LG · cs.SY · eess.SY
keywords vertical federated learning · dynamic edge networks · server placement · resource optimization · mixed-integer signomial program · edge computing · federated learning · machine learning

The pith

Server-controlled vertical federated learning establishes stationary points each round to jointly optimize placement, power, frequency, and iterations in dynamic edge networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops SC-DN for vertical federated learning in networks where devices hold separate data features that can permanently enter or exit. It proves a global first-order stationary point exists for every training round and uses the result to optimize server placement, device transmit power, processor frequency, and local iterations together. This joint control reduces resource use while preserving model quality. A reader would care because heterogeneous and changing edge devices make standard vertical federated learning costly, and the method claims measurable gains in performance and efficiency on image and multi-modal data over greedy baselines.

Core claim

In dynamic edge/fog networks with heterogeneous data features, the SC-DN methodology establishes the existence of a global first-order stationary point for every global round of vertical federated learning, then formulates a joint optimization over server placement, device-to-server transmit power, local device processor frequency, and local training iterations per round as a mixed-integer signomial program, for which a general solver is developed, yielding superior classification and regression performance with lower resource consumption than greedy approaches.
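The full program is not reproduced on this page, but the machinery it leans on (geometric/signomial programming; the paper cites CVXPY and disciplined geometric programming) is easy to sketch. Below is a minimal, hypothetical CVXPY example of the continuous flavor of such a subproblem, with placement and iteration counts held fixed. Every constant and the monomial rate stand-in are invented here; this is not the paper's solver.

```python
# A minimal sketch (not SC-DN's solver): with server placement fixed, a
# power/frequency subproblem with monomial/posynomial energy and deadline
# terms can be posed as a disciplined geometric program in CVXPY.
import cvxpy as cp

N = 3                                    # toy number of devices (assumed)
p = cp.Variable(N, pos=True)             # device transmit power (W)
f = cp.Variable(N, pos=True)             # processor frequency (GHz)

kappa = 0.1                              # effective switched capacitance (assumed)
c = [2.0, 3.0, 1.5]                      # compute load per round, per device (assumed)
b = [1.0, 0.8, 1.2]                      # upload load per round, per device (assumed)

# Computation energy ~ kappa * c_n * f_n^2; communication energy uses a
# monomial stand-in b_n * p_n^0.5 for the true log-based rate terms, which
# the paper instead bounds to stay within signomial form.
energy = sum(kappa * c[n] * f[n] ** 2 + b[n] * p[n] ** 0.5 for n in range(N))

constraints = [c[n] / f[n] + b[n] / p[n] <= 5.0 for n in range(N)]  # deadline
constraints += [p <= 2.0, f <= 3.0]      # hardware caps (assumed)

prob = cp.Problem(cp.Minimize(energy), constraints)
prob.solve(gp=True)                      # log-log convex (DGP) mode
print("power:", p.value, "frequency:", f.value)
```

The integer placement variables are what push the full problem out of this tractable class and into the NP-hard mixed-integer signomial territory the paper addresses.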

What carries the argument

The existence proof for a global first-order stationary point per round, which enables the mixed-integer signomial program that couples server placement with transmit power, processor frequency, and local iterations for joint optimization.

Load-bearing premise

The dynamic network model of permanent feature entry and exit, together with the guaranteed global first-order stationary point each round, continues to hold under realistic hardware heterogeneity and channel variations.

What would settle it

A controlled experiment in which devices change features unpredictably while the SC-DN solver runs, followed by a check of whether the stationary-point condition fails or the reported performance and resource gains disappear relative to baselines.
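A minimal simulation of that protocol, assuming a stand-in quadratic loss and an invented churn model (none of this is the paper's setup), might look like:

```python
# Hedged sketch of the falsification experiment: features permanently enter
# or exit between rounds, and each round we test a first-order stationarity
# proxy (per-round gradient norm below tolerance) on a toy quadratic loss.
import numpy as np

rng = np.random.default_rng(0)
features = set(range(10))                # active feature indices (assumed)
tol = 1e-6

for rnd in range(20):
    # Permanent entry/exit between rounds (invented churn probabilities).
    if rng.random() < 0.3 and len(features) > 2:
        features.discard(int(rng.choice(sorted(features))))
    if rng.random() < 0.2:
        features.add(max(features) + 1)

    d = len(features)
    A = rng.normal(size=(50, d))         # this round's feature matrix
    y = rng.normal(size=50)

    # Per-round model: dimension is rebuilt for the current feature set.
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)

    grad = A.T @ (A @ theta - y) / 50    # gradient of 0.5*||A@theta - y||^2 / n
    print(f"round {rnd:2d}: dim={d:2d}, "
          f"||grad||={np.linalg.norm(grad):.2e}, "
          f"stationary={np.linalg.norm(grad) < tol}")
```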

Figures

Figures reproduced from arXiv: 2605.09813 by H. Vincent Poor, Mung Chiang, Su Wang.

Figure 8
Figure 8: Figs. 8(a) and 8(b) primarily focus on how the scaling of ψ_G and ψ_P influences the ML convergence aspects of (P_r), while Figs. 8(c) and 8(d) investigate the sensitivity of device-to-server transmission and server movement energies to changes …
Figure 31
Figure 31: Wall-clock runtime of the per-round optimization solver (i.e., Algorithm 3) as a function of the number of network devices, N. Points denote the mean ± one standard deviation over 10 independent runs; the dashed line shows a linear fit of 2.47N − 1.80. (Axes: Network Devices (N) vs. Average Total Runtime (s).)
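The dashed line in that figure is an ordinary linear fit of runtime against N; as a sketch, numpy.polyfit recovers such a slope and intercept from measured means (the runtime values below are invented placeholders, not the paper's data):

```python
# Reproducing the shape of the figure's linear runtime fit with numpy.
# The measured means below are illustrative placeholders only.
import numpy as np

N = np.array([10, 20, 30, 40, 50])                      # device counts
runtime_s = np.array([23.1, 47.9, 72.4, 97.0, 121.8])   # assumed mean runtimes

slope, intercept = np.polyfit(N, runtime_s, deg=1)
print(f"linear fit: {slope:.2f}*N {intercept:+.2f}")    # cf. 2.47N - 1.80
```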
Original abstract

We investigate the control and optimization of vertical federated learning (VFL), a class of distributed machine learning (ML) methods in which edge/fog devices contain separate data features, in dynamic edge/fog networks. Owing to heterogeneous data features and hardware across edge/fog networks, devices' contributions to VFL vary substantially, and, moreover, dynamic edge/fog networks can lead to the permanent exit or entry of select data features. In this setting, our proposed methodology, server controlled VFL in dynamic networks (SC-DN), first establishes the existence of a global first-order stationary point for every global round, and then leverages this result to jointly optimize ML model training and resource consumption based on four key control variables: (i) server placement, (ii) device-to-server transmit power, (iii) local device processor frequency, and (iv) local training iterations per global round. The resulting optimization formulation contains coupled variables as well as numerous forms of logarithmic constraints which we show is a mixed-integer signomial program, an NP-hard problem, and for which we develop a general solver. Finally, via experiments on both image and multi-modal datasets, we show that our methodology demonstrates superior classification/regression performance and resource consumption savings than even greedy methodologies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance; this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper proposes SC-DN for vertical federated learning in dynamic edge/fog networks with heterogeneous devices and permanent feature entry/exit. It claims to establish existence of a global first-order stationary point for every global round, then jointly optimizes server placement, device-to-server transmit power, local processor frequency, and local training iterations per round. The resulting problem is cast as a mixed-integer signomial program (NP-hard) for which a general solver is developed. Experiments on image and multi-modal datasets report superior classification/regression accuracy and resource savings versus greedy baselines.

Significance. If the stationary-point result is valid under feature dynamics and the solver yields reliable solutions, the work could enable more practical and efficient VFL deployments in resource-constrained, time-varying edge environments by explicitly trading off model performance against communication and computation costs.

major comments (1)
  1. [Theoretical Analysis section] The existence proof for a global first-order stationary point per round must explicitly accommodate permanent feature entry/exit, which alters the loss landscape, gradient structure, input dimension, and any Lipschitz/smoothness constants. If the argument treats the feature set or model dimension as fixed across rounds, the lemma does not carry over to the claimed dynamic setting and the subsequent optimization rests on an invalid premise.
minor comments (2)
  1. [Abstract and Experiments] Dataset sizes, number of runs, and error bars (or statistical significance) for the reported performance and resource gains are not provided, preventing verification of the claimed superiority.
  2. [Experiments section] Comparisons are restricted to greedy methodologies; additional baselines (static placement, other heuristics, or non-optimized VFL) would better substantiate the gains from the joint optimization.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their thorough review and constructive comments on our manuscript. We address the major comment regarding the theoretical analysis below, and we believe the concerns can be resolved with clarifications and minor revisions.

Point-by-point responses
  1. Referee: [Theoretical Analysis section] The existence proof for a global first-order stationary point per round must explicitly accommodate permanent feature entry/exit, which alters the loss landscape, gradient structure, input dimension, and any Lipschitz/smoothness constants. If the argument treats the feature set or model dimension as fixed across rounds, the lemma does not carry over to the claimed dynamic setting and the subsequent optimization rests on an invalid premise.

    Authors: The proof establishes the existence of a global first-order stationary point for each global round separately, with the feature set, model dimension, and associated constants (such as Lipschitz and smoothness) fixed within that round. Permanent feature entry/exit occurs between rounds, changing the landscape for the next round, but the per-round analysis holds for the current configuration. The subsequent optimization is performed per round using the current variables. To make this explicit, we will revise the Theoretical Analysis section to include a statement clarifying the per-round independence and the handling of dynamic feature sets. This addresses the concern without invalidating the premise. (revision: partial)
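A sketch of the rebuttal's per-round reading, on an assumed toy least-squares loss (not the paper's model): constants such as the smoothness bound L^(r) are recomputed for whatever feature set is active in round r, so one round's analysis never presumes the next round's dimension.

```python
# Per-round constants under feature churn: the smoothness constant L^(r) of
# a toy least-squares loss is recomputed for the round's active dimension.
import numpy as np

rng = np.random.default_rng(1)

def smoothness_constant(dim: int, samples: int = 100) -> float:
    """L^(r) for 0.5*||A@theta - y||^2 / n is lambda_max(A.T@A)/n."""
    A = rng.normal(size=(samples, dim))
    return float(np.linalg.eigvalsh(A.T @ A / samples)[-1])

for r, dim in enumerate([8, 8, 6, 9, 9]):   # feature count changes between rounds
    print(f"round {r}: dim={dim}, L^(r)={smoothness_constant(dim):.3f}")
```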

Circularity Check

0 steps flagged

No significant circularity detected

Full rationale

The paper claims to first establish existence of a global first-order stationary point per round as an independent lemma, then leverage it for joint optimization of server placement, power, frequency, and iterations via the mixed-integer signomial program. No quoted equations or self-citations reduce the stationary-point result to a definition in terms of the optimized variables, fitted inputs renamed as predictions, or a closed self-citation loop. The dynamic feature entry/exit is modeled explicitly in the problem statement without the proof assuming fixed dimensions that would make the result tautological. The derivation chain is therefore self-contained and does not reduce to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review prevents enumeration of specific free parameters or axioms; no invented entities are mentioned.

pith-pipeline@v0.9.0 · 5537 in / 1211 out tokens · 57099 ms · 2026-05-12T02:38:00.684298+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · tag: unclear

    Relation between the paper passage and the cited Recognition theorem.

    first establishes the existence of a global first-order stationary point for every global round, and then leverages this result to jointly optimize ML model training and resource consumption based on four key control variables: (i) server placement, (ii) device-to-server transmit power, (iii) local device processor frequency, and (iv) local training iterations per global round

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear

    Relation between the paper passage and the cited Recognition theorem.

Assumption 1 (Smoothness). The gradients of the loss functions ℓ(·) are Lipschitz continuous … $\|\nabla \ell(\Theta^{(r)}_1) - \nabla \ell(\Theta^{(r)}_2)\| \le L^{(r)} \|\Theta^{(r)}_1 - \Theta^{(r)}_2\|$
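For readers who want to probe an assumption of this shape numerically, here is a hedged check on a stand-in logistic loss (illustrative only; the paper's ℓ(·) and constants are its own): sampled gradient-difference ratios should stay below the spectral smoothness bound.

```python
# Numerical probe of a Lipschitz-gradient assumption on a stand-in logistic
# loss: sample parameter pairs, compare the ratio to lambda_max(X.T@X)/(4n).
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
y = rng.integers(0, 2, size=200)

def grad(theta):
    p = 1.0 / (1.0 + np.exp(-X @ theta))   # sigmoid predictions
    return X.T @ (p - y) / len(y)          # gradient of mean logistic loss

ratios = []
for _ in range(1000):
    t1, t2 = rng.normal(size=5), rng.normal(size=5)
    ratios.append(np.linalg.norm(grad(t1) - grad(t2)) / np.linalg.norm(t1 - t2))

bound = np.linalg.eigvalsh(X.T @ X / (4 * len(y)))[-1]
print(f"max sampled ratio: {max(ratios):.4f}  <=  bound: {bound:.4f}")
```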

What do these tags mean?
matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

83 extracted references · 83 canonical work pages

  1. [1]

    Communication-efficient learning of deep networks from decentralized data,

    B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Proceedings of the 20th International Conference on Artificial Intelligence and Statistics. PMLR, 2017, pp. 1273–1282

  2. [2]

    Federated machine learning: Concept and applications,

    Q. Yang, Y. Liu, T. Chen, and Y. Tong, “Federated machine learning: Concept and applications,” ACM Transactions on Intelligent Systems and Technology (TIST), vol. 10, no. 2, pp. 1–19, 2019

  3. [3]

    Adaptive federated learning in resource constrained edge computing systems,

    S. Wang, T. Tuor, T. Salonidis, K. K. Leung, C. Makaya, T. He, and K. Chan, “Adaptive federated learning in resource constrained edge computing systems,” IEEE Journal on Selected Areas in Communications, vol. 37, no. 6, pp. 1205–1221, 2019

  4. [4]

    Vertical federated learning: Concepts, advances, and challenges,

    Y. Liu, Y. Kang, T. Zou, Y. Pu, Y. He, X. Ye, Y. Ouyang, Y.-Q. Zhang, and Q. Yang, “Vertical federated learning: Concepts, advances, and challenges,” IEEE Transactions on Knowledge and Data Engineering, vol. 36, no. 7, pp. 3615–3634, 2024

  5. [5]

    Fault-tolerant vertical federated learning on dynamic networks,

    S. Ganguli, Z. Zhou, C. G. Brinton, and D. I. Inouye, “Fault-tolerant vertical federated learning on dynamic networks,” arXiv:2312.16638, 2023

  6. [6]

    Flexible vertical federated learning with heterogeneous parties,

    T. Castiglia, S. Wang, and S. Patterson, “Flexible vertical federated learning with heterogeneous parties,” IEEE Transactions on Neural Networks and Learning Systems, 2023, to appear

  7. [7]

    Attribute-distributed learning: Models, limits, and algorithms,

    H. Zheng, S. R. Kulkarni, and H. V. Poor, “Attribute-distributed learning: Models, limits, and algorithms,” IEEE Transactions on Signal Processing, vol. 59, no. 1, pp. 386–398, 2010

  8. [8]

    Toward cooperative federated learning over heterogeneous edge/fog networks,

    S. Wang, S. Hosseinalipour, V. Aggarwal, C. G. Brinton, D. J. Love, W. Su, and M. Chiang, “Toward cooperative federated learning over heterogeneous edge/fog networks,” IEEE Communications Magazine, vol. 61, no. 12, pp. 54–60, 2023

  9. [9]

    Asynchronous multi-model dynamic federated learning over wireless networks: Theory, modeling, and optimization,

    Z.-L. Chang, S. Hosseinalipour, M. Chiang, and C. G. Brinton, “Asynchronous multi-model dynamic federated learning over wireless networks: Theory, modeling, and optimization,” IEEE Transactions on Cognitive Communications and Networking, 2024, to appear

  10. [10]

    Communication-efficient multimodal federated learning: Joint modality and client selection,

    L. Yuan, D.-J. Han, S. Wang, D. Upadhyay, and C. G. Brinton, “Communication-efficient multimodal federated learning: Joint modality and client selection,” arXiv:2401.16685, 2024

  11. [11]

    Adaptive vertical federated learning on unbalanced features,

    J. Zhang, S. Guo, Z. Qu, D. Zeng, H. Wang, Q. Liu, and A. Y. Zomaya, “Adaptive vertical federated learning on unbalanced features,” IEEE Transactions on Parallel and Distributed Systems, vol. 33, no. 12, pp. 4006–4018, 2022

  12. [12]

    A unified solution for privacy and communication efficiency in vertical federated learning,

    G. Wang, B. Gu, Q. Zhang, X. Li, B. Wang, and C. X. Ling, “A unified solution for privacy and communication efficiency in vertical federated learning,” Advances in Neural Information Processing Systems, vol. 36, 2024

  13. [13]

    Compressed-VFL: Communication-efficient learning with vertically partitioned data,

    T. J. Castiglia, A. Das, S. Wang, and S. Patterson, “Compressed-VFL: Communication-efficient learning with vertically partitioned data,” in Proceedings of the 39th International Conference on Machine Learning. PMLR, 2022, pp. 2738–2766

  14. [14]

    6G internet of things: A comprehensive survey,

    D. C. Nguyen, M. Ding, P. N. Pathirana, A. Seneviratne, J. Li, D. Niyato, O. Dobre, and H. V. Poor, “6G internet of things: A comprehensive survey,” IEEE Internet of Things Journal, vol. 9, no. 1, pp. 359–383, 2021

  15. [15]

    UAV-assisted online machine learning over multi-tiered networks: A hierarchical nested personalized federated learning approach,

    S. Wang, S. Hosseinalipour, M. Gorlatova, C. G. Brinton, and M. Chiang, “UAV-assisted online machine learning over multi-tiered networks: A hierarchical nested personalized federated learning approach,” IEEE Transactions on Network and Service Management, vol. 20, no. 2, pp. 1847–1865, 2022

  16. [16]

    Towards flexible device participation in federated learning,

    Y. Ruan, X. Zhang, S.-C. Liang, and C. Joe-Wong, “Towards flexible device participation in federated learning,” in Proceedings of The 24th International Conference on Artificial Intelligence and Statistics. PMLR, 2021, pp. 3403–3411

  17. [17]

    Forest fire detection system using wireless sensor networks and machine learning,

    U. Dampage, L. Bandaranayake, R. Wanasinghe, K. Kottahachchi, and B. Jayasanka, “Forest fire detection system using wireless sensor networks and machine learning,” Scientific Reports, vol. 12, no. 1, p. 46, 2022

  18. [18]

    Wireless sensing networks for environmental monitoring: Two case studies from tropical forests,

    C. Rankine, M. M. do Espirito Santo, R. Fatland, M. Garcia et al., “Wireless sensing networks for environmental monitoring: Two case studies from tropical forests,” in Proceedings of the Seventh IEEE International Conference on eScience. IEEE, 2011, pp. 70–76

  19. [19]

    Opportunities and challenges of wireless sensor networks in smart grid,

    V. C. Gungor, B. Lu, and G. P. Hancke, “Opportunities and challenges of wireless sensor networks in smart grid,” IEEE Transactions on Industrial Electronics, vol. 57, no. 10, pp. 3557–3564, 2010

  20. [20]

    Toward resilient modern power systems: From single-domain to cross-domain resilience enhancement,

    H. Huang, H. V. Poor, K. R. Davis, T. J. Overbye, A. Layton, A. E. Goulart, and S. Zonouz, “Toward resilient modern power systems: From single-domain to cross-domain resilience enhancement,” Proceedings of the IEEE, vol. 112, no. 4, pp. 365–398, 2024

  21. [21]

    Multi-source to multi-target decentralized federated domain adaptation,

    S. Wang, S. Hosseinalipour, and C. G. Brinton, “Multi-source to multi-target decentralized federated domain adaptation,” IEEE Transactions on Cognitive Communications and Networking, no. 3, pp. 1011–1025, 2024

  22. [22]

    Taming subnet-drift in D2D-enabled fog learning: A hierarchical gradient tracking approach,

    E. Chen, S. Wang, and C. G. Brinton, “Taming subnet-drift in D2D-enabled fog learning: A hierarchical gradient tracking approach,” in Proceedings of the 2024 IEEE Conference on Computer Communications. IEEE, 2024, pp. 2438–2447

  23. [23]

    Efficient coordination of federated learning and inference offloading at the edge: A proactive optimization paradigm,

    K. Luo, K. Zhao, T. Ouyang, X. Zhang, Z. Zhou, H. Wang, and X. Chen, “Efficient coordination of federated learning and inference offloading at the edge: A proactive optimization paradigm,” IEEE Transactions on Mobile Computing, 2024, to appear

  24. [24]

    Federated learning over wireless networks: Optimization model design and analysis,

    N. H. Tran, W. Bao, A. Zomaya, M. N. Nguyen, and C. S. Hong, “Federated learning over wireless networks: Optimization model design and analysis,” in Proceedings of the 2019 IEEE Conference on Computer Communications. IEEE, 2019, pp. 1387–1395

  25. [25]

    Federated learning meets multi-objective optimization,

    Z. Hu, K. Shaloudegi, G. Zhang, and Y. Yu, “Federated learning meets multi-objective optimization,” IEEE Transactions on Network Science and Engineering, vol. 9, no. 4, pp. 2039–2051, 2022

  26. [26]

    Min-max cost optimization for efficient hierarchical federated learning in wireless edge networks,

    J. Feng, L. Liu, Q. Pei, and K. Li, “Min-max cost optimization for efficient hierarchical federated learning in wireless edge networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 33, no. 11, pp. 2687–2700, 2021

  27. [27]

    Incentives in federated learning: Equilibria, dynamics, and mechanisms for welfare maximization,

    A. Murhekar, Z. Yuan, B. Ray Chaudhury, B. Li, and R. Mehta, “Incentives in federated learning: Equilibria, dynamics, and mechanisms for welfare maximization,” Advances in Neural Information Processing Systems, vol. 36, 2024

  28. [28]

    Network-aware optimization of distributed learning for fog computing,

    S. Wang, Y. Ruan, Y. Tu, S. Wagle, C. G. Brinton, and C. Joe-Wong, “Network-aware optimization of distributed learning for fog computing,” IEEE/ACM Transactions on Networking, vol. 29, no. 5, pp. 2019–2032, 2021

  29. [29]

    Anarchic federated learning,

    H. Yang, X. Zhang, P. Khanduri, and J. Liu, “Anarchic federated learning,” in Proceedings of the 39th International Conference on Machine Learning. PMLR, 2022, pp. 25331–25363

  30. [30]

    Stochastic client selection for federated learning with volatile clients,

    T. Huang, W. Lin, L. Shen, K. Li, and A. Y. Zomaya, “Stochastic client selection for federated learning with volatile clients,” IEEE Internet of Things Journal, vol. 9, no. 20, pp. 20055–20070, 2022

  31. [31]

    Fast federated learning in the presence of arbitrary device unavailability,

    X. Gu, K. Huang, J. Zhang, and L. Huang, “Fast federated learning in the presence of arbitrary device unavailability,” Advances in Neural Information Processing Systems, vol. 34, pp. 12052–12064, 2021

  32. [32]

    Robust federated learning with connectivity failures: A semi-decentralized framework with collaborative relaying,

    M. Yemini, R. Saha, E. Ozfatura, D. Gündüz, and A. J. Goldsmith, “Robust federated learning with connectivity failures: A semi-decentralized framework with collaborative relaying,” arXiv:2202.11850, 2022

  33. [33]

    Communication efficient distributed learning with feature partitioned data,

    B. Zhang, J. Geng, W. Xu, and L. Lai, “Communication efficient distributed learning with feature partitioned data,” in Proceedings of the 52nd Annual Conference on Information Sciences and Systems (CISS). IEEE, 2018, pp. 1–6

  34. [34]

    VF-PS: How to select important participants in vertical federated learning, efficiently and securely?

    J. Jiang, L. Burkhalter, F. Fu, B. Ding, B. Du, A. Hithnawi, B. Li, and C. Zhang, “VF-PS: How to select important participants in vertical federated learning, efficiently and securely?” Advances in Neural Information Processing Systems, vol. 35, pp. 2088–2101, 2022

  35. [35]

    LESS-VFL: Communication-efficient feature selection for vertical federated learning,

    T. Castiglia, Y. Zhou, S. Wang, S. Kadhe, N. Baracaldo, and S. Patterson, “LESS-VFL: Communication-efficient feature selection for vertical federated learning,” in Proceedings of the 40th International Conference on Machine Learning. PMLR, 2023, pp. 3757–3781

  36. [36]

    FedSDG-FS: Efficient and secure feature selection for vertical federated learning,

    A. Li, H. Peng, L. Zhang, J. Huang, Q. Guo, H. Yu, and Y. Liu, “FedSDG-FS: Efficient and secure feature selection for vertical federated learning,” in Proceedings of the 2023 IEEE Conference on Computer Communications. IEEE, 2023, pp. 1–10

  37. [37]

    VAFL: A method of vertical asynchronous federated learning,

    T. Chen, X. Jin, Y. Sun, and W. Yin, “VAFL: A method of vertical asynchronous federated learning,” in Proceedings of the 2020 ICML Workshop on Federated Learning for User Privacy and Data Confidentiality, July 2020

  38. [38]

    Learning multiple layers of features from tiny images,

    A. Krizhevsky, G. Hinton et al., “Learning multiple layers of features from tiny images,” 2009

  39. [39]

    Distributed lifetime optimization in wireless sensor networks,

    J. M. Bahi, M. Haddad, M. Hakem, and H. Kheddouci, “Distributed lifetime optimization in wireless sensor networks,” in Proceedings of the 2011 IEEE International Conference on High Performance Computing and Communications. IEEE, 2011, pp. 432–439

  40. [40]

    Failure data analysis with extended weibull distribution,

    T. Zhang and M. Xie, “Failure data analysis with extended weibull distribution,” Communications in Statistics – Simulation and Computation, vol. 36, no. 3, pp. 579–592, 2007

  41. [41]

    A primer on spatial modeling and analysis in wireless networks,

    J. G. Andrews, R. K. Ganti, M. Haenggi, N. Jindal, and S. Weber, “A primer on spatial modeling and analysis in wireless networks,” IEEE Communications Magazine, vol. 48, no. 11, pp. 156–163, 2010

  42. [42]

    A survey of air-to-ground propagation channel modeling for unmanned aerial vehicles,

    W. Khawaja, I. Guvenc, D. W. Matolak, U.-C. Fiebig, and N. Schneckenburger, “A survey of air-to-ground propagation channel modeling for unmanned aerial vehicles,” IEEE Communications Surveys & Tutorials, vol. 21, no. 3, pp. 2361–2391, 2019

  43. [43]

    UWB air-to-ground propagation channel measurements and modeling using UAVs,

    W. Khawaja, O. Ozdemir, F. Erden, I. Guvenc, and D. W. Matolak, “UWB air-to-ground propagation channel measurements and modeling using UAVs,” in Proceedings of the 2019 IEEE Aerospace Conference. IEEE, 2019, pp. 1–10

  44. [44]

    Cellular UAV-to-X communications: Design and optimization for multi-UAV networks,

    S. Zhang, H. Zhang, B. Di, and L. Song, “Cellular UAV-to-X communications: Design and optimization for multi-UAV networks,” IEEE Transactions on Wireless Communications, vol. 18, no. 2, pp. 1346–1359, 2019

  45. [45]

    Modeling air-to-ground path loss for low altitude platforms in urban environments,

    A. Al-Hourani, S. Kandeepan, and A. Jamalipour, “Modeling air-to-ground path loss for low altitude platforms in urban environments,” in Proceedings of the 2014 IEEE Global Communications Conference. IEEE, 2014, pp. 2898–2904

  46. [46]

    Mobile unmanned aerial vehicles (UAVs) for energy-efficient internet of things communications,

    M. Mozaffari, W. Saad, M. Bennis, and M. Debbah, “Mobile unmanned aerial vehicles (UAVs) for energy-efficient internet of things communications,” IEEE Transactions on Wireless Communications, vol. 16, no. 11, pp. 7574–7589, 2017

  47. [47]

    Parallel coordinate descent methods for big data optimization,

    P. Richtárik and M. Takáč, “Parallel coordinate descent methods for big data optimization,” Mathematical Programming, vol. 156, pp. 433–484, 2016

  48. [48]

    Efficiency of coordinate descent methods on huge-scale optimization problems,

    Y. Nesterov, “Efficiency of coordinate descent methods on huge-scale optimization problems,” SIAM Journal on Optimization, vol. 22, no. 2, pp. 341–362, 2012

  49. [49]

    Tackling the objective inconsistency problem in heterogeneous federated optimization,

    J. Wang, Q. Liu, H. Liang, G. Joshi, and H. V. Poor, “Tackling the objective inconsistency problem in heterogeneous federated optimization,” Advances in Neural Information Processing Systems, vol. 33, pp. 7611–7623, 2020

  50. [50]

    A proximal stochastic gradient method with progressive variance reduction,

    L. Xiao and T. Zhang, “A proximal stochastic gradient method with progressive variance reduction,” SIAM Journal on Optimization, vol. 24, no. 4, pp. 2057–2075, 2014

  51. [51]

    On the importance of initialization and momentum in deep learning,

    I. Sutskever, J. Martens, G. Dahl, and G. Hinton, “On the importance of initialization and momentum in deep learning,” in Proceedings of the 30th International Conference on Machine Learning. PMLR, 2013, pp. 1139–1147

  52. [52]

    On the Lambert W function,

    R. M. Corless, G. H. Gonnet, D. E. Hare, D. J. Jeffrey, and D. E. Knuth, “On the Lambert W function,” Advances in Computational Mathematics, vol. 5, pp. 329–359, 1996

  53. [53]

    Implicit regularization for deep neural networks driven by an Ornstein-Uhlenbeck like process,

    G. Blanc, N. Gupta, G. Valiant, and P. Valiant, “Implicit regularization for deep neural networks driven by an Ornstein-Uhlenbeck like process,” in Proceedings of the Thirty Third Conference on Learning Theory. PMLR, 2020, pp. 483–513

  54. [54]

    How to escape sharp minima with random perturbations,

    K. Ahn, A. Jadbabaie, and S. Sra, “How to escape sharp minima with random perturbations,” in Proceedings of the 41st International Conference on Machine Learning. PMLR, 2024, pp. 597–618

  55. [55]

    Spectral normalization for generative adversarial networks,

    T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, “Spectral normalization for generative adversarial networks,” in Proceedings of the Sixth International Conference on Learning Representations, 2018

  56. [56]

    Improving Lipschitz-constrained neural networks by learning activation functions,

    S. Ducotterd, A. Goujon, P. Bohra, D. Perdios, S. Neumayer, and M. Unser, “Improving Lipschitz-constrained neural networks by learning activation functions,” Journal of Machine Learning Research, vol. 25, no. 65, pp. 1–30, 2024

  57. [57]

    On the properties of the softmax function with application in game theory and reinforcement learning,

    B. Gao and L. Pavel, “On the properties of the softmax function with application in game theory and reinforcement learning,” arXiv:1704.00805, 2017

  58. [58]

    On Lipschitz bounds of general convolutional neural networks,

    D. Zou, R. Balan, and M. Singh, “On Lipschitz bounds of general convolutional neural networks,” IEEE Transactions on Information Theory, vol. 66, no. 3, pp. 1738–1759, 2019

  59. [59]

    Federated learning over wireless networks: Convergence analysis and resource allocation,

    C. T. Dinh, N. H. Tran, M. N. Nguyen, C. S. Hong, W. Bao, A. Y. Zomaya, and V. Gramoli, “Federated learning over wireless networks: Convergence analysis and resource allocation,” IEEE/ACM Transactions on Networking, vol. 29, no. 1, pp. 398–409, 2020

  60. [60]

    Estimating training compute of deep learning models,

    J. Sevilla, L. Heim, M. Hobbhahn, T. Besiroglu, A. Ho, and P. Villalobos, “Estimating training compute of deep learning models,” 2022. [Online]. Available: https://epochai.org/blog/estimating-training-compute

  61. [61]

    AI and compute,

    D. Amodei and D. Hernandez, “AI and compute,” 2018. [Online]. Available: https://openai.com/research/ai-and-compute

  62. [62]

    Fundamentals of Atmospheric Physics,

    M. L. Salby, Fundamentals of Atmospheric Physics. Elsevier, 1996

  63. [63]

    Aviation Weather Services,

    Federal Aviation Administration, Aviation Weather Services. Aviation Supplies & Academics, 2001

  64. [64]

    Non-linear model predictive control for UAVs with slung/swung load,

    F. Gonzalez, A. Heckmann, S. Notter, M. Zurn, J. Trachte, and A. Mcfadyen, “Non-linear model predictive control for UAVs with slung/swung load,” in Proceedings of the 2015 IEEE International Conference on Robotics and Automation, 2015, pp. 1–1

  65. [65]

    Energy-efficient UAV communication with trajectory optimization,

    Y. Zeng and R. Zhang, “Energy-efficient UAV communication with trajectory optimization,” IEEE Transactions on Wireless Communications, vol. 16, no. 6, pp. 3747–3760, 2017

  66. [66]

    Energy minimization for wireless communication with rotary-wing UAV,

    Y. Zeng, J. Xu, and R. Zhang, “Energy minimization for wireless communication with rotary-wing UAV,” IEEE Transactions on Wireless Communications, vol. 18, no. 4, pp. 2329–2345, 2019

  67. [67]

    CVXPY: A Python-embedded modeling language for convex optimization,

    S. Diamond and S. Boyd, “CVXPY: A Python-embedded modeling language for convex optimization,” Journal of Machine Learning Research, vol. 17, no. 83, pp. 1–5, 2016

  68. [68]

    Geometric programming for communication systems,

    M. Chiang, “Geometric programming for communication systems,” Foundations and Trends in Communications and Information Theory, vol. 2, no. 1–2, pp. 1–154, 2005

  69. [69]

    Disciplined geometric programming,

    A. Agrawal, S. Diamond, and S. Boyd, “Disciplined geometric programming,” Optimization Letters, vol. 13, no. 5, pp. 961–976, 2019

  70. [70]

    Global optimization of signomial geometric programming problems,

    G. Xu, “Global optimization of signomial geometric programming problems,” European Journal of Operational Research, vol. 233, no. 3, pp. 500–510, 2014

  71. [71]

    Reversed geometric programs treated by harmonic means,

    R. J. Duffin and E. L. Peterson, “Reversed geometric programs treated by harmonic means,” Indiana University Mathematics Journal, vol. 22, no. 6, pp. 531–550, 1972

  72. [72]

    Some bounds for the logarithmic function,

    F. Topsøe, “Some bounds for the logarithmic function,” Inequality Theory and Applications, vol. 4, p. 137, 2007

  73. [73]

    Convex Optimization,

    S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004

  74. [74]

    The MNIST database of handwritten digit images for machine learning research,

    L. Deng, “The MNIST database of handwritten digit images for machine learning research,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 141–142, 2012

  75. [75]

    Petfinder.my - Pawpularity contest,

    A. Howard, M. Jedi, and R. Holbrook, “Petfinder.my - Pawpularity contest,” https://kaggle.com/competitions/petfinder-pawpularity-score, 2021, Kaggle

  76. [76]

    Imagenet classification with deep convolutional neural networks,

    A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems, vol. 25, 2012

  77. [77]

    Loss functions for top-k error: Analysis and insights,

    M. Lapin, M. Hein, and B. Schiele, “Loss functions for top-k error: Analysis and insights,” in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2016, pp. 1468–1477

  78. [78]

    Efficient algorithms for capacitated cloudlet placements,

    Z. Xu, W. Liang, W. Xu, M. Jia, and S. Guo, “Efficient algorithms for capacitated cloudlet placements,” IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 10, pp. 2866–2880, 2015

  79. [79]

    Probability: Theory and Examples,

    R. Durrett, Probability: Theory and Examples. Cambridge University Press, 2019, vol. 49

  80. [80]

    Calculus,

    J. Stewart, Calculus, 8th ed. Cengage Learning, 2015

Showing first 80 references.