General Multi-User Distributed Computing: A Learning-Theoretic RKHS Framework for Generic Nonlinear Target Functions with Topology-Aware Risk Analysis

Ali Khalesi

arxiv: 2511.20127 · v2 · submitted 2025-11-25 · 💻 cs.IT · math.IT

General Multi-User Distributed Computing: A Learning-Theoretic RKHS Framework for Generic Nonlinear Target Functions with Topology-Aware Risk Analysis

Ali Khalesi This is my paper

Pith reviewed 2026-05-17 05:10 UTC · model grok-4.3

classification 💻 cs.IT math.IT

keywords distributed computingreproducing kernel Hilbert spacemulti-user computationtopology-aware riskquenched regimeannealed regimenonlinear target functionscomputation-communication trade-off

0 comments

The pith

A general framework for multi-user distributed computing bounds reconstruction risk for nonlinear kernel functions by separating spectral approximation from topology-dependent coverage.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces a General Multi-User Distributed Computing model in which users request heterogeneous nonlinear target functions represented in the reproducing kernel Hilbert space of a shift-invariant kernel. Servers operate under per-server computation budget Gamma and communication budget Delta, and the model permits arbitrary task assignments and connectivity topologies. In the quenched regime with fixed assignment and topology the paper derives matching upper and lower bounds on reconstruction risk that isolate a spectral approximation term from a topology-dependent coverage term. In the annealed regime with random assignment and links it characterizes average-risk scaling together with a coverage threshold. A reader would care because the results supply explicit computation-communication-accuracy trade-offs that apply to generic nonlinear tasks and flexible network structures rather than only linear or disjoint-support cases.

Core claim

In the GMUDC model, heterogeneous nonlinear target functions lie in the RKHS of a shift-invariant kernel. Under per-server computation budget Gamma and communication budget Delta, the quenched regime with fixed assignment and topology yields upper and lower bounds on reconstruction risk that separate a spectral approximation term from a topology-dependent coverage term. The annealed regime with random assignment and links yields average-risk scaling together with a topology-dependent coverage threshold. These bounds identify fundamental limits up to constants and logarithmic factors and recover the tessellated distributed-computing benchmark in the shared linear isotropic regime.

What carries the argument

The GMUDC model that represents generic nonlinear targets in a shift-invariant-kernel RKHS and separates spectral approximation error from topology-dependent coverage error in the risk bounds for both fixed and random assignment regimes.

Load-bearing premise

Target functions admit representation in the reproducing-kernel Hilbert space of a shift-invariant kernel, and the computation and communication budgets permit clean separation of spectral and coverage contributions to risk.

What would settle it

A simulation that uses a concrete shift-invariant kernel, fixed budgets Gamma and Delta, and a chosen assignment topology in which the measured reconstruction risk fails to match the derived upper or lower bounds that separate spectral and coverage terms would falsify the central claim.

Figures

Figures reproduced from arXiv: 2511.20127 by Ali Khalesi.

**Figure 1.** Figure 1: The K-user, N-server, T-shot General Multi-User Distributed Computing framework considers a setting where each server n computes a subset of subfunctions Sn = {fin,1 (.), fin,2 (.), . . . , fin,|Sn| (.)} and communicates the corresponding results to a subset of users Tn. This operation is performed under the computational constraint |Sn| ≤ Γ ≤ L and the communication constraint |Tn| ≤ ∆ ≤ K, which correspo… view at source ↗

read the original abstract

This paper studies multi-user distributed computation over shared real-valued subfunctions under computation and communication constraints. We consider a \emph{General Multi-User Distributed Computing (GMUDC)} model in which different users request heterogeneous target functions represented in the reproducing-kernel Hilbert space of a shift-invariant kernel, thereby covering generic nonlinear target mappings beyond linearly separable tasks. Unlike tessellated distributed computing frameworks that rely on disjoint-support topologies in their native setting, the GMUDC model allows arbitrary task-assignment and connectivity topologies subject to per-server computation and communication budgets~$\Gamma$ and~$\Delta$. We analyze two complementary regimes. In the \emph{quenched} regime, the assignment and communication topology are fixed, and we derive upper and lower bounds on the resulting reconstruction risk that separate a spectral approximation term from a topology-dependent coverage term. In the \emph{annealed} regime, the assignment and links are drawn uniformly at random from a prescribed ensemble, and we characterize the corresponding average-risk scaling together with a topology-dependent coverage threshold. These results provide a topology-aware characterization of the computation--communication--accuracy trade-off for approximate multi-user distributed computation. They identify fundamental limits, up to constants and logarithmic factors, under the model assumptions adopted in the paper. In the shared linear/isotropic comparison regime, the framework also recovers the relevant tessellated distributed computing benchmark, while the broader GMUDC formulation applies to generic nonlinear target functions in kernel RKHSs and accommodates a wider range of task-assignment and communication topologies than disjoint-support constructions alone.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper extends distributed computing to nonlinear RKHS targets with arbitrary topologies and cleanly separates spectral and coverage risk terms while recovering the linear tessellated case.

read the letter

The main point is that this work introduces a General Multi-User Distributed Computing model that represents heterogeneous nonlinear targets in the RKHS of a shift-invariant kernel and analyzes reconstruction risk under per-server computation and communication budgets for any task assignment and connectivity pattern. It moves beyond the disjoint-support topologies common in earlier tessellated setups. In the quenched regime the bounds split the risk into a spectral approximation term and a topology-dependent coverage term; in the annealed regime they give average-risk scaling plus a coverage threshold. Both follow from the kernel inner-product structure and additive risk decomposition under the stated budgets. Recovering the prior linear isotropic benchmark up to logarithmic factors is a useful internal check. The assumptions on the kernel and the separation of terms are stated explicitly and do not appear circular. One soft spot is that the results are characterized up to constants and log factors, which is standard but leaves the precise tightness for specific kernels or topologies open. Another is that the framework stays theoretical; without numerical checks on how well the RKHS representation matches concrete nonlinear functions in sensor or edge settings, the practical reach is harder to judge. The model assumptions are reasonable for this style of analysis but could limit direct application when targets deviate strongly from shift-invariant kernels. This is for information theorists and distributed-systems researchers who already work on computation-communication trade-offs and want to handle nonlinear targets and flexible topologies. A reader looking for fundamental limits in more general settings would get value from the topology-aware separation. I would send it to peer review; the construction is coherent, the consistency check holds, and it fills a clear gap without load-bearing flaws.

Referee Report

0 major / 2 minor

Summary. The paper introduces a General Multi-User Distributed Computing (GMUDC) model for heterogeneous nonlinear target functions represented in the RKHS of a shift-invariant kernel, subject to per-server computation budget Γ and communication budget Δ. It derives upper and lower bounds on reconstruction risk in the quenched regime that separate a spectral approximation term from a topology-dependent coverage term, and characterizes average-risk scaling together with a coverage threshold in the annealed regime where assignments and links are drawn uniformly from a prescribed ensemble. The framework recovers the relevant tessellated benchmark in the linear/isotropic specialization while extending to generic nonlinear targets and arbitrary topologies.

Significance. If the derivations hold, the work supplies a topology-aware learning-theoretic characterization of computation-communication-accuracy trade-offs for approximate multi-user distributed computation of nonlinear functions. The explicit separation of spectral and coverage terms, the recovery of prior linear benchmarks up to logarithmic factors, and the internal consistency of the RKHS-based construction under the stated assumptions constitute clear strengths. These results could inform fundamental limits for distributed systems handling complex, non-linear tasks beyond disjoint-support or linearly separable settings.

minor comments (2)

Abstract: the phrase 'up to constants and logarithmic factors' is used for the fundamental limits; an explicit statement of the dependence on kernel parameters or ambient dimension would improve precision without altering the central claims.
§4 (quenched-regime bounds): the additive decomposition of the reconstruction risk into spectral and coverage terms is invoked repeatedly; a short reminder of the precise RKHS inner-product identity used for the separation would aid readers who skip the preliminary sections.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary and significance assessment of our work on the General Multi-User Distributed Computing (GMUDC) framework. The referee accurately captures the key elements, including the RKHS representation of heterogeneous nonlinear targets, the separation of spectral approximation and topology-dependent coverage terms in the quenched regime, the average-risk scaling and coverage threshold in the annealed regime, and the recovery of linear/isotropic benchmarks. We appreciate the recognition that these results extend beyond disjoint-support settings and could inform fundamental limits for distributed nonlinear computation. Given the minor_revision recommendation and the absence of any specific major comments, we will conduct a careful pass over the manuscript to improve presentation and clarity where appropriate.

Circularity Check

0 steps flagged

Derivation is self-contained from RKHS structure and topology assumptions

full rationale

The quenched-regime bounds separate spectral approximation from topology-dependent coverage via the RKHS inner-product structure and additive risk decomposition under fixed per-server budgets Γ and Δ. The annealed-regime scaling and coverage threshold follow from the uniform measure over the assignment ensemble and shift-invariance of the kernel, without any reduction to fitted parameters or self-defined quantities. The linear/isotropic specialization recovers the tessellated benchmark as an internal consistency check rather than a load-bearing premise. No self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations appear in the derivation chain; the results remain independent of the target claims under the stated model assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on the assumption that target functions lie in an RKHS of a shift-invariant kernel and that computation/communication budgets permit additive separation of approximation and coverage error terms.

axioms (2)

domain assumption Target functions admit representation in the RKHS of a shift-invariant kernel
Stated in abstract as the setting that covers generic nonlinear mappings.
domain assumption Per-server computation budget Γ and communication budget Δ constrain the model
Used to define feasible assignments and links in both regimes.

pith-pipeline@v0.9.0 · 5581 in / 1283 out tokens · 36291 ms · 2026-05-17T05:10:02.490985+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

quenched population risk ... separate a spectral approximation term from a topology-dependent coverage term
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

B² ∑_{j > m_k(T)} λ_j ∨ ε_cov,k(A,L)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

54 extracted references · 54 canonical work pages

[1]

A survey on distributed machine learning,

J. Verbraeken, M. Wolting, J. Katzy, J. Kloppenburg, T. Verbelen, and J. S. Rellermeyer, “A survey on distributed machine learning,”ACM Computing Surveys (CSUR), vol. 53, no. 2, pp. 1–33, 2020. 33

work page 2020
[2]

Private retrieval, computing and learning: Recent progress and future challenges,

S. Ulukus, S. Avestimehr, M. Gastpar, S. Jafar, R. Tandon, and C. Tian, “Private retrieval, computing and learning: Recent progress and future challenges,”IEEE Journal on Selected Areas in Communications, 2022

work page 2022
[3]

Fundamental limits of coded linear transform,

S. Wang, J. Liu, N. Shroff, and P. Yang, “Fundamental limits of coded linear transform,” inProceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, ser. Proceedings of Machine Learning Research, K. Chaudhuri and M. Sugiyama, Eds., vol. 89. PMLR, 16–18 Apr 2018, pp. 577–585. [Online]. Available: https://proceedin...

work page 2018
[4]

A fundamental tradeoff between computation and communication in distributed computing,

S. Li, M. A. Maddah-Ali, Q. Yu, and A. S. Avestimehr, “A fundamental tradeoff between computation and communication in distributed computing,”IEEE Transactions on Information Theory, vol. 64, no. 1, pp. 109–128, 2017

work page 2017
[5]

Polynomial codes: an optimal design for high-dimensional coded matrix multiplication,

Q. Yu, M. Maddah-Ali, and S. Avestimehr, “Polynomial codes: an optimal design for high-dimensional coded matrix multiplication,”Advances in Neural Information Processing Systems, vol. 30, 2017

work page 2017
[6]

Codedreduce: A fast and robust framework for gradient aggregation in distributed learning,

A. Reisizadeh, S. Prakash, R. Pedarsani, and A. S. Avestimehr, “Codedreduce: A fast and robust framework for gradient aggregation in distributed learning,”IEEE/ACM Transactions on Networking, 2021

work page 2021
[7]

Distributed learning in wireless networks: Recent progress and future challenges,

M. Chen, D. Gündüz, K. Huang, W. Saad, M. Bennis, A. V. Feljan, and H. V. Poor, “Distributed learning in wireless networks: Recent progress and future challenges,”IEEE Journal on Selected Areas in Communications, vol. 39, no. 12, pp. 3579–3605, 2021

work page 2021
[8]

Tessellated distributed computing,

A. Khalesi and P. Elia, “Tessellated distributed computing,”IEEE Transactions on Information Theory, vol. 71, no. 6, pp. 4754–4784, 2025

work page 2025
[9]

A distributed computationally aware quantizer design via hyper binning,

D. Malak and M. Médard, “A distributed computationally aware quantizer design via hyper binning,”IEEE Transactions on Signal Processing, vol. 71, pp. 76–91, 2023

work page 2023
[10]

Sketching as a tool for numerical linear algebra,

D. P. Woodruffet al., “Sketching as a tool for numerical linear algebra,”Foundations and Trends®in Theoretical Computer Science, vol. 10, no. 1–2, pp. 1–157, 2014

work page 2014
[11]

Codedsketch: A coding scheme for distributed computation of approximated matrix multiplication,

T. Jahani-Nezhad and M. A. Maddah-Ali, “Codedsketch: A coding scheme for distributed computation of approximated matrix multiplication,”IEEE Transactions on Information Theory, vol. 67, no. 6, pp. 4185–4196, 2021

work page 2021
[12]

Random sampling for distributed coded matrix multiplication,

W.-T. Chang and R. Tandon, “Random sampling for distributed coded matrix multiplication,” inICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 8187–8191

work page 2019
[13]

Approximate weightedCR coded matrix multiplication,

N. Charalambides, M. Pilanci, and A. O. Hero, “Approximate weightedCR coded matrix multiplication,” in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 5095–5099

work page 2021
[14]

Oversketch: Approximate matrix multiplication for the cloud,

V. Gupta, S. Wang, T. Courtade, and K. Ramchandran, “Oversketch: Approximate matrix multiplication for the cloud,” in2018 IEEE International Conference on Big Data (Big Data), 2018, pp. 298–304

work page 2018
[15]

Anytime coding for distributed computation,

N. S. Ferdinand and S. C. Draper, “Anytime coding for distributed computation,” in2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2016, pp. 954–960

work page 2016
[16]

A sequential approximation framework for coded distributed optimization,

J. Zhu, Y. Pu, V. Gupta, C. Tomlin, and K. Ramchandran, “A sequential approximation framework for coded distributed optimization,” in2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2017, pp. 1240–1247

work page 2017
[17]

Berrut approximated coded computing: Straggler resistance beyond polynomial computing,

T. Jahani-Nezhad and M. A. Maddah-Ali, “Berrut approximated coded computing: Straggler resistance beyond polynomial computing,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 1, pp. 111–122, 2023

work page 2023
[18]

ϵ-approximate coded matrix multiplication is 34 nearly twice as efficient as exact multiplication,

H. Jeong, A. Devulapalli, V. R. Cadambe, and F. P. Calmon, “ϵ-approximate coded matrix multiplication is 34 nearly twice as efficient as exact multiplication,”IEEE Journal on Selected Areas in Information Theory, vol. 2, no. 3, pp. 845–854, 2021

work page 2021
[19]

Sparse randomKhatri-Rao product codes for distributed matrix multiplication,

R. Ji, A. Heidarzadeh, and K. R. Narayanan, “Sparse randomKhatri-Rao product codes for distributed matrix multiplication,” in2022 IEEE Information Theory Workshop (ITW), 2022, pp. 416–421

work page 2022
[20]

Compression-informed coded computing,

M. Rudow, N. Charalambides, A. O. Hero, and K. Rashmi, “Compression-informed coded computing,” in2023 IEEE International Symposium on Information Theory (ISIT), 2023, pp. 2177–2182

work page 2023
[21]

A learning-based approach to approximate coded computation,

N. Agrawal, Y. Qiu, M. Frey, I. Bjelakovic, S. Maghsudi, S. Stanczak, and J. Zhu, “A learning-based approach to approximate coded computation,” in2022 IEEE Information Theory Workshop (ITW), 2022, pp. 600–605

work page 2022
[22]

Optimal communication-computation trade-off in heterogeneous gradient coding,

T. Jahani-Nezhad and M. A. Maddah-Ali, “Optimal communication-computation trade-off in heterogeneous gradient coding,”IEEE Journal on Selected Areas in Information Theory, vol. 2, no. 3, pp. 1002–1011, 2021

work page 2021
[23]

Codedsketch: Coded distributed computation of approximated matrix multiplication,

——, “Codedsketch: Coded distributed computation of approximated matrix multiplication,” in2019 IEEE International Symposium on Information Theory (ISIT), 2019, pp. 2489–2493

work page 2019
[24]

Hybrid-order distributed sgd: Balancing communication overhead, computational complexity, and convergence rate for distributed learning,

N. Omidvar, S. M. Hosseini, and M. A. Maddah-Ali, “Hybrid-order distributed sgd: Balancing communication overhead, computational complexity, and convergence rate for distributed learning,”Neurocomputing, vol. 599, p. 128020, 2024

work page 2024
[25]

Coded computing for resilient distributed computing: A learning- theoretic framework,

P. Moradi, B. Tahmasebi, and M. Maddah-Ali, “Coded computing for resilient distributed computing: A learning- theoretic framework,” inAdvances in Neural Information Processing Systems (NeurIPS), 2024

work page 2024
[26]

General coded computing in a probabilistic straggler regime,

P. Moradi and M. Maddah-Ali, “General coded computing in a probabilistic straggler regime,”arXiv preprint arXiv:2502.00645, 2025

work page arXiv 2025
[27]

General coded computing: Adversarial settings,

P. Moradi, H. Akbarinodehi, and M. Maddah-Ali, “General coded computing: Adversarial settings,”arXiv preprint arXiv:2502.08058, 2025

work page arXiv 2025
[28]

Distributed linearly separable computation,

K. Wan, H. Sun, M. Ji, and G. Caire, “Distributed linearly separable computation,”IEEE Transactions on Information Theory, 2021

work page 2021
[29]

Perfect multi-user distributed computing,

A. Khalesi and P. Elia, “Perfect multi-user distributed computing,” inIEEE International Symposium on Information Theory (ISIT), Athens, Greece, 2024, pp. 1349–1354

work page 2024
[30]

Multi-user linearly-separable distributed computing,

——, “Multi-user linearly-separable distributed computing,” 2022. [Online]. Available: https://arxiv.org/abs/ 2206.11119

work page arXiv 2022
[31]

Universally decodable matrices for distributed matrix-vector multiplication,

A. Ramamoorthy, L. Tang, and P. O. Vontobel, “Universally decodable matrices for distributed matrix-vector multiplication,” in2019 IEEE International Symposium on Information Theory (ISIT), 2019, pp. 1777–1781

work page 2019
[32]

Communication-computation efficient gradient coding,

M. Ye and E. Abbe, “Communication-computation efficient gradient coding,” inInternational Conference on Machine Learning. PMLR, 2018, pp. 5610–5619

work page 2018
[33]

Gradient coding from cyclicMDS codes and expander graphs,

N. Raviv, I. Tamo, R. Tandon, and A. G. Dimakis, “Gradient coding from cyclicMDS codes and expander graphs,”IEEE Transactions on Information Theory, vol. 66, no. 12, pp. 7475–7489, 2020

work page 2020
[34]

Straggler mitigation in distributed matrix multiplication: Fundamental limits and optimal coding,

Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr, “Straggler mitigation in distributed matrix multiplication: Fundamental limits and optimal coding,”IEEE Transactions on Information Theory, vol. 66, no. 3, pp. 1920–1933, 2020

work page 1920
[35]

Advances and open problems in federated learning,

P. Kairouz, H. B. McMahan, B. Aventet al., “Advances and open problems in federated learning,”Foundations and Trends in Machine Learning, vol. 14, no. 1–2, pp. 1–210, 2021

work page 2021
[36]

Communication-efficient distributed kernel regression with adaptive compression,

J. Park, Y. Y. Kim, T. Kim, and S.-H. Choi, “Communication-efficient distributed kernel regression with adaptive compression,”IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 11, pp. 4751–4763, 2021. 35

work page 2021
[37]

Distributed ridge regression: optimal rates and communication trade-offs,

J. D. Lee, Q. Liu, and Y. Sun, “Distributed ridge regression: optimal rates and communication trade-offs,”Annals of Statistics, vol. 49, no. 1, pp. 77–102, 2021

work page 2021
[38]

Consensus and cooperation in networked multi-agent systems,

R. Olfati-Saber, J. A. Fax, and R. M. Murray, “Consensus and cooperation in networked multi-agent systems,” Proceedings of the IEEE, vol. 95, no. 1, pp. 215–233, 2007

work page 2007
[39]

Diffusion strategies for distributed kalman filtering and smoothing,

F. S. Cattivelli and A. H. Sayed, “Diffusion strategies for distributed kalman filtering and smoothing,”IEEE Transactions on Automatic Control, vol. 55, no. 9, pp. 2069–2084, 2010

work page 2069
[40]

Coordination of uavs via coupled dynamic models,

R. W. Beard, T. W. McLain, M. A. Goodrich, and E. P. Anderson, “Coordination of uavs via coupled dynamic models,”Proceedings of the 2002 American Control Conference (ACC), pp. 2558–2563, 2002

work page 2002
[41]

Distributed aerodynamic field estimation for multiple uavs using consensus-based filters,

Q. Sun, Z. Lin, and H. Wang, “Distributed aerodynamic field estimation for multiple uavs using consensus-based filters,”Aerospace Science and Technology, vol. 87, pp. 567–577, 2019

work page 2019
[42]

Decentralized cooperative estimation for spacecraft formation flying with communication delays,

Z. Li, Q. Sun, and Y. Zhang, “Decentralized cooperative estimation for spacecraft formation flying with communication delays,”Acta Astronautica, vol. 175, pp. 220–232, 2020

work page 2020
[43]

Rademacher and gaussian complexities: Risk bounds and structural results,

P. L. Bartlett and S. Mendelson, “Rademacher and gaussian complexities: Risk bounds and structural results,” Journal of Machine Learning Research, vol. 3, pp. 463–482, 2002

work page 2002
[44]

Schölkopf and A

B. Schölkopf and A. J. Smola,Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Cambridge, MA: MIT Press, 2002, chapter 2, especially Sections 2.3–2.4 on RKHS and the reproducing property

work page 2002
[45]

Berlinet and C

A. Berlinet and C. Thomas-Agnan,Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, 2004, see Chapter 2 on Mercer kernels and integral operators

work page 2004
[46]

Generalization properties of learning with random features,

A. Rudi and L. Rosasco, “Generalization properties of learning with random features,”Advances in Neural Information Processing Systems, vol. 30, 2017. [Online]. Available: https://proceedings.neurips.cc/paper/2017/ hash/f7228da70fb28752fe384c1dd17c0c08-Abstract.html

work page 2017
[47]

Wendland,Scattered Data Approximation

H. Wendland,Scattered Data Approximation. Cambridge: Cambridge University Press, 2005, see chapterPositive definite and conditionally positive definite functions(Bochner & Schoenberg)

work page 2005
[48]

Mohri, A

M. Mohri, A. Rostamizadeh, and A. Talwalkar,Foundations of Machine Learning, 2nd ed. Cambridge, MA: MIT Press, 2018, see chapter:Kernel Methods— subsection onpositive definite kernels and Gram matrices; explicitly notes Gram matrices are PSD, which applies toG=E[Φ(X)Φ(X)⊤]

work page 2018
[49]

Hastie, R

T. Hastie, R. Tibshirani, and J. Friedman,The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. New York: Springer, 2009, see chapters:Linear Methods for Regression(ridge; hat matrix) andModel Assessment and Selection(effective degrees of freedomtr(Sλ))

work page 2009
[50]

Schölkopf and A

B. Schölkopf and A. J. Smola,Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Cambridge, MA: MIT Press, 2002

work page 2002
[51]

Pinkus,n-Widths in Approximation Theory, ser

A. Pinkus,n-Widths in Approximation Theory, ser. Ergebnisse der Mathematik und ihrer Grenzgebiete. Springer, 1985, vol. 93, see Chapter 1, especially sections 1.2–1.4 and Theorem 1.3.2

work page 1985
[52]

Carl and I

B. Carl and I. Stephani,Entropy, Compactness and the Approximation of Operators, ser. Cambridge Tracts in Mathematics. Cambridge University Press, 1990, vol. 98, see Chapter 2, Sections 2.3, especially Corollary 2.3.2

work page 1990
[53]

Novak and H

E. Novak and H. Woźniakowski,Tractability of Multivariate Problems. Volume I: Linear Information, ser. EMS Tracts in Mathematics. European Mathematical Society, 2008, vol. 6, see Chapter 4, Theorem 4.4

work page 2008
[54]

variance

R. Bhatia,Matrix Analysis, ser. Graduate Texts in Mathematics. New York: Springer, 1997, vol. 169. 36 Appendix A Proof of Lemma 1 Proof. Let u =w(x)and v =w(x′), and fix the maskS⊆[L]with |S|= Γand γ= Γ/L. Write ˜ω=ω⊙1S and ϕu ≜ϕ(x;ω,b,S) = √ 2 γcos ( ⟨˜ω,u⟩+b ) , ϕv ≜ϕ(x′;ω,b,S) = √ 2 γcos ( ⟨˜ω,v⟩+b ) . Set the single-atom kernel estimatorZ≜ϕuϕv. We ass...

work page 1997

[1] [1]

A survey on distributed machine learning,

J. Verbraeken, M. Wolting, J. Katzy, J. Kloppenburg, T. Verbelen, and J. S. Rellermeyer, “A survey on distributed machine learning,”ACM Computing Surveys (CSUR), vol. 53, no. 2, pp. 1–33, 2020. 33

work page 2020

[2] [2]

Private retrieval, computing and learning: Recent progress and future challenges,

S. Ulukus, S. Avestimehr, M. Gastpar, S. Jafar, R. Tandon, and C. Tian, “Private retrieval, computing and learning: Recent progress and future challenges,”IEEE Journal on Selected Areas in Communications, 2022

work page 2022

[3] [3]

Fundamental limits of coded linear transform,

S. Wang, J. Liu, N. Shroff, and P. Yang, “Fundamental limits of coded linear transform,” inProceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, ser. Proceedings of Machine Learning Research, K. Chaudhuri and M. Sugiyama, Eds., vol. 89. PMLR, 16–18 Apr 2018, pp. 577–585. [Online]. Available: https://proceedin...

work page 2018

[4] [4]

A fundamental tradeoff between computation and communication in distributed computing,

S. Li, M. A. Maddah-Ali, Q. Yu, and A. S. Avestimehr, “A fundamental tradeoff between computation and communication in distributed computing,”IEEE Transactions on Information Theory, vol. 64, no. 1, pp. 109–128, 2017

work page 2017

[5] [5]

Polynomial codes: an optimal design for high-dimensional coded matrix multiplication,

Q. Yu, M. Maddah-Ali, and S. Avestimehr, “Polynomial codes: an optimal design for high-dimensional coded matrix multiplication,”Advances in Neural Information Processing Systems, vol. 30, 2017

work page 2017

[6] [6]

Codedreduce: A fast and robust framework for gradient aggregation in distributed learning,

A. Reisizadeh, S. Prakash, R. Pedarsani, and A. S. Avestimehr, “Codedreduce: A fast and robust framework for gradient aggregation in distributed learning,”IEEE/ACM Transactions on Networking, 2021

work page 2021

[7] [7]

Distributed learning in wireless networks: Recent progress and future challenges,

M. Chen, D. Gündüz, K. Huang, W. Saad, M. Bennis, A. V. Feljan, and H. V. Poor, “Distributed learning in wireless networks: Recent progress and future challenges,”IEEE Journal on Selected Areas in Communications, vol. 39, no. 12, pp. 3579–3605, 2021

work page 2021

[8] [8]

Tessellated distributed computing,

A. Khalesi and P. Elia, “Tessellated distributed computing,”IEEE Transactions on Information Theory, vol. 71, no. 6, pp. 4754–4784, 2025

work page 2025

[9] [9]

A distributed computationally aware quantizer design via hyper binning,

D. Malak and M. Médard, “A distributed computationally aware quantizer design via hyper binning,”IEEE Transactions on Signal Processing, vol. 71, pp. 76–91, 2023

work page 2023

[10] [10]

Sketching as a tool for numerical linear algebra,

D. P. Woodruffet al., “Sketching as a tool for numerical linear algebra,”Foundations and Trends®in Theoretical Computer Science, vol. 10, no. 1–2, pp. 1–157, 2014

work page 2014

[11] [11]

Codedsketch: A coding scheme for distributed computation of approximated matrix multiplication,

T. Jahani-Nezhad and M. A. Maddah-Ali, “Codedsketch: A coding scheme for distributed computation of approximated matrix multiplication,”IEEE Transactions on Information Theory, vol. 67, no. 6, pp. 4185–4196, 2021

work page 2021

[12] [12]

Random sampling for distributed coded matrix multiplication,

W.-T. Chang and R. Tandon, “Random sampling for distributed coded matrix multiplication,” inICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 8187–8191

work page 2019

[13] [13]

Approximate weightedCR coded matrix multiplication,

N. Charalambides, M. Pilanci, and A. O. Hero, “Approximate weightedCR coded matrix multiplication,” in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 5095–5099

work page 2021

[14] [14]

Oversketch: Approximate matrix multiplication for the cloud,

V. Gupta, S. Wang, T. Courtade, and K. Ramchandran, “Oversketch: Approximate matrix multiplication for the cloud,” in2018 IEEE International Conference on Big Data (Big Data), 2018, pp. 298–304

work page 2018

[15] [15]

Anytime coding for distributed computation,

N. S. Ferdinand and S. C. Draper, “Anytime coding for distributed computation,” in2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2016, pp. 954–960

work page 2016

[16] [16]

A sequential approximation framework for coded distributed optimization,

J. Zhu, Y. Pu, V. Gupta, C. Tomlin, and K. Ramchandran, “A sequential approximation framework for coded distributed optimization,” in2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2017, pp. 1240–1247

work page 2017

[17] [17]

Berrut approximated coded computing: Straggler resistance beyond polynomial computing,

T. Jahani-Nezhad and M. A. Maddah-Ali, “Berrut approximated coded computing: Straggler resistance beyond polynomial computing,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 1, pp. 111–122, 2023

work page 2023

[18] [18]

ϵ-approximate coded matrix multiplication is 34 nearly twice as efficient as exact multiplication,

H. Jeong, A. Devulapalli, V. R. Cadambe, and F. P. Calmon, “ϵ-approximate coded matrix multiplication is 34 nearly twice as efficient as exact multiplication,”IEEE Journal on Selected Areas in Information Theory, vol. 2, no. 3, pp. 845–854, 2021

work page 2021

[19] [19]

Sparse randomKhatri-Rao product codes for distributed matrix multiplication,

R. Ji, A. Heidarzadeh, and K. R. Narayanan, “Sparse randomKhatri-Rao product codes for distributed matrix multiplication,” in2022 IEEE Information Theory Workshop (ITW), 2022, pp. 416–421

work page 2022

[20] [20]

Compression-informed coded computing,

M. Rudow, N. Charalambides, A. O. Hero, and K. Rashmi, “Compression-informed coded computing,” in2023 IEEE International Symposium on Information Theory (ISIT), 2023, pp. 2177–2182

work page 2023

[21] [21]

A learning-based approach to approximate coded computation,

N. Agrawal, Y. Qiu, M. Frey, I. Bjelakovic, S. Maghsudi, S. Stanczak, and J. Zhu, “A learning-based approach to approximate coded computation,” in2022 IEEE Information Theory Workshop (ITW), 2022, pp. 600–605

work page 2022

[22] [22]

Optimal communication-computation trade-off in heterogeneous gradient coding,

T. Jahani-Nezhad and M. A. Maddah-Ali, “Optimal communication-computation trade-off in heterogeneous gradient coding,”IEEE Journal on Selected Areas in Information Theory, vol. 2, no. 3, pp. 1002–1011, 2021

work page 2021

[23] [23]

Codedsketch: Coded distributed computation of approximated matrix multiplication,

——, “Codedsketch: Coded distributed computation of approximated matrix multiplication,” in2019 IEEE International Symposium on Information Theory (ISIT), 2019, pp. 2489–2493

work page 2019

[24] [24]

Hybrid-order distributed sgd: Balancing communication overhead, computational complexity, and convergence rate for distributed learning,

N. Omidvar, S. M. Hosseini, and M. A. Maddah-Ali, “Hybrid-order distributed sgd: Balancing communication overhead, computational complexity, and convergence rate for distributed learning,”Neurocomputing, vol. 599, p. 128020, 2024

work page 2024

[25] [25]

Coded computing for resilient distributed computing: A learning- theoretic framework,

P. Moradi, B. Tahmasebi, and M. Maddah-Ali, “Coded computing for resilient distributed computing: A learning- theoretic framework,” inAdvances in Neural Information Processing Systems (NeurIPS), 2024

work page 2024

[26] [26]

General coded computing in a probabilistic straggler regime,

P. Moradi and M. Maddah-Ali, “General coded computing in a probabilistic straggler regime,”arXiv preprint arXiv:2502.00645, 2025

work page arXiv 2025

[27] [27]

General coded computing: Adversarial settings,

P. Moradi, H. Akbarinodehi, and M. Maddah-Ali, “General coded computing: Adversarial settings,”arXiv preprint arXiv:2502.08058, 2025

work page arXiv 2025

[28] [28]

Distributed linearly separable computation,

K. Wan, H. Sun, M. Ji, and G. Caire, “Distributed linearly separable computation,”IEEE Transactions on Information Theory, 2021

work page 2021

[29] [29]

Perfect multi-user distributed computing,

A. Khalesi and P. Elia, “Perfect multi-user distributed computing,” inIEEE International Symposium on Information Theory (ISIT), Athens, Greece, 2024, pp. 1349–1354

work page 2024

[30] [30]

Multi-user linearly-separable distributed computing,

——, “Multi-user linearly-separable distributed computing,” 2022. [Online]. Available: https://arxiv.org/abs/ 2206.11119

work page arXiv 2022

[31] [31]

Universally decodable matrices for distributed matrix-vector multiplication,

A. Ramamoorthy, L. Tang, and P. O. Vontobel, “Universally decodable matrices for distributed matrix-vector multiplication,” in2019 IEEE International Symposium on Information Theory (ISIT), 2019, pp. 1777–1781

work page 2019

[32] [32]

Communication-computation efficient gradient coding,

M. Ye and E. Abbe, “Communication-computation efficient gradient coding,” inInternational Conference on Machine Learning. PMLR, 2018, pp. 5610–5619

work page 2018

[33] [33]

Gradient coding from cyclicMDS codes and expander graphs,

N. Raviv, I. Tamo, R. Tandon, and A. G. Dimakis, “Gradient coding from cyclicMDS codes and expander graphs,”IEEE Transactions on Information Theory, vol. 66, no. 12, pp. 7475–7489, 2020

work page 2020

[34] [34]

Straggler mitigation in distributed matrix multiplication: Fundamental limits and optimal coding,

Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr, “Straggler mitigation in distributed matrix multiplication: Fundamental limits and optimal coding,”IEEE Transactions on Information Theory, vol. 66, no. 3, pp. 1920–1933, 2020

work page 1920

[35] [35]

Advances and open problems in federated learning,

P. Kairouz, H. B. McMahan, B. Aventet al., “Advances and open problems in federated learning,”Foundations and Trends in Machine Learning, vol. 14, no. 1–2, pp. 1–210, 2021

work page 2021

[36] [36]

Communication-efficient distributed kernel regression with adaptive compression,

J. Park, Y. Y. Kim, T. Kim, and S.-H. Choi, “Communication-efficient distributed kernel regression with adaptive compression,”IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 11, pp. 4751–4763, 2021. 35

work page 2021

[37] [37]

Distributed ridge regression: optimal rates and communication trade-offs,

J. D. Lee, Q. Liu, and Y. Sun, “Distributed ridge regression: optimal rates and communication trade-offs,”Annals of Statistics, vol. 49, no. 1, pp. 77–102, 2021

work page 2021

[38] [38]

Consensus and cooperation in networked multi-agent systems,

R. Olfati-Saber, J. A. Fax, and R. M. Murray, “Consensus and cooperation in networked multi-agent systems,” Proceedings of the IEEE, vol. 95, no. 1, pp. 215–233, 2007

work page 2007

[39] [39]

Diffusion strategies for distributed kalman filtering and smoothing,

F. S. Cattivelli and A. H. Sayed, “Diffusion strategies for distributed kalman filtering and smoothing,”IEEE Transactions on Automatic Control, vol. 55, no. 9, pp. 2069–2084, 2010

work page 2069

[40] [40]

Coordination of uavs via coupled dynamic models,

R. W. Beard, T. W. McLain, M. A. Goodrich, and E. P. Anderson, “Coordination of uavs via coupled dynamic models,”Proceedings of the 2002 American Control Conference (ACC), pp. 2558–2563, 2002

work page 2002

[41] [41]

Distributed aerodynamic field estimation for multiple uavs using consensus-based filters,

Q. Sun, Z. Lin, and H. Wang, “Distributed aerodynamic field estimation for multiple uavs using consensus-based filters,”Aerospace Science and Technology, vol. 87, pp. 567–577, 2019

work page 2019

[42] [42]

Decentralized cooperative estimation for spacecraft formation flying with communication delays,

Z. Li, Q. Sun, and Y. Zhang, “Decentralized cooperative estimation for spacecraft formation flying with communication delays,”Acta Astronautica, vol. 175, pp. 220–232, 2020

work page 2020

[43] [43]

Rademacher and gaussian complexities: Risk bounds and structural results,

P. L. Bartlett and S. Mendelson, “Rademacher and gaussian complexities: Risk bounds and structural results,” Journal of Machine Learning Research, vol. 3, pp. 463–482, 2002

work page 2002

[44] [44]

Schölkopf and A

B. Schölkopf and A. J. Smola,Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Cambridge, MA: MIT Press, 2002, chapter 2, especially Sections 2.3–2.4 on RKHS and the reproducing property

work page 2002

[45] [45]

Berlinet and C

A. Berlinet and C. Thomas-Agnan,Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, 2004, see Chapter 2 on Mercer kernels and integral operators

work page 2004

[46] [46]

Generalization properties of learning with random features,

A. Rudi and L. Rosasco, “Generalization properties of learning with random features,”Advances in Neural Information Processing Systems, vol. 30, 2017. [Online]. Available: https://proceedings.neurips.cc/paper/2017/ hash/f7228da70fb28752fe384c1dd17c0c08-Abstract.html

work page 2017

[47] [47]

Wendland,Scattered Data Approximation

H. Wendland,Scattered Data Approximation. Cambridge: Cambridge University Press, 2005, see chapterPositive definite and conditionally positive definite functions(Bochner & Schoenberg)

work page 2005

[48] [48]

Mohri, A

M. Mohri, A. Rostamizadeh, and A. Talwalkar,Foundations of Machine Learning, 2nd ed. Cambridge, MA: MIT Press, 2018, see chapter:Kernel Methods— subsection onpositive definite kernels and Gram matrices; explicitly notes Gram matrices are PSD, which applies toG=E[Φ(X)Φ(X)⊤]

work page 2018

[49] [49]

Hastie, R

T. Hastie, R. Tibshirani, and J. Friedman,The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. New York: Springer, 2009, see chapters:Linear Methods for Regression(ridge; hat matrix) andModel Assessment and Selection(effective degrees of freedomtr(Sλ))

work page 2009

[50] [50]

Schölkopf and A

B. Schölkopf and A. J. Smola,Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Cambridge, MA: MIT Press, 2002

work page 2002

[51] [51]

Pinkus,n-Widths in Approximation Theory, ser

A. Pinkus,n-Widths in Approximation Theory, ser. Ergebnisse der Mathematik und ihrer Grenzgebiete. Springer, 1985, vol. 93, see Chapter 1, especially sections 1.2–1.4 and Theorem 1.3.2

work page 1985

[52] [52]

Carl and I

B. Carl and I. Stephani,Entropy, Compactness and the Approximation of Operators, ser. Cambridge Tracts in Mathematics. Cambridge University Press, 1990, vol. 98, see Chapter 2, Sections 2.3, especially Corollary 2.3.2

work page 1990

[53] [53]

Novak and H

E. Novak and H. Woźniakowski,Tractability of Multivariate Problems. Volume I: Linear Information, ser. EMS Tracts in Mathematics. European Mathematical Society, 2008, vol. 6, see Chapter 4, Theorem 4.4

work page 2008

[54] [54]

variance

R. Bhatia,Matrix Analysis, ser. Graduate Texts in Mathematics. New York: Springer, 1997, vol. 169. 36 Appendix A Proof of Lemma 1 Proof. Let u =w(x)and v =w(x′), and fix the maskS⊆[L]with |S|= Γand γ= Γ/L. Write ˜ω=ω⊙1S and ϕu ≜ϕ(x;ω,b,S) = √ 2 γcos ( ⟨˜ω,u⟩+b ) , ϕv ≜ϕ(x′;ω,b,S) = √ 2 γcos ( ⟨˜ω,v⟩+b ) . Set the single-atom kernel estimatorZ≜ϕuϕv. We ass...

work page 1997