pith. sign in

arxiv: 2511.20127 · v2 · submitted 2025-11-25 · 💻 cs.IT · math.IT

General Multi-User Distributed Computing: A Learning-Theoretic RKHS Framework for Generic Nonlinear Target Functions with Topology-Aware Risk Analysis

Pith reviewed 2026-05-17 05:10 UTC · model grok-4.3

classification 💻 cs.IT math.IT
keywords distributed computingreproducing kernel Hilbert spacemulti-user computationtopology-aware riskquenched regimeannealed regimenonlinear target functionscomputation-communication trade-off
0
0 comments X

The pith

A general framework for multi-user distributed computing bounds reconstruction risk for nonlinear kernel functions by separating spectral approximation from topology-dependent coverage.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces a General Multi-User Distributed Computing model in which users request heterogeneous nonlinear target functions represented in the reproducing kernel Hilbert space of a shift-invariant kernel. Servers operate under per-server computation budget Gamma and communication budget Delta, and the model permits arbitrary task assignments and connectivity topologies. In the quenched regime with fixed assignment and topology the paper derives matching upper and lower bounds on reconstruction risk that isolate a spectral approximation term from a topology-dependent coverage term. In the annealed regime with random assignment and links it characterizes average-risk scaling together with a coverage threshold. A reader would care because the results supply explicit computation-communication-accuracy trade-offs that apply to generic nonlinear tasks and flexible network structures rather than only linear or disjoint-support cases.

Core claim

In the GMUDC model, heterogeneous nonlinear target functions lie in the RKHS of a shift-invariant kernel. Under per-server computation budget Gamma and communication budget Delta, the quenched regime with fixed assignment and topology yields upper and lower bounds on reconstruction risk that separate a spectral approximation term from a topology-dependent coverage term. The annealed regime with random assignment and links yields average-risk scaling together with a topology-dependent coverage threshold. These bounds identify fundamental limits up to constants and logarithmic factors and recover the tessellated distributed-computing benchmark in the shared linear isotropic regime.

What carries the argument

The GMUDC model that represents generic nonlinear targets in a shift-invariant-kernel RKHS and separates spectral approximation error from topology-dependent coverage error in the risk bounds for both fixed and random assignment regimes.

Load-bearing premise

Target functions admit representation in the reproducing-kernel Hilbert space of a shift-invariant kernel, and the computation and communication budgets permit clean separation of spectral and coverage contributions to risk.

What would settle it

A simulation that uses a concrete shift-invariant kernel, fixed budgets Gamma and Delta, and a chosen assignment topology in which the measured reconstruction risk fails to match the derived upper or lower bounds that separate spectral and coverage terms would falsify the central claim.

Figures

Figures reproduced from arXiv: 2511.20127 by Ali Khalesi.

Figure 1
Figure 1. Figure 1: The K-user, N-server, T-shot General Multi-User Distributed Computing framework considers a setting where each server n computes a subset of subfunctions Sn = {fin,1 (.), fin,2 (.), . . . , fin,|Sn| (.)} and communicates the corresponding results to a subset of users Tn. This operation is performed under the computational constraint |Sn| ≤ Γ ≤ L and the communication constraint |Tn| ≤ ∆ ≤ K, which correspo… view at source ↗
read the original abstract

This paper studies multi-user distributed computation over shared real-valued subfunctions under computation and communication constraints. We consider a \emph{General Multi-User Distributed Computing (GMUDC)} model in which different users request heterogeneous target functions represented in the reproducing-kernel Hilbert space of a shift-invariant kernel, thereby covering generic nonlinear target mappings beyond linearly separable tasks. Unlike tessellated distributed computing frameworks that rely on disjoint-support topologies in their native setting, the GMUDC model allows arbitrary task-assignment and connectivity topologies subject to per-server computation and communication budgets~$\Gamma$ and~$\Delta$. We analyze two complementary regimes. In the \emph{quenched} regime, the assignment and communication topology are fixed, and we derive upper and lower bounds on the resulting reconstruction risk that separate a spectral approximation term from a topology-dependent coverage term. In the \emph{annealed} regime, the assignment and links are drawn uniformly at random from a prescribed ensemble, and we characterize the corresponding average-risk scaling together with a topology-dependent coverage threshold. These results provide a topology-aware characterization of the computation--communication--accuracy trade-off for approximate multi-user distributed computation. They identify fundamental limits, up to constants and logarithmic factors, under the model assumptions adopted in the paper. In the shared linear/isotropic comparison regime, the framework also recovers the relevant tessellated distributed computing benchmark, while the broader GMUDC formulation applies to generic nonlinear target functions in kernel RKHSs and accommodates a wider range of task-assignment and communication topologies than disjoint-support constructions alone.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper introduces a General Multi-User Distributed Computing (GMUDC) model for heterogeneous nonlinear target functions represented in the RKHS of a shift-invariant kernel, subject to per-server computation budget Γ and communication budget Δ. It derives upper and lower bounds on reconstruction risk in the quenched regime that separate a spectral approximation term from a topology-dependent coverage term, and characterizes average-risk scaling together with a coverage threshold in the annealed regime where assignments and links are drawn uniformly from a prescribed ensemble. The framework recovers the relevant tessellated benchmark in the linear/isotropic specialization while extending to generic nonlinear targets and arbitrary topologies.

Significance. If the derivations hold, the work supplies a topology-aware learning-theoretic characterization of computation-communication-accuracy trade-offs for approximate multi-user distributed computation of nonlinear functions. The explicit separation of spectral and coverage terms, the recovery of prior linear benchmarks up to logarithmic factors, and the internal consistency of the RKHS-based construction under the stated assumptions constitute clear strengths. These results could inform fundamental limits for distributed systems handling complex, non-linear tasks beyond disjoint-support or linearly separable settings.

minor comments (2)
  1. Abstract: the phrase 'up to constants and logarithmic factors' is used for the fundamental limits; an explicit statement of the dependence on kernel parameters or ambient dimension would improve precision without altering the central claims.
  2. §4 (quenched-regime bounds): the additive decomposition of the reconstruction risk into spectral and coverage terms is invoked repeatedly; a short reminder of the precise RKHS inner-product identity used for the separation would aid readers who skip the preliminary sections.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary and significance assessment of our work on the General Multi-User Distributed Computing (GMUDC) framework. The referee accurately captures the key elements, including the RKHS representation of heterogeneous nonlinear targets, the separation of spectral approximation and topology-dependent coverage terms in the quenched regime, the average-risk scaling and coverage threshold in the annealed regime, and the recovery of linear/isotropic benchmarks. We appreciate the recognition that these results extend beyond disjoint-support settings and could inform fundamental limits for distributed nonlinear computation. Given the minor_revision recommendation and the absence of any specific major comments, we will conduct a careful pass over the manuscript to improve presentation and clarity where appropriate.

Circularity Check

0 steps flagged

Derivation is self-contained from RKHS structure and topology assumptions

full rationale

The quenched-regime bounds separate spectral approximation from topology-dependent coverage via the RKHS inner-product structure and additive risk decomposition under fixed per-server budgets Γ and Δ. The annealed-regime scaling and coverage threshold follow from the uniform measure over the assignment ensemble and shift-invariance of the kernel, without any reduction to fitted parameters or self-defined quantities. The linear/isotropic specialization recovers the tessellated benchmark as an internal consistency check rather than a load-bearing premise. No self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations appear in the derivation chain; the results remain independent of the target claims under the stated model assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on the assumption that target functions lie in an RKHS of a shift-invariant kernel and that computation/communication budgets permit additive separation of approximation and coverage error terms.

axioms (2)
  • domain assumption Target functions admit representation in the RKHS of a shift-invariant kernel
    Stated in abstract as the setting that covers generic nonlinear mappings.
  • domain assumption Per-server computation budget Γ and communication budget Δ constrain the model
    Used to define feasible assignments and links in both regimes.

pith-pipeline@v0.9.0 · 5581 in / 1283 out tokens · 36291 ms · 2026-05-17T05:10:02.490985+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

54 extracted references · 54 canonical work pages

  1. [1]

    A survey on distributed machine learning,

    J. Verbraeken, M. Wolting, J. Katzy, J. Kloppenburg, T. Verbelen, and J. S. Rellermeyer, “A survey on distributed machine learning,”ACM Computing Surveys (CSUR), vol. 53, no. 2, pp. 1–33, 2020. 33

  2. [2]

    Private retrieval, computing and learning: Recent progress and future challenges,

    S. Ulukus, S. Avestimehr, M. Gastpar, S. Jafar, R. Tandon, and C. Tian, “Private retrieval, computing and learning: Recent progress and future challenges,”IEEE Journal on Selected Areas in Communications, 2022

  3. [3]

    Fundamental limits of coded linear transform,

    S. Wang, J. Liu, N. Shroff, and P. Yang, “Fundamental limits of coded linear transform,” inProceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, ser. Proceedings of Machine Learning Research, K. Chaudhuri and M. Sugiyama, Eds., vol. 89. PMLR, 16–18 Apr 2018, pp. 577–585. [Online]. Available: https://proceedin...

  4. [4]

    A fundamental tradeoff between computation and communication in distributed computing,

    S. Li, M. A. Maddah-Ali, Q. Yu, and A. S. Avestimehr, “A fundamental tradeoff between computation and communication in distributed computing,”IEEE Transactions on Information Theory, vol. 64, no. 1, pp. 109–128, 2017

  5. [5]

    Polynomial codes: an optimal design for high-dimensional coded matrix multiplication,

    Q. Yu, M. Maddah-Ali, and S. Avestimehr, “Polynomial codes: an optimal design for high-dimensional coded matrix multiplication,”Advances in Neural Information Processing Systems, vol. 30, 2017

  6. [6]

    Codedreduce: A fast and robust framework for gradient aggregation in distributed learning,

    A. Reisizadeh, S. Prakash, R. Pedarsani, and A. S. Avestimehr, “Codedreduce: A fast and robust framework for gradient aggregation in distributed learning,”IEEE/ACM Transactions on Networking, 2021

  7. [7]

    Distributed learning in wireless networks: Recent progress and future challenges,

    M. Chen, D. Gündüz, K. Huang, W. Saad, M. Bennis, A. V. Feljan, and H. V. Poor, “Distributed learning in wireless networks: Recent progress and future challenges,”IEEE Journal on Selected Areas in Communications, vol. 39, no. 12, pp. 3579–3605, 2021

  8. [8]

    Tessellated distributed computing,

    A. Khalesi and P. Elia, “Tessellated distributed computing,”IEEE Transactions on Information Theory, vol. 71, no. 6, pp. 4754–4784, 2025

  9. [9]

    A distributed computationally aware quantizer design via hyper binning,

    D. Malak and M. Médard, “A distributed computationally aware quantizer design via hyper binning,”IEEE Transactions on Signal Processing, vol. 71, pp. 76–91, 2023

  10. [10]

    Sketching as a tool for numerical linear algebra,

    D. P. Woodruffet al., “Sketching as a tool for numerical linear algebra,”Foundations and Trends®in Theoretical Computer Science, vol. 10, no. 1–2, pp. 1–157, 2014

  11. [11]

    Codedsketch: A coding scheme for distributed computation of approximated matrix multiplication,

    T. Jahani-Nezhad and M. A. Maddah-Ali, “Codedsketch: A coding scheme for distributed computation of approximated matrix multiplication,”IEEE Transactions on Information Theory, vol. 67, no. 6, pp. 4185–4196, 2021

  12. [12]

    Random sampling for distributed coded matrix multiplication,

    W.-T. Chang and R. Tandon, “Random sampling for distributed coded matrix multiplication,” inICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 8187–8191

  13. [13]

    Approximate weightedCR coded matrix multiplication,

    N. Charalambides, M. Pilanci, and A. O. Hero, “Approximate weightedCR coded matrix multiplication,” in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 5095–5099

  14. [14]

    Oversketch: Approximate matrix multiplication for the cloud,

    V. Gupta, S. Wang, T. Courtade, and K. Ramchandran, “Oversketch: Approximate matrix multiplication for the cloud,” in2018 IEEE International Conference on Big Data (Big Data), 2018, pp. 298–304

  15. [15]

    Anytime coding for distributed computation,

    N. S. Ferdinand and S. C. Draper, “Anytime coding for distributed computation,” in2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2016, pp. 954–960

  16. [16]

    A sequential approximation framework for coded distributed optimization,

    J. Zhu, Y. Pu, V. Gupta, C. Tomlin, and K. Ramchandran, “A sequential approximation framework for coded distributed optimization,” in2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2017, pp. 1240–1247

  17. [17]

    Berrut approximated coded computing: Straggler resistance beyond polynomial computing,

    T. Jahani-Nezhad and M. A. Maddah-Ali, “Berrut approximated coded computing: Straggler resistance beyond polynomial computing,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 1, pp. 111–122, 2023

  18. [18]

    ϵ-approximate coded matrix multiplication is 34 nearly twice as efficient as exact multiplication,

    H. Jeong, A. Devulapalli, V. R. Cadambe, and F. P. Calmon, “ϵ-approximate coded matrix multiplication is 34 nearly twice as efficient as exact multiplication,”IEEE Journal on Selected Areas in Information Theory, vol. 2, no. 3, pp. 845–854, 2021

  19. [19]

    Sparse randomKhatri-Rao product codes for distributed matrix multiplication,

    R. Ji, A. Heidarzadeh, and K. R. Narayanan, “Sparse randomKhatri-Rao product codes for distributed matrix multiplication,” in2022 IEEE Information Theory Workshop (ITW), 2022, pp. 416–421

  20. [20]

    Compression-informed coded computing,

    M. Rudow, N. Charalambides, A. O. Hero, and K. Rashmi, “Compression-informed coded computing,” in2023 IEEE International Symposium on Information Theory (ISIT), 2023, pp. 2177–2182

  21. [21]

    A learning-based approach to approximate coded computation,

    N. Agrawal, Y. Qiu, M. Frey, I. Bjelakovic, S. Maghsudi, S. Stanczak, and J. Zhu, “A learning-based approach to approximate coded computation,” in2022 IEEE Information Theory Workshop (ITW), 2022, pp. 600–605

  22. [22]

    Optimal communication-computation trade-off in heterogeneous gradient coding,

    T. Jahani-Nezhad and M. A. Maddah-Ali, “Optimal communication-computation trade-off in heterogeneous gradient coding,”IEEE Journal on Selected Areas in Information Theory, vol. 2, no. 3, pp. 1002–1011, 2021

  23. [23]

    Codedsketch: Coded distributed computation of approximated matrix multiplication,

    ——, “Codedsketch: Coded distributed computation of approximated matrix multiplication,” in2019 IEEE International Symposium on Information Theory (ISIT), 2019, pp. 2489–2493

  24. [24]

    Hybrid-order distributed sgd: Balancing communication overhead, computational complexity, and convergence rate for distributed learning,

    N. Omidvar, S. M. Hosseini, and M. A. Maddah-Ali, “Hybrid-order distributed sgd: Balancing communication overhead, computational complexity, and convergence rate for distributed learning,”Neurocomputing, vol. 599, p. 128020, 2024

  25. [25]

    Coded computing for resilient distributed computing: A learning- theoretic framework,

    P. Moradi, B. Tahmasebi, and M. Maddah-Ali, “Coded computing for resilient distributed computing: A learning- theoretic framework,” inAdvances in Neural Information Processing Systems (NeurIPS), 2024

  26. [26]

    General coded computing in a probabilistic straggler regime,

    P. Moradi and M. Maddah-Ali, “General coded computing in a probabilistic straggler regime,”arXiv preprint arXiv:2502.00645, 2025

  27. [27]

    General coded computing: Adversarial settings,

    P. Moradi, H. Akbarinodehi, and M. Maddah-Ali, “General coded computing: Adversarial settings,”arXiv preprint arXiv:2502.08058, 2025

  28. [28]

    Distributed linearly separable computation,

    K. Wan, H. Sun, M. Ji, and G. Caire, “Distributed linearly separable computation,”IEEE Transactions on Information Theory, 2021

  29. [29]

    Perfect multi-user distributed computing,

    A. Khalesi and P. Elia, “Perfect multi-user distributed computing,” inIEEE International Symposium on Information Theory (ISIT), Athens, Greece, 2024, pp. 1349–1354

  30. [30]

    Multi-user linearly-separable distributed computing,

    ——, “Multi-user linearly-separable distributed computing,” 2022. [Online]. Available: https://arxiv.org/abs/ 2206.11119

  31. [31]

    Universally decodable matrices for distributed matrix-vector multiplication,

    A. Ramamoorthy, L. Tang, and P. O. Vontobel, “Universally decodable matrices for distributed matrix-vector multiplication,” in2019 IEEE International Symposium on Information Theory (ISIT), 2019, pp. 1777–1781

  32. [32]

    Communication-computation efficient gradient coding,

    M. Ye and E. Abbe, “Communication-computation efficient gradient coding,” inInternational Conference on Machine Learning. PMLR, 2018, pp. 5610–5619

  33. [33]

    Gradient coding from cyclicMDS codes and expander graphs,

    N. Raviv, I. Tamo, R. Tandon, and A. G. Dimakis, “Gradient coding from cyclicMDS codes and expander graphs,”IEEE Transactions on Information Theory, vol. 66, no. 12, pp. 7475–7489, 2020

  34. [34]

    Straggler mitigation in distributed matrix multiplication: Fundamental limits and optimal coding,

    Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr, “Straggler mitigation in distributed matrix multiplication: Fundamental limits and optimal coding,”IEEE Transactions on Information Theory, vol. 66, no. 3, pp. 1920–1933, 2020

  35. [35]

    Advances and open problems in federated learning,

    P. Kairouz, H. B. McMahan, B. Aventet al., “Advances and open problems in federated learning,”Foundations and Trends in Machine Learning, vol. 14, no. 1–2, pp. 1–210, 2021

  36. [36]

    Communication-efficient distributed kernel regression with adaptive compression,

    J. Park, Y. Y. Kim, T. Kim, and S.-H. Choi, “Communication-efficient distributed kernel regression with adaptive compression,”IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 11, pp. 4751–4763, 2021. 35

  37. [37]

    Distributed ridge regression: optimal rates and communication trade-offs,

    J. D. Lee, Q. Liu, and Y. Sun, “Distributed ridge regression: optimal rates and communication trade-offs,”Annals of Statistics, vol. 49, no. 1, pp. 77–102, 2021

  38. [38]

    Consensus and cooperation in networked multi-agent systems,

    R. Olfati-Saber, J. A. Fax, and R. M. Murray, “Consensus and cooperation in networked multi-agent systems,” Proceedings of the IEEE, vol. 95, no. 1, pp. 215–233, 2007

  39. [39]

    Diffusion strategies for distributed kalman filtering and smoothing,

    F. S. Cattivelli and A. H. Sayed, “Diffusion strategies for distributed kalman filtering and smoothing,”IEEE Transactions on Automatic Control, vol. 55, no. 9, pp. 2069–2084, 2010

  40. [40]

    Coordination of uavs via coupled dynamic models,

    R. W. Beard, T. W. McLain, M. A. Goodrich, and E. P. Anderson, “Coordination of uavs via coupled dynamic models,”Proceedings of the 2002 American Control Conference (ACC), pp. 2558–2563, 2002

  41. [41]

    Distributed aerodynamic field estimation for multiple uavs using consensus-based filters,

    Q. Sun, Z. Lin, and H. Wang, “Distributed aerodynamic field estimation for multiple uavs using consensus-based filters,”Aerospace Science and Technology, vol. 87, pp. 567–577, 2019

  42. [42]

    Decentralized cooperative estimation for spacecraft formation flying with communication delays,

    Z. Li, Q. Sun, and Y. Zhang, “Decentralized cooperative estimation for spacecraft formation flying with communication delays,”Acta Astronautica, vol. 175, pp. 220–232, 2020

  43. [43]

    Rademacher and gaussian complexities: Risk bounds and structural results,

    P. L. Bartlett and S. Mendelson, “Rademacher and gaussian complexities: Risk bounds and structural results,” Journal of Machine Learning Research, vol. 3, pp. 463–482, 2002

  44. [44]

    Schölkopf and A

    B. Schölkopf and A. J. Smola,Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Cambridge, MA: MIT Press, 2002, chapter 2, especially Sections 2.3–2.4 on RKHS and the reproducing property

  45. [45]

    Berlinet and C

    A. Berlinet and C. Thomas-Agnan,Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, 2004, see Chapter 2 on Mercer kernels and integral operators

  46. [46]

    Generalization properties of learning with random features,

    A. Rudi and L. Rosasco, “Generalization properties of learning with random features,”Advances in Neural Information Processing Systems, vol. 30, 2017. [Online]. Available: https://proceedings.neurips.cc/paper/2017/ hash/f7228da70fb28752fe384c1dd17c0c08-Abstract.html

  47. [47]

    Wendland,Scattered Data Approximation

    H. Wendland,Scattered Data Approximation. Cambridge: Cambridge University Press, 2005, see chapterPositive definite and conditionally positive definite functions(Bochner & Schoenberg)

  48. [48]

    Mohri, A

    M. Mohri, A. Rostamizadeh, and A. Talwalkar,Foundations of Machine Learning, 2nd ed. Cambridge, MA: MIT Press, 2018, see chapter:Kernel Methods— subsection onpositive definite kernels and Gram matrices; explicitly notes Gram matrices are PSD, which applies toG=E[Φ(X)Φ(X)⊤]

  49. [49]

    Hastie, R

    T. Hastie, R. Tibshirani, and J. Friedman,The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. New York: Springer, 2009, see chapters:Linear Methods for Regression(ridge; hat matrix) andModel Assessment and Selection(effective degrees of freedomtr(Sλ))

  50. [50]

    Schölkopf and A

    B. Schölkopf and A. J. Smola,Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Cambridge, MA: MIT Press, 2002

  51. [51]

    Pinkus,n-Widths in Approximation Theory, ser

    A. Pinkus,n-Widths in Approximation Theory, ser. Ergebnisse der Mathematik und ihrer Grenzgebiete. Springer, 1985, vol. 93, see Chapter 1, especially sections 1.2–1.4 and Theorem 1.3.2

  52. [52]

    Carl and I

    B. Carl and I. Stephani,Entropy, Compactness and the Approximation of Operators, ser. Cambridge Tracts in Mathematics. Cambridge University Press, 1990, vol. 98, see Chapter 2, Sections 2.3, especially Corollary 2.3.2

  53. [53]

    Novak and H

    E. Novak and H. Woźniakowski,Tractability of Multivariate Problems. Volume I: Linear Information, ser. EMS Tracts in Mathematics. European Mathematical Society, 2008, vol. 6, see Chapter 4, Theorem 4.4

  54. [54]

    variance

    R. Bhatia,Matrix Analysis, ser. Graduate Texts in Mathematics. New York: Springer, 1997, vol. 169. 36 Appendix A Proof of Lemma 1 Proof. Let u =w(x)and v =w(x′), and fix the maskS⊆[L]with |S|= Γand γ= Γ/L. Write ˜ω=ω⊙1S and ϕu ≜ϕ(x;ω,b,S) = √ 2 γcos ( ⟨˜ω,u⟩+b ) , ϕv ≜ϕ(x′;ω,b,S) = √ 2 γcos ( ⟨˜ω,v⟩+b ) . Set the single-atom kernel estimatorZ≜ϕuϕv. We ass...