pith.

arXiv: 2605.18554 · v1 · pith:ERR7MHNJ · submitted 2026-05-18 · 💻 cs.LG · stat.ML

Federated Martingale Posterior Sampling

Pith reviewed 2026-05-20 13:06 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords: federated learning · martingale posterior · Bayesian neural networks · predictive sampling · data embeddings · calibration

The pith

Clients upload small trainable data embeddings so a server can centrally recover full parameter uncertainty for federated Bayesian neural networks using martingale posteriors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces federated martingale posterior sampling to address the difficulty of specifying priors for overparameterized models in federated Bayesian neural networks. By replacing the prior-likelihood pair with a predictive distribution, the approach recovers parameter uncertainty through repeated predictive sampling and model refitting. In the federated version, clients send only a small set of trainable data embeddings to the server, which then runs the sampler centrally in a one-shot protocol. This avoids sharing raw local datasets, and experiments on MNIST, CIFAR-10, and CIFAR-100 show performance close to the centralized martingale posterior and better calibration than consensus-style baselines.
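To make "repeated predictive sampling and model refitting" concrete, here is a minimal sketch for a scalar parameter. It assumes the simplest predictive rule, a Pólya urn in which each imputed observation is drawn uniformly from everything observed or imputed so far; for the mean, this rule reduces to the Bayesian bootstrap as the horizon grows. The paper's predictive is a learned model, so `martingale_posterior_draws` and its arguments are illustrative names, not the authors' code.

```python
import random
import statistics
from typing import Callable, List, Sequence


def martingale_posterior_draws(
    data: Sequence[float],
    estimator: Callable[[Sequence[float]], float],
    num_draws: int = 200,
    horizon: int = 1000,
    seed: int = 0,
) -> List[float]:
    """Draw from a martingale posterior by predictive resampling (sketch).

    Assumed predictive: Polya urn, i.e. the next value is drawn uniformly
    from all observed and previously imputed values.
    """
    rng = random.Random(seed)
    draws: List[float] = []
    for _ in range(num_draws):
        completed = list(data)              # start from the observed sample
        while len(completed) < horizon:     # impute future observations
            completed.append(rng.choice(completed))
        draws.append(estimator(completed))  # "refit" on the completed sample
    return draws


data = [0.1, 0.4, 0.35, 0.8, 0.55]
draws = martingale_posterior_draws(data, statistics.fmean)
print(f"posterior mean {statistics.fmean(draws):.3f}, spread {statistics.stdev(draws):.3f}")
```

Replacing `statistics.fmean` with a routine that trains a network on the completed sample gives the neural-network setting the paper targets; each posterior draw then costs one full refit.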

Core claim

The central discovery is that a one-shot embarrassingly parallel protocol for federated martingale posterior sampling, relying on clients uploading small sets of trainable data embeddings, allows the server to perform predictive sampling centrally and recover uncertainty estimates that closely match those from centralized counterparts.

What carries the argument

Trainable data embeddings uploaded by clients, which enable the central server to simulate the effect of having access to full local datasets for the martingale posterior sampling process.
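The one-shot data flow can be sketched as follows, under loud assumptions: `compress_client` and `server_sample` are hypothetical names, and the client "embeddings" here are fixed quantiles of scalar data rather than the trained embeddings the paper uses. What the sketch does show is the protocol shape: each client compresses once, in parallel, and uploads; the server then performs every predictive draw and refit without contacting the clients again.

```python
import math
import random
import statistics
from typing import List, Sequence


def compress_client(local_data: Sequence[float], ratio: float = 0.1) -> List[float]:
    """Hypothetical stand-in for trainable embeddings: keep k quantiles.

    k is proportional to the local data size so that the pooled uploads
    keep the clients' relative weights.
    """
    xs = sorted(local_data)
    k = max(1, math.ceil(ratio * len(xs)))
    # Pick the element at the midpoint of each of k equal-size slices.
    return [xs[min(len(xs) - 1, int((i + 0.5) * len(xs) / k))] for i in range(k)]


def server_sample(
    uploads: List[List[float]], num_draws: int = 200, horizon: int = 1000, seed: int = 0
) -> List[float]:
    """Server side: pool the uploads and run Polya-urn predictive resampling."""
    pooled = [x for upload in uploads for x in upload]
    rng = random.Random(seed)
    draws: List[float] = []
    for _ in range(num_draws):
        urn = list(pooled)
        while len(urn) < horizon:
            urn.append(rng.choice(urn))
        draws.append(statistics.fmean(urn))  # refit: here just the mean
    return draws


# Usage: three clients with differently centred synthetic local data.
gen = random.Random(1)
clients = [[gen.gauss(mu, 1.0) for _ in range(200)] for mu in (-1.0, 0.0, 2.0)]
uploads = [compress_client(c) for c in clients]  # one upload per client
draws = server_sample(uploads)
print(len(uploads[0]), round(statistics.fmean(draws), 2))
```

In this toy, the spread of the draws is governed by the number of pooled points, so shrinking `ratio` widens the posterior even when the quantiles are accurate; whether trained embeddings avoid this effect is exactly the load-bearing premise questioned below.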

If this is right

  • The method provides a practical way to perform Bayesian inference in federated settings without sharing raw local data.
  • It leads to improved predictive calibration compared to standard federated averaging or consensus methods.
  • It bypasses the need for eliciting meaningful priors on high-dimensional parameter spaces.
  • The approach is suitable for modern overparameterized models like those used in image classification.
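For contrast with the consensus-style baselines mentioned above, here is a minimal sketch of the combination step of consensus Monte Carlo (Scott et al., in the reference list) for a scalar parameter: each shard samples its own subposterior, and the server averages the t-th draws with weights equal to the inverse subposterior variances. The rule is exact when subposteriors are Gaussian; for the multimodal posteriors of neural networks, averaging draws from different modes can yield parameters that no shard supports, which is one plausible reason such baselines calibrate poorly. The function name is illustrative, and the scalar setting is a simplification (the original method weights by full precision matrices).

```python
import statistics
from typing import List, Sequence


def consensus_combine(shard_draws: Sequence[Sequence[float]]) -> List[float]:
    """Precision-weighted average of per-shard draws (scalar consensus Monte Carlo).

    Each shard must supply at least two draws so its variance is defined.
    """
    # Each shard's weight is the inverse of its sample variance.
    weights = [1.0 / statistics.variance(draws) for draws in shard_draws]
    total = sum(weights)
    num_draws = min(len(draws) for draws in shard_draws)
    return [
        sum(w * draws[t] for w, draws in zip(weights, shard_draws)) / total
        for t in range(num_draws)
    ]


# Two shards with equal spread: the combination is a plain average.
print(consensus_combine([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]))  # [2.0, 3.0, 4.0]
```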

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This could be extended to other domains where data privacy is critical, such as healthcare, by using embeddings to preserve privacy.
  • The size of the uploaded embeddings might be optimized further to balance communication cost and accuracy.
  • If the embeddings are learned jointly, it might allow for better adaptation to the predictive sampling procedure.

Load-bearing premise

A small set of trainable data embeddings uploaded by clients contains sufficient information for the central server to recover parameter uncertainty equivalent to running the predictive sampler on the full local datasets.

What would settle it

Run the predictive sampler once on the pooled full local datasets and once on the uploaded embeddings: a large discrepancy between the two resulting parameter-uncertainty distributions would falsify the approach.
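One way to operationalize this test, sketched under assumptions: because BNN parameters are high-dimensional, compare a scalar summary (for example, the predicted probability of a fixed test input under each posterior draw) using the 1-Wasserstein distance between the two empirical samples. For equal-size samples on the real line, this distance is the mean absolute difference between the sorted samples. The metric, the summary, and any threshold for a "large" discrepancy are choices not specified in the source, and the draw values below are made up.

```python
from typing import Sequence


def wasserstein_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """1-Wasserstein distance between two equal-size empirical samples on R."""
    if len(a) != len(b) or not a:
        raise ValueError("samples must be non-empty and of equal size")
    # On the real line the optimal coupling matches order statistics.
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


full_data_draws = [0.71, 0.64, 0.69, 0.75, 0.66]  # hypothetical summaries
embedding_draws = [0.70, 0.62, 0.73, 0.68, 0.67]
print(wasserstein_1d(full_data_draws, embedding_draws))
```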

Figures

Figures reproduced from arXiv: 2605.18554 by Boning Zhang, Dongzhu Liu, Matteo Zecchin, Mingzhao Guo, Osvaldo Simeone.

Figure 1: In the proposed FMP protocol, each client compresses its e…

Figure 2: Accuracy (ACC) and ECE under heterogeneous client partitions
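The Figure 2 caption does not say how the heterogeneous partitions are built. A common recipe, from Hsu et al. (in the reference list), gives each class's examples to clients in proportions drawn from a symmetric Dirichlet(α); a small α concentrates each class on few clients. The sketch below assumes that recipe, which the source does not confirm for these experiments; `dirichlet_partition` is a hypothetical helper.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def dirichlet_partition(
    labels: Sequence[int], num_clients: int, alpha: float, seed: int = 0
) -> List[List[int]]:
    """Split example indices across clients with Dirichlet(alpha) label skew."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    clients: List[List[int]] = [[] for _ in range(num_clients)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Dirichlet(alpha, ..., alpha) sample via normalized Gamma(alpha, 1) draws.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas) or 1.0  # guard against an all-zero draw
        # Cut the shuffled class indices at the cumulative proportions.
        start, cum = 0, 0.0
        for c in range(num_clients):
            cum += gammas[c] / total
            if c == num_clients - 1:
                end = len(idxs)
            else:
                end = min(len(idxs), round(cum * len(idxs)))
            clients[c].extend(idxs[start:end])
            start = end
    return clients


labels = [i % 10 for i in range(1000)]  # synthetic 10-class labels
parts = dirichlet_partition(labels, num_clients=5, alpha=0.3)
print([len(p) for p in parts])
```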
Original abstract

Federated Bayesian neural networks require fixing a prior on the model parameters together with a likelihood. Eliciting meaningful priors on the weight space of modern overparameterized models is notoriously difficult, and misspecification of either component can severely degrade accuracy and calibration. Motivated by the rapid progress of predictive models such as large language models, the martingale posterior, also known as predictive Bayes, replaces the prior-likelihood pair with a predictive distribution and recovers parameter uncertainty by repeatedly drawing predictive samples and refitting the model. A direct federated implementation, however, would require clients to share the local data sets. This letter proposes federated martingale posterior (FMP) sampling, a one-shot embarrassingly parallel protocol in which each client uploads a small set of trainable data embeddings and the server runs the predictive sampler centrally. Experiments on MNIST, CIFAR-10, and CIFAR-100 show that FMP closely matches the centralized counterpart and significantly improves calibration over consensus-style baselines.
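Calibration, as plotted in Figure 2, is reported as expected calibration error (ECE), the metric used by Guo et al. (in the reference list): predictions are grouped into equal-width confidence bins, and ECE is the sample-weighted average gap between accuracy and mean confidence per bin. A minimal sketch follows; the bin count of 15 is a common default, not a value taken from the paper.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], num_bins: int = 15
) -> float:
    """ECE with equal-width confidence bins on [0, 1]."""
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("need equal, non-zero numbers of confidences and labels")
    conf_sum = [0.0] * num_bins
    acc_sum = [0.0] * num_bins
    count = [0] * num_bins
    for p, ok in zip(confidences, correct):
        b = min(int(p * num_bins), num_bins - 1)  # confidence 1.0 joins the top bin
        conf_sum[b] += p
        acc_sum[b] += float(ok)
        count[b] += 1
    # Weight each bin's |accuracy - confidence| gap by its share of samples.
    return sum(
        count[b] / n * abs(acc_sum[b] / count[b] - conf_sum[b] / count[b])
        for b in range(num_bins)
        if count[b]
    )


# Overconfident toy model: always fully confident, right half the time.
print(expected_calibration_error([1.0, 1.0, 1.0, 1.0], [True, False, True, False]))  # 0.5
```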

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Federated Martingale Posterior (FMP) sampling, a one-shot embarrassingly parallel protocol for federated Bayesian neural networks. Clients upload small sets of trainable data embeddings rather than raw local datasets; the server then centrally runs the martingale posterior sampler (repeated predictive draws followed by refits) to recover parameter uncertainty. Experiments on MNIST, CIFAR-10, and CIFAR-100 are reported to show that FMP closely matches the performance of the centralized martingale posterior while improving calibration relative to consensus-style federated baselines.

Significance. If the central assumption holds, the work would offer a practical route to well-calibrated uncertainty estimates in federated settings without requiring clients to share raw data or the server to elicit a prior on high-dimensional weights. The one-shot, embarrassingly parallel design and the reported calibration gains on standard image benchmarks constitute the main potential contribution.

major comments (2)
  1. [FMP protocol description] The load-bearing claim that a small set of trainable data embeddings uploaded by each client suffices for the server to recover parameter uncertainty equivalent to running the predictive sampler on the full local datasets receives no supporting analysis. No approximation-error bound, information-loss characterization, or description of embedding dimensionality and training objective relative to the original data distribution is provided (see the FMP protocol description).
  2. [Experiments] The experimental section reports that FMP matches centralized performance and improves calibration on MNIST, CIFAR-10, and CIFAR-100, yet contains no ablations on embedding size, number of embeddings per client, or the embedding training objective. Without these controls it is impossible to determine whether the reported calibration advantage is robust or an artifact of particular hyper-parameter choices.
minor comments (2)
  1. [Method] Clarify the precise training objective used to learn the client embeddings and how it relates to the predictive distribution employed by the martingale posterior.
  2. [Method] Add a short discussion of how the one-shot protocol interacts with the repeated refitting steps of the martingale posterior sampler.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate the revisions we intend to make.

  1. Referee: [FMP protocol description] The load-bearing claim that a small set of trainable data embeddings uploaded by each client suffices for the server to recover parameter uncertainty equivalent to running the predictive sampler on the full local datasets receives no supporting analysis. No approximation-error bound, information-loss characterization, or description of embedding dimensionality and training objective relative to the original data distribution is provided (see the FMP protocol description).

    Authors: We acknowledge that the current manuscript provides no formal approximation-error bound or information-loss analysis for the embedding approximation. The protocol relies on the empirical observation that a modest number of trainable embeddings, optimized to match local predictive statistics, enable the server-side martingale sampler to recover uncertainty comparable to the centralized case. In revision we will expand the protocol description to specify embedding dimensionality, the exact training objective (a predictive matching loss), and its relation to the local data distribution. We will also add a short discussion of the empirical justification and the limitations of the approach. A complete theoretical characterization lies outside the scope of this short letter. revision: partial

  2. Referee: [Experiments] The experimental section reports that FMP matches centralized performance and improves calibration on MNIST, CIFAR-10, and CIFAR-100, yet contains no ablations on embedding size, number of embeddings per client, or the embedding training objective. Without these controls it is impossible to determine whether the reported calibration advantage is robust or an artifact of particular hyper-parameter choices.

    Authors: We agree that systematic ablations would strengthen the experimental claims. In the revised version we will add results varying the number of embeddings per client and embedding dimensionality on MNIST (and, space permitting, on CIFAR-10). These controls will be placed in the main text or an appendix. We will also clarify the embedding training objective in the methods section so that readers can assess sensitivity to these choices. revision: yes

standing simulated objections not resolved
  • A rigorous approximation-error bound or information-loss characterization for the data-embedding approximation used in the FMP protocol.

Circularity Check

0 steps flagged

No significant circularity; new federated protocol remains distinct from inputs


The paper introduces a one-shot federated protocol in which clients upload trainable data embeddings and the server centrally runs the martingale posterior sampler. The abstract and provided text describe this as a direct adaptation of the existing predictive Bayes framework to avoid sharing full local datasets, with performance equivalence demonstrated via experiments on MNIST, CIFAR-10, and CIFAR-100. No equations, self-citations, or definitional reductions are present that would make the central claim equivalent to its inputs by construction. The load-bearing assumption about embedding sufficiency is presented as an empirical claim rather than a fitted or renamed quantity, leaving the derivation chain self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based solely on the abstract, the method rests on the martingale posterior as a replacement for traditional prior-likelihood Bayesian inference and assumes embeddings suffice to substitute for full client datasets in central sampling.

axioms (1)
  • Domain assumption: the martingale posterior recovers parameter uncertainty via repeated predictive sampling and refitting.
    This underpins the entire approach as described in the abstract.
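The axiom can be stated more precisely following Fong, Holmes, and Walker (in the reference list). In the notation below, $P_i$ is the one-step predictive after $i$ observations, the first $n$ values $Y_{1:n} = y_{1:n}$ are observed and the rest are imputed, and $\ell$ is a loss whose minimizer defines the parameter (for a BNN, e.g., the negative log-likelihood); $\ell$ and $N$ are notation introduced here for illustration.

```latex
\begin{aligned}
P_i(A) &= \Pr\left(Y_{i+1} \in A \mid Y_{1:i}\right), \qquad Y_{i+1} \sim P_i, \quad i = n, \dots, N-1,\\
\mathbb{E}\left[P_{i+1}(A) \mid Y_{1:i}\right] &= P_i(A) \quad \text{for every measurable set } A,\\
\theta_N &= \arg\min_{\theta} \frac{1}{N} \sum_{j=1}^{N} \ell(\theta, Y_j).
\end{aligned}
```

The martingale condition in the second line ensures, under mild conditions, that the predictives converge to a random limit, and the martingale posterior is the law of $\theta_\infty = \lim_{N \to \infty} \theta_N$ given $y_{1:n}$. In practice $N$ is finite, and each posterior draw corresponds to one imputed sequence followed by one refit.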

pith-pipeline@v0.9.0 · 5708 in / 1266 out tokens · 49123 ms · 2026-05-20T13:06:49.686900+00:00 · methodology


Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages · 2 internal anchors

  1. C. P. Robert and G. Casella, Monte Carlo Statistical Methods. Springer, 1999, vol. 2. [Online]. Available: https://doi.org/10.1007/978-1-4757-3071-5

  2. D. M. Blei, A. Kucukelbir, and J. D. McAuliffe, “Variational inference: A review for statisticians,” J. Amer. Statist. Assoc., vol. 112, no. 518, pp. 859–877, 2017. [Online]. Available: https://doi.org/10.1080/01621459.2017.1285773

  3. W. Xu, A. Liu, Y. Zhang, and V. Lau, “Bayesian deep learning via expectation maximization and turbo deep approximate message passing,” IEEE Trans. Signal Process., vol. 72, pp. 3865–3878. [Online]. Available: https://doi.org/10.1109/TSP.2024.3442858

  4. O. Simeone, Machine Learning for Engineers. Cambridge University Press, 2022. [Online]. Available: https://www.cambridge.org/highereducation/books/machine-learning-for-engineers/7FD8622836CAFCF5EDB169E7DC8A1ED4

  5. P. G. Bissiri, C. C. Holmes, and S. G. Walker, “A general framework for updating belief distributions,” J. Roy. Statist. Soc. Ser. B, vol. 78, no. 5, pp. 1103–1130, 2016. [Online]. Available: https://doi.org/10.1111/rssb.12158

  6. J. Knoblauch, J. Jewson, and T. Damoulas, “An optimization-centric view on Bayes’ rule: Reviewing and generalizing variational inference,” J. Mach. Learn. Res., vol. 23, no. 132, pp. 1–109. [Online]. Available: https://jmlr.org/papers/v23/19-1047.html

  7. M. Zecchin, S. Park, O. Simeone, M. Kountouris, and D. Gesbert, “Robust PACm: Training ensemble models under misspecification and outliers,” IEEE Trans. Neural Netw. Learn. Syst., vol. 35, no. 11, pp. 16518–16532, 2023. [Online]. Available: https://doi.org/10.1109/TNNLS.2023.3295168

  8. S. Sun, G. Zhang, J. Shi, and R. Grosse, “Functional variational Bayesian neural networks,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2019. [Online]. Available: https://openreview.net/forum?id=rkxacs0qY7

  9. R. Bommasani, D. A. Hudson, E. Adeli, R. Altman, S. Arora, S. von Arx, M. S. Bernstein et al., “On the opportunities and risks of foundation models,” arXiv:2108.07258, 2021. [Online]. Available: https://arxiv.org/abs/2108.07258

  10. E. Fong, C. Holmes, and S. G. Walker, “Martingale posterior distributions,” J. Roy. Statist. Soc. Ser. B, vol. 85, no. 5, pp. 1357–1391, 2023. [Online]. Available: https://doi.org/10.1093/jrsssb/qkad005

  11. M. Battiston and L. Cappello, “Bayesian predictive inference beyond martingales,” arXiv:2507.21874. [Online]. Available: https://arxiv.org/abs/2507.21874

  12. R. Kassab and O. Simeone, “Federated generalized Bayesian learning via distributed Stein variational gradient descent,” IEEE Trans. Signal Process., vol. 70, pp. 2180–2192, 2022. [Online]. Available: https://doi.org/10.1109/TSP.2022.3168490

  13. S. L. Scott, A. W. Blocker, F. V. Bonassi, H. A. Chipman, E. I. George, and R. E. McCulloch, “Bayes and big data: The consensus Monte Carlo algorithm,” Int. J. Manag. Sci. Eng. Manag., vol. 11, no. 2, pp. 78–88, 2016. [Online]. Available: https://doi.org/10.1080/17509653.2016.1142191

  14. M. Zhu, M. Zecchin, S. Park, C. Guo, C. Feng, and O. Simeone, “Federated inference with reliable uncertainty quantification over wireless channels via conformal prediction,” IEEE Trans. Signal Process., vol. 72, pp. 1235–1250, 2024. [Online]. Available: https://doi.org/10.1109/TSP.2024.3358615

  15. J. Lee, Y. Lee, J. Kim, A. Kosiorek, S. Choi, and Y. W. Teh, “Set transformer: A framework for attention-based permutation-invariant neural networks,” in Proc. Int. Conf. Mach. Learn. (ICML), 2019, pp. 3744–3753. [Online]. Available: https://proceedings.mlr.press/v97/lee19d.html

  16. H. Lee, E. Yun, G. Nam, E. Fong, and J. Lee, “Martingale posterior neural processes,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2023. [Online]. Available: https://openreview.net/forum?id=-9PVqZ-IR

  17. M. Zinkevich, M. Weimer, L. Li, and A. Smola, “Parallelized stochastic gradient descent,” Adv. Neural Inf. Process. Syst., vol. 23, 2010. [Online]. Available: https://papers.nips.cc/paper/4006-parallelized-stochastic-gradient-descent

  18. T.-M. H. Hsu, H. Qi, and M. Brown, “Measuring the effects of non-identical data distribution for federated visual classification,” arXiv:1909.06335. [Online]. Available: https://arxiv.org/abs/1909.06335

  19. J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell, “DeCAF: A deep convolutional activation feature for generic visual recognition,” in Proc. Int. Conf. Mach. Learn. (ICML), 2014, pp. 647–655. [Online]. Available: https://proceedings.mlr.press/v32/donahue14.html

  20. C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger, “On calibration of modern neural networks,” in Proc. Int. Conf. Mach. Learn. (ICML), 2017, pp. 1321–1330. [Online]. Available: https://proceedings.mlr.press/v70/guo17a.html