Federated Martingale Posterior Sampling

Boning Zhang; Dongzhu Liu; Matteo Zecchin; Mingzhao Guo; Osvaldo Simeone

arXiv: 2605.18554 · v1 · pith:ERR7MHNJ · submitted 2026-05-18 · 💻 cs.LG · stat.ML


Pith reviewed 2026-05-20 13:06 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords: federated learning · martingale posterior · Bayesian neural networks · predictive sampling · data embeddings · calibration


The pith

Clients upload small trainable data embeddings so a server can centrally recover full parameter uncertainty for federated Bayesian neural networks using martingale posteriors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces federated martingale posterior (FMP) sampling to address the difficulty of specifying priors for overparameterized models in federated Bayesian neural networks. By replacing the prior-likelihood pair with a predictive distribution, the approach recovers parameter uncertainty through repeated predictive sampling and model refitting. In the federated version, clients send only a small set of trainable data embeddings to the server, which then runs the sampler centrally in a one-shot protocol. This avoids sharing full datasets, and experiments on MNIST, CIFAR-10, and CIFAR-100 show performance close to centralized training and better calibration than consensus-style baselines.
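To make the "repeated predictive sampling and refitting" loop concrete, here is a minimal Python sketch of a martingale posterior for a scalar mean. It assumes the simplest one-step predictive, a Pólya-urn (empirical) predictive in which each imputed point is drawn from the observed plus previously imputed points, and it uses the sample mean as the refit step. The function name and defaults are illustrative assumptions; the paper refits neural networks with a learned predictive, which this toy does not attempt.

```python
import random
import statistics


def martingale_posterior_mean(data, n_future=1000, n_draws=200, seed=0):
    """Posterior draws of the mean via predictive resampling.

    Assumption: the one-step predictive is the Polya-urn (empirical) predictive,
    i.e. each imputed point is drawn uniformly from the observed plus previously
    imputed points. The refit step is simply the sample mean.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        augmented = list(data)
        # Forward-sample future observations from the current predictive.
        for _ in range(n_future):
            augmented.append(rng.choice(augmented))
        # Refit the parameter (here, the mean) on observed plus imputed data.
        draws.append(statistics.fmean(augmented))
    return draws


if __name__ == "__main__":
    obs = [0.2, 1.1, -0.4, 0.9, 0.3, 1.5, -0.1, 0.7]
    samples = martingale_posterior_mean(obs)
    print(f"mean of draws: {statistics.fmean(samples):.3f}")
    print(f"std of draws:  {statistics.stdev(samples):.3f}")
```

Each draw forward-simulates a finite horizon of future data and then refits; the spread of the draws is the recovered uncertainty. With this particular predictive the scheme approaches the Bayesian bootstrap as the horizon grows.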

Core claim

The central discovery is that a one-shot embarrassingly parallel protocol for federated martingale posterior sampling, relying on clients uploading small sets of trainable data embeddings, allows the server to perform predictive sampling centrally and recover uncertainty estimates that closely match those from centralized counterparts.

What carries the argument

Trainable data embeddings uploaded by clients, which enable the central server to simulate the effect of having access to full local datasets for the martingale posterior sampling process.
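The data flow of the one-shot protocol can be sketched as a toy under explicit assumptions. Each client's trainable embeddings are replaced by a hypothetical stand-in: k count-weighted bin means of its scalar data, uploaded once. The server then runs Pólya-urn predictive resampling over the pooled weighted points. `client_summary` and `server_martingale_posterior` are illustrative names, not the paper's API, and the paper learns its embeddings rather than computing bin means.

```python
import random
import statistics


def client_summary(local_data, k):
    """Hypothetical stand-in for trainable embeddings: k count-weighted bin means.

    Returns (representative_value, weight) pairs; the client uploads only these,
    once, instead of its raw data.
    """
    xs = sorted(local_data)
    n = len(xs)
    summary = []
    for i in range(k):
        chunk = xs[i * n // k:(i + 1) * n // k]
        if chunk:  # skip empty bins when n < k
            summary.append((statistics.fmean(chunk), len(chunk)))
    return summary


def server_martingale_posterior(uploads, n_future=500, n_draws=200, seed=0):
    """Polya-urn predictive resampling over the pooled, weighted uploads."""
    rng = random.Random(seed)
    base_pts = [p for summary in uploads for p, _ in summary]
    base_wts = [float(w) for summary in uploads for _, w in summary]
    base_cum, running = [], 0.0
    for w in base_wts:
        running += w
        base_cum.append(running)
    base_sum = sum(p * w for p, w in zip(base_pts, base_wts))

    draws = []
    for _ in range(n_draws):
        pts, cum = list(base_pts), list(base_cum)
        total, weighted_sum = running, base_sum
        for _ in range(n_future):
            # Sample from the current predictive (weighted empirical distribution).
            x = rng.choices(pts, cum_weights=cum, k=1)[0]
            # The imputed point joins the urn with unit weight.
            pts.append(x)
            total += 1.0
            cum.append(total)
            weighted_sum += x
        # Refit: weighted mean of uploaded summaries plus imputed points.
        draws.append(weighted_sum / total)
    return draws


if __name__ == "__main__":
    gen = random.Random(1)
    clients = [[gen.gauss(mu, 1.0) for _ in range(40)] for mu in (-1.0, 0.0, 2.0)]
    uploads = [client_summary(d, k=5) for d in clients]  # single upload round
    draws = server_martingale_posterior(uploads)
    print(f"posterior mean: {statistics.fmean(draws):.3f}")
    print(f"posterior std:  {statistics.stdev(draws):.3f}")
```

Because the bins are weighted by their counts, the pooled summary reproduces the full-data mean exactly, but it discards the spread within each bin, so the server-side draws can be somewhat narrower than centralized ones. That gap is the kind of information loss the load-bearing premise of this review has to rule out.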

If this is right

The method provides a practical way to perform Bayesian inference in federated settings without data sharing.
It leads to improved predictive calibration compared to consensus-style federated baselines.
It bypasses the need for eliciting meaningful priors on high-dimensional parameter spaces.
The approach is suitable for modern overparameterized models like those used in image classification.
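For contrast with the consensus methods mentioned above, the sketch below shows the combination step of consensus Monte Carlo (Scott et al., 2016) for a scalar parameter: each client samples its own subposterior, and the server forms draw j as a precision-weighted average of the clients' j-th draws. The scalar setting and the function name are simplifying assumptions, and this is not necessarily the exact baseline configuration used in the paper. The averaging rule is exact only when the subposteriors are Gaussian, which is one plausible source of the calibration gap for neural networks.

```python
import statistics


def consensus_combine(subposterior_samples):
    """Consensus Monte Carlo combination for a scalar parameter.

    subposterior_samples: one list of draws per client. Each client's weight is
    the inverse of its sample variance; draw j of the result averages draw j of
    every client using those weights.
    """
    weights = [1.0 / statistics.variance(s) for s in subposterior_samples]
    total = sum(weights)
    n = min(len(s) for s in subposterior_samples)
    return [
        sum(w * s[j] for w, s in zip(weights, subposterior_samples)) / total
        for j in range(n)
    ]


if __name__ == "__main__":
    client_a = [1.0, 2.0, 3.0]  # sample variance 1 -> weight 1
    client_b = [2.0, 4.0, 6.0]  # sample variance 4 -> weight 0.25
    print(consensus_combine([client_a, client_b]))
```

Unlike the martingale posterior sketch, every client here must run a full posterior sampler locally, and the server only averages.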

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This could be extended to other domains where data privacy is critical, such as healthcare, by using embeddings to preserve privacy.
The size of the uploaded embeddings might be optimized further to balance communication cost and accuracy.
If the embeddings are learned jointly, it might allow for better adaptation to the predictive sampling procedure.

Load-bearing premise

A small set of trainable data embeddings uploaded by clients contains sufficient information for the central server to recover parameter uncertainty equivalent to running the predictive sampler on the full local datasets.

What would settle it

A large discrepancy between the parameter uncertainty obtained by running the predictive sampler on the full local datasets and that obtained from the uploaded embeddings would falsify the approach.
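One concrete way to run this check, assuming matched sets of posterior draws of a scalar quantity (a single weight, or the predictive probability on a fixed test input) from the centralized sampler and from FMP, is the 1-Wasserstein distance between the two empirical distributions. For equal-size samples in one dimension it reduces to the mean absolute difference of the sorted draws. The choice of metric and of scalar summary is an assumption of this sketch, not a procedure from the paper, and the draws below are illustrative numbers, not paper data.

```python
def wasserstein1(samples_a, samples_b):
    """1-Wasserstein distance between two equal-size 1-D empirical distributions."""
    if len(samples_a) != len(samples_b) or not samples_a:
        raise ValueError("expected two non-empty samples of equal size")
    pairs = zip(sorted(samples_a), sorted(samples_b))
    return sum(abs(a - b) for a, b in pairs) / len(samples_a)


if __name__ == "__main__":
    centralized = [0.10, 0.30, 0.20, 0.40]  # illustrative draws, not paper data
    federated = [0.15, 0.35, 0.25, 0.45]
    print(f"W1 = {wasserstein1(centralized, federated):.3f}")
```

A distance that is small relative to the spread of the centralized draws would support the premise; a large one would be the falsifying discrepancy.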

Figures

Figures reproduced from arXiv: 2605.18554 by Boning Zhang, Dongzhu Liu, Matteo Zecchin, Mingzhao Guo, Osvaldo Simeone.

**Figure 2.** Accuracy (ACC) and expected calibration error (ECE) under heterogeneous client partitions.
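A common definition of ECE (Guo et al., 2017) bins test predictions by confidence and averages the gap between accuracy and mean confidence in each bin, weighted by bin size. The sketch below uses 10 equal-width bins; the paper's exact binning is not stated here, so the bin count is an assumption.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins.

    confidences: predicted probability of the chosen class, in [0, 1].
    correct: booleans, whether each prediction was right.
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need matching, non-empty inputs")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if members:
            acc = sum(ok for _, ok in members) / len(members)
            avg_conf = sum(c for c, _ in members) / len(members)
            ece += len(members) / n * abs(acc - avg_conf)
    return ece


if __name__ == "__main__":
    conf = [0.9, 0.9, 0.6, 0.6]
    hit = [True, True, True, False]
    print(f"ECE = {expected_calibration_error(conf, hit):.3f}")
```

Lower is better: a model whose 90%-confidence predictions are right 90% of the time contributes nothing to the sum.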

Original abstract

Federated Bayesian neural networks require fixing a prior on the model parameters together with a likelihood. Eliciting meaningful priors on the weight space of modern overparameterized models is notoriously difficult, and misspecification of either component can severely degrade accuracy and calibration. Motivated by the rapid progress of predictive models such as large language models, the martingale posterior, also known as predictive Bayes, replaces the prior-likelihood pair with a predictive distribution and recovers parameter uncertainty by repeatedly drawing predictive samples and refitting the model. A direct federated implementation, however, would require clients to share the local data sets. This letter proposes federated martingale posterior (FMP) sampling, a one-shot embarrassingly parallel protocol in which each client uploads a small set of trainable data embeddings and the server runs the predictive sampler centrally. Experiments on MNIST, CIFAR-10, and CIFAR-100 show that FMP closely matches the centralized counterpart and significantly improves calibration over consensus-style baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

This paper gives a clean one-shot federated protocol for martingale posterior sampling that uses client embeddings instead of raw data, with experiments showing it tracks centralized performance and improves calibration.


The main point is a practical adaptation of martingale posterior sampling to federated settings. Clients send small trainable embeddings rather than their full datasets, and the server runs the predictive draws and refits centrally in a single round. This sidesteps both prior specification and data sharing, which are real obstacles for Bayesian neural nets in distributed environments. The embarrassingly parallel design keeps communication minimal, which is a clear practical advantage over methods that would require multiple rounds or full data movement.

Experiments on MNIST, CIFAR-10, and CIFAR-100 report that the federated version matches the centralized martingale posterior closely while beating consensus-style baselines on calibration. Those results are concrete and directly relevant to privacy-preserving applications.

The soft spot is the embedding step itself. The central claim requires that these compressed representations let the server recover parameter uncertainty equivalent to using the complete local data. The reported matching performance is encouraging, but the work provides no explicit bounds on approximation error, no systematic ablations on embedding dimension or training objective, and limited analysis of what distributional features might be lost. If the embeddings under-represent tail behavior or dependencies that matter for the repeated refits, the uncertainty estimates could still be off even when average accuracy looks fine.

This paper is aimed at researchers working on federated Bayesian methods who want to avoid prior elicitation. A reader already familiar with predictive Bayes ideas will see the extension clearly and can judge the empirical claims directly. It deserves a serious referee because the protocol is novel in this setting, the experiments are on standard benchmarks, and the practical motivation is sound, even though the embedding justification would benefit from tighter support in revision.

Referee Report

2 major / 2 minor

Summary. The paper proposes Federated Martingale Posterior (FMP) sampling, a one-shot embarrassingly parallel protocol for federated Bayesian neural networks. Clients upload small sets of trainable data embeddings rather than raw local datasets; the server then centrally runs the martingale posterior sampler (repeated predictive draws followed by refits) to recover parameter uncertainty. Experiments on MNIST, CIFAR-10, and CIFAR-100 are reported to show that FMP closely matches the performance of the centralized martingale posterior while improving calibration relative to consensus-style federated baselines.

Significance. If the central assumption holds, the work would offer a practical route to well-calibrated uncertainty estimates in federated settings without requiring clients to share raw data or the server to elicit a prior on high-dimensional weights. The one-shot, embarrassingly parallel design and the reported calibration gains on standard image benchmarks constitute the main potential contribution.

major comments (2)

[FMP protocol description] The load-bearing claim that a small set of trainable data embeddings uploaded by each client suffices for the server to recover parameter uncertainty equivalent to running the predictive sampler on the full local datasets receives no supporting analysis. No approximation-error bound, information-loss characterization, or description of embedding dimensionality and training objective relative to the original data distribution is provided (see the FMP protocol description).
[Experiments] The experimental section reports that FMP matches centralized performance and improves calibration on MNIST, CIFAR-10, and CIFAR-100, yet contains no ablations on embedding size, number of embeddings per client, or the embedding training objective. Without these controls it is impossible to determine whether the reported calibration advantage is robust or an artifact of particular hyper-parameter choices.

minor comments (2)

[Method] Clarify the precise training objective used to learn the client embeddings and how it relates to the predictive distribution employed by the martingale posterior.
[Method] Add a short discussion of how the one-shot protocol interacts with the repeated refitting steps of the martingale posterior sampler.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate the revisions we intend to make.


Referee: [FMP protocol description] The load-bearing claim that a small set of trainable data embeddings uploaded by each client suffices for the server to recover parameter uncertainty equivalent to running the predictive sampler on the full local datasets receives no supporting analysis. No approximation-error bound, information-loss characterization, or description of embedding dimensionality and training objective relative to the original data distribution is provided (see the FMP protocol description).

Authors: We acknowledge that the current manuscript provides no formal approximation-error bound or information-loss analysis for the embedding approximation. The protocol relies on the empirical observation that a modest number of trainable embeddings, optimized to match local predictive statistics, enable the server-side martingale sampler to recover uncertainty comparable to the centralized case. In revision we will expand the protocol description to specify embedding dimensionality, the exact training objective (a predictive matching loss), and its relation to the local data distribution. We will also add a short discussion of the empirical justification and the limitations of the approach. A complete theoretical characterization lies outside the scope of this short letter. revision: partial
Referee: [Experiments] The experimental section reports that FMP matches centralized performance and improves calibration on MNIST, CIFAR-10, and CIFAR-100, yet contains no ablations on embedding size, number of embeddings per client, or the embedding training objective. Without these controls it is impossible to determine whether the reported calibration advantage is robust or an artifact of particular hyper-parameter choices.

Authors: We agree that systematic ablations would strengthen the experimental claims. In the revised version we will add results varying the number of embeddings per client and embedding dimensionality on MNIST (and, space permitting, on CIFAR-10). These controls will be placed in the main text or an appendix. We will also clarify the embedding training objective in the methods section so that readers can assess sensitivity to these choices. revision: yes

standing simulated objections not resolved

A rigorous approximation-error bound or information-loss characterization for the data-embedding approximation used in the FMP protocol.

Circularity Check

0 steps flagged

No significant circularity; new federated protocol remains distinct from inputs


The paper introduces a one-shot federated protocol in which clients upload trainable data embeddings and the server centrally runs the martingale posterior sampler. The abstract and provided text describe this as a direct adaptation of the existing predictive Bayes framework to avoid sharing full local datasets, with performance equivalence demonstrated via experiments on MNIST, CIFAR-10, and CIFAR-100. No equations, self-citations, or definitional reductions are present that would make the central claim equivalent to its inputs by construction. The load-bearing assumption about embedding sufficiency is presented as an empirical claim rather than a fitted or renamed quantity, leaving the derivation chain self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based solely on the abstract, the method rests on the martingale posterior as a replacement for traditional prior-likelihood Bayesian inference and assumes embeddings suffice to substitute for full client datasets in central sampling.

axioms (1)

domain assumption Martingale posterior recovers parameter uncertainty via repeated predictive sampling and refitting.
This underpins the entire approach as described in the abstract.

pith-pipeline@v0.9.0 · 5708 in / 1266 out tokens · 49123 ms · 2026-05-20T13:06:49.686900+00:00 · methodology


