Composing Non-Conjugate Factor Graphs with Closed-Form Variational Inference

Bert de Vries; Dmitry Bagaev; İsmail Şenöz; Jeff Beck; Kyrylo Yemets; Mykola Lukashchuk; Wouter M. Kouw

arxiv: 2605.29467 · v1 · pith:VWFFZZDZnew · submitted 2026-05-28 · 💻 cs.LG · cs.AI

Pith reviewed 2026-06-29 09:12 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords: factor graphs · variational inference · closed-form inference · message passing · mixture of experts · probabilistic models · Gaussian messages · Gamma messages

The pith

Any model composed from five factor-graph primitives admits closed-form variational message passing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that stacking probabilistic building blocks typically breaks closed-form inference but that five specific primitives can be composed while preserving it. The primitives are a bilinear factor, an exponential link, a Gamma prior, a Gaussian likelihood, and an equality node. Under mean-field factorization each preserves a small set of message families so that Gaussian and Gamma messages remain closed and the exponential link stays tractable via the Gaussian moment-generating function and Gamma sufficient statistics. This construction supports models of increasing depth including input-dependent gating and split-branch routing that encodes decision trees, and it yields a Bayesian mixture of experts with inferred gating on time-series forecasting tasks.

Core claim

Any model composed from the five primitives admits closed-form variational message passing because each primitive preserves a small set of message families under mean-field factorization: messages on Gaussian variables remain Gaussian, messages on precision variables remain Gamma, and the exponential link remains tractable through the Gaussian moment-generating function and the sufficient statistics of the Gamma family.
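
The two ingredients named in the core claim are standard closed-form expectations. The sketch below is illustrative only: the function names are not taken from the paper, the Gamma family is assumed to use the shape-rate parameterization, and the digamma function is hand-rolled so the block needs nothing beyond the standard library. It evaluates the Gaussian moment-generating function E[exp(t·z)] = exp(t·μ + t²σ²/2) and the Gamma expectations E[γ] = a/b and E[log γ] = ψ(a) − log b.

```python
import math


def gaussian_mgf(mean, var, t=1.0):
    """E[exp(t * z)] for z ~ N(mean, var), via the Gaussian MGF."""
    return math.exp(t * mean + 0.5 * t * t * var)


def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    # Shift the argument upward until the asymptotic series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    result += math.log(x) - 0.5 * inv
    result -= inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result


def gamma_expectations(shape, rate):
    """Return (E[g], E[log g]) for g ~ Gamma(shape, rate), shape-rate form."""
    return shape / rate, digamma(shape) - math.log(rate)


if __name__ == "__main__":
    print(gaussian_mgf(0.3, 0.5))        # E[exp(z)] for z ~ N(0.3, 0.5)
    print(gamma_expectations(3.0, 2.0))  # (E[g], E[log g])
```

Both quantities are plain functions of the distribution parameters, which is the sense in which an update built only from them stays closed-form.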

What carries the argument

The five factor-graph primitives (bilinear factor, exponential link, Gamma prior, Gaussian likelihood, equality node) that each preserve Gaussian and Gamma message families under mean-field factorization.
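
To make "preserving a message family" concrete for the simplest primitive, consider the equality node. In the usual factor-graph convention the outgoing message is the product of the other incoming messages, and both families are closed under that product. The sketch below uses hypothetical helper names and assumes Gaussian messages are given as (mean, precision) pairs and Gamma messages as (shape, rate) pairs; it is not the paper's implementation.

```python
def multiply_gaussians(messages):
    """Product of Gaussian messages given as a list of (mean, precision).

    Precisions add, and the mean is the precision-weighted average.
    """
    total_precision = sum(precision for _, precision in messages)
    weighted_sum = sum(mean * precision for mean, precision in messages)
    return weighted_sum / total_precision, total_precision


def multiply_gammas(messages):
    """Product of Gamma messages given as a list of (shape, rate).

    Gamma(a1, b1) * Gamma(a2, b2) is proportional to
    Gamma(a1 + a2 - 1, b1 + b2); the rule extends to any number of terms.
    """
    shape = sum(a for a, _ in messages) - (len(messages) - 1)
    rate = sum(b for _, b in messages)
    return shape, rate


if __name__ == "__main__":
    print(multiply_gaussians([(0.0, 1.0), (2.0, 3.0)]))  # (1.5, 4.0)
    print(multiply_gammas([(2.0, 1.0), (1.5, 0.5)]))     # (2.5, 1.5)
```

The output of each function is again a parameter pair of the same family, so the result can be passed on as a message without any approximation step.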

If this is right

Stacking routing layers encodes arbitrary decision trees while retaining closed-form inference.
Universal function approximation is achieved with closed-form variational message passing.
A Bayesian mixture of experts arises in which gating functions are inferred rather than learned.
Applied to ensemble time-series forecasting the approach yields calibrated uncertainty over expert selection on benchmark datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same preservation of message families might extend to other link functions whose moment-generating functions admit closed-form expectations with Gamma statistics.
Deeper compositions could be tested to confirm whether the Gaussian and Gamma families remain closed at arbitrary depth.
The framework offers a route to build deep probabilistic models that avoid sampling while still encoding complex routing.

Load-bearing premise

Under mean-field factorization the only non-conjugate interface is the exponential link and it remains tractable through the Gaussian moment-generating function together with the sufficient statistics of the Gamma family.
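
One way to see why this premise is plausible is to work a single case by hand. Assume, purely for illustration, a Gaussian likelihood y ~ N(m, 1/γ) whose precision is γ = exp(z), with a Gaussian marginal q(z) = N(μ, σ²). The expected log-likelihood then needs only E[z] = μ and E[exp(z)] = exp(μ + σ²/2), so it has a closed form. The names below are hypothetical, and the Monte Carlo routine exists only to check the formula.

```python
import math
import random

LOG_2PI = math.log(2.0 * math.pi)


def expected_loglik_exp_link(y, m, z_mean, z_var):
    """Closed-form E_q[log N(y; m, 1/exp(z))] with q(z) = N(z_mean, z_var)."""
    expected_precision = math.exp(z_mean + 0.5 * z_var)  # Gaussian MGF at t = 1
    return 0.5 * z_mean - 0.5 * expected_precision * (y - m) ** 2 - 0.5 * LOG_2PI


def monte_carlo_loglik(y, m, z_mean, z_var, n=100_000, seed=0):
    """Sampling estimate of the same expectation, used only as a check."""
    rng = random.Random(seed)
    z_std = math.sqrt(z_var)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(z_mean, z_std)
        total += 0.5 * z - 0.5 * math.exp(z) * (y - m) ** 2 - 0.5 * LOG_2PI
    return total / n


if __name__ == "__main__":
    exact = expected_loglik_exp_link(1.0, 0.2, -0.5, 0.3)
    approx = monte_carlo_loglik(1.0, 0.2, -0.5, 0.3)
    print(exact, approx)
```

This covers one wiring only; whether the same closure holds for every composition of the five primitives is exactly what the premise asserts and what the paper's proof has to establish.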

What would settle it

A concrete counter-example model built only from the five primitives in which at least one variational message update requires numerical integration or approximation outside the claimed Gaussian and Gamma families.

Figures

Figures reproduced from arXiv: 2605.29467 by Bert de Vries, Dmitry Bagaev, İsmail Şenöz, Jeff Beck, Kyrylo Yemets, Mykola Lukashchuk, Wouter M. Kouw.

**Figure 1.** The building blocks as factor graph nodes. Square nodes are factors; round nodes represent neighboring nodes to which …

**Figure 2.** Depth 0 factor graph (static ensemble), shown start …

**Figure 3.** The precision word π and Depth-1 model. (a) Internal structure: a softdot and exponential link connected by latent z, computing input-dependent precision γ from w, ϕ, and τ. (b) Compact notation; double border indicates a composite word, filled semi-circle marks the τ (input precision) side. (c) Depth 1 factor graph (Precision-Gated Experts) for expert i=1, observation j=1: compared to Depth 0 ( …

**Figure 4.** Depth 2 factor graph (split-branch routing) for one expert, one observation. The router softdot produces …

**Figure 5.** Two modes of message computation. (a) Under mean-field factorization constraints, the message from the softdot toward z depends only on the marginal types of its other edges: q(w) ∈ N and q(ϕ) ∈ N (solid lines), q(τ) ∈ G (dashed line). Which factors are connected on the other side of these edges is irrelevant; the line styles act as a type system. (b) The exp link uses belief propagation (BP); as a determ…

**Figure 6.** Posterior prediction for the XOR encoding with two …

**Figure 7.** Radar charts of log-transformed MSE and NLL averaged over all horizons. Each axis corresponds to a dataset; larger …

**Figure 8.** Noisy experts factor graph, shown for expert …

**Figure 9.** Pareto frontier relating model size to radar-chart area. Static and Noisy Diagonal offer the strongest trade-off between …

**Figure 10.** Comparison of forecasting with confidence interval of Static, Noisy Diagonal and MoE ensembles on electricity dataset …

**Figure 11.** Comparison of forecasting with confidence interval of Static, Noisy Diagonal and MoE ensembles on exchange rate …
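
The caption of Figure 5(a) says the message from the softdot toward z depends only on the marginal types of the other edges. A minimal sketch of what that can look like follows. It rests on an assumption that is not confirmed by the text available here: the softdot is taken to be the factor N(z; wᵀϕ, 1/τ). Under mean-field factorization with independent Gaussian marginals on w and ϕ and a Gamma(shape, rate) marginal on τ, the message toward z is then Gaussian with mean E[w]ᵀE[ϕ] and precision E[τ]. The function name is hypothetical.

```python
def softdot_message_to_z(w_mean, phi_mean, tau_shape, tau_rate):
    """Mean-field message toward z for the ASSUMED factor N(z; w.phi, 1/tau).

    Only the means of q(w) and q(phi) and the mean of q(tau) enter;
    the covariances of w and phi do not affect this particular message.
    """
    if len(w_mean) != len(phi_mean):
        raise ValueError("w_mean and phi_mean must have the same length")
    mean = sum(w_i * phi_i for w_i, phi_i in zip(w_mean, phi_mean))
    precision = tau_shape / tau_rate  # E[tau] for a shape-rate Gamma
    return mean, precision


if __name__ == "__main__":
    print(softdot_message_to_z([1.0, -2.0], [0.5, 0.25], 3.0, 1.5))  # (0.0, 2.0)
```

The function never inspects which factors sit on the far side of the w, ϕ, or τ edges, which mirrors the "type system" reading in the caption.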

Original abstract

Stacking probabilistic building blocks into deeper architectures typically breaks closed-form inference. We show that closed-form inference can be preserved. We identify five factor-graph primitives: a bilinear factor, an exponential link, a Gamma prior, a Gaussian likelihood, and an equality node, and prove that any model composed from them admits closed-form variational message passing. The construction works because each primitive preserves a small set of message families: under mean-field factorization, messages on Gaussian variables remain Gaussian and messages on precision variables remain Gamma, while the only non-conjugate interface, the exponential link, remains tractable through the Gaussian moment-generating function and the sufficient statistics of the Gamma family. We demonstrate composition at increasing depth, from static ensembles through input-dependent gating to split-branch routing, and show that stacking routing layers encodes arbitrary decision trees, establishing universal function approximation with closed-form inference. Applied to ensemble time-series forecasting, the framework yields a Bayesian mixture of experts in which gating functions are inferred rather than learned, providing calibrated uncertainty over expert selection across five benchmark datasets.
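
The statement that messages on precision variables remain Gamma can be illustrated with the textbook mean-field update for a Gaussian likelihood with a Gamma prior on its precision. The sketch assumes observations y_j ~ N(x, 1/γ), a Gaussian marginal q(x) = N(m, v), and a shape-rate prior Gamma(a, b); the function name is hypothetical and the code is not taken from the paper.

```python
def update_precision_posterior(prior_shape, prior_rate, observations, x_mean, x_var):
    """Mean-field update of q(gamma) for y_j ~ N(x, 1/gamma), q(x) = N(x_mean, x_var).

    Each observation adds 1/2 to the shape and half of its expected squared
    residual, E[(y_j - x)^2] = (y_j - x_mean)^2 + x_var, to the rate.
    """
    expected_sq_residuals = sum((y - x_mean) ** 2 + x_var for y in observations)
    shape = prior_shape + 0.5 * len(observations)
    rate = prior_rate + 0.5 * expected_sq_residuals
    return shape, rate


if __name__ == "__main__":
    print(update_precision_posterior(1.0, 1.0, [1.0, 3.0], 2.0, 0.5))  # (2.0, 2.5)
```

The updated marginal is again a Gamma distribution, so its mean and expected logarithm are available in closed form for the next round of messages.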

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives five factor-graph primitives whose local message-family preservation lets you compose deeper models with closed-form VMP, including routing layers that act as decision trees.

The new piece is the explicit list of five primitives (bilinear factor, exponential link, Gamma prior, Gaussian likelihood, equality node) and the claim that arbitrary compositions of them keep variational messages inside the Gaussian and Gamma families under mean-field factorization. The exponential link is handled by the Gaussian MGF plus Gamma sufficient statistics, and the routing construction is shown to encode decision trees, which gives a universal-approximation result with tractable inference. The forecasting experiment turns this into a Bayesian mixture of experts where the gating is inferred rather than point-estimated, and the authors report results on five time-series benchmarks.

The construction itself is the useful part. If the preservation really holds under composition, it supplies a practical way to build deeper structured models without losing closed-form updates, which matters for anyone who needs calibrated uncertainty in forecasting or decision systems.

The soft spot is exactly the composition step the stress-test flags. Local preservation per primitive does not automatically guarantee that the cavity distribution seen by an exponential link stays Gaussian once a bilinear factor or equality node is upstream; the abstract does not display the lemmas that would close that gap. Without those steps visible, it is hard to tell whether the global closure follows or whether extra approximations creep in at depth.

This is for people working on variational message passing and structured probabilistic models. A reader who already uses factor graphs for time-series or mixture models will find the primitives and the routing result directly usable. It is worth sending to referees because the central claim is sharp enough to check and the application gives concrete numbers to evaluate.

Referee Report

2 major / 1 minor

Summary. The manuscript identifies five factor-graph primitives (bilinear factor, exponential link, Gamma prior, Gaussian likelihood, equality node) and claims that any model composed from them admits closed-form variational message passing. It asserts that each primitive preserves Gaussian messages on variables and Gamma messages on precisions under mean-field factorization, with the sole non-conjugate interface (exponential link) remaining tractable via the Gaussian moment-generating function and Gamma sufficient statistics. The work demonstrates compositions of increasing depth (ensembles, input-dependent gating, split-branch routing) that encode arbitrary decision trees for universal approximation, and applies the framework to a Bayesian mixture-of-experts model for ensemble time-series forecasting with inferred gating on five benchmarks.

Significance. If the central preservation result holds under arbitrary compositions, the framework would enable scalable closed-form variational inference for deep non-conjugate architectures while retaining calibrated uncertainty, a meaningful advance for Bayesian deep learning. The explicit construction of decision-tree routing with tractable messages and the forecasting application are concrete strengths.

major comments (2)

[Section on primitives and message-family preservation] The central claim requires that local family preservation composes globally. The manuscript must supply an explicit lemma or inductive argument (in the section presenting the five primitives and their message updates) showing that cavity distributions seen by the exponential link remain exactly Gaussian when the link receives messages routed through bilinear factors or equality nodes in multi-branch wirings; mean-field factorization alone does not guarantee this closure without additional propagation rules.
[Experimental evaluation on time-series forecasting] Table or figure reporting the forecasting results: the claim of 'calibrated uncertainty over expert selection' requires quantitative verification (e.g., proper scoring rules or coverage of predictive intervals) that isolates the benefit of closed-form gating inference versus learned alternatives; without these metrics the application does not yet substantiate the broader methodological contribution.

minor comments (1)

Notation for message natural parameters and the precise definition of the exponential-link update should be introduced with a single consistent table or appendix to aid readability across the composition examples.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below and will incorporate the suggested changes in the revision.

Referee: [Section on primitives and message-family preservation] The central claim requires that local family preservation composes globally. The manuscript must supply an explicit lemma or inductive argument (in the section presenting the five primitives and their message updates) showing that cavity distributions seen by the exponential link remain exactly Gaussian when the link receives messages routed through bilinear factors or equality nodes in multi-branch wirings; mean-field factorization alone does not guarantee this closure without additional propagation rules.

Authors: We agree that an explicit inductive argument would strengthen the presentation of the central claim. In the revised manuscript we will insert a new lemma in the section on the five primitives. The lemma will prove by induction on composition depth that, under mean-field factorization, cavity distributions arriving at any exponential link remain exactly Gaussian even when messages are routed through arbitrary wirings of bilinear factors and equality nodes. The base case covers the local updates already stated; the inductive step shows that the message families are closed under the additional propagation rules induced by equality nodes and bilinear factors.
Revision: yes

Referee: [Experimental evaluation on time-series forecasting] Table or figure reporting the forecasting results: the claim of 'calibrated uncertainty over expert selection' requires quantitative verification (e.g., proper scoring rules or coverage of predictive intervals) that isolates the benefit of closed-form gating inference versus learned alternatives; without these metrics the application does not yet substantiate the broader methodological contribution.

Authors: We acknowledge that the current experimental section would benefit from additional quantitative verification of calibration. In the revision we will expand the forecasting results to include the Continuous Ranked Probability Score (CRPS) and the empirical coverage of 95% predictive intervals. These metrics will be reported for the closed-form variational gating model and compared against learned-gating baselines on the same five benchmarks, thereby isolating the contribution of the closed-form inference procedure.
Revision: yes
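
For readers unfamiliar with the two metrics named in this response, the sketch below shows how they are commonly computed when each forecast is summarized by a Gaussian predictive distribution. That Gaussian summary is an assumption made here for illustration; the paper's predictive distributions may take a different form, and the function names are hypothetical.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gaussian_crps(y, mean, std):
    """Closed-form CRPS of a N(mean, std^2) forecast at observation y (lower is better)."""
    z = (y - mean) / std
    return std * (z * (2.0 * normal_cdf(z) - 1.0) + 2.0 * normal_pdf(z) - 1.0 / math.sqrt(math.pi))


def empirical_coverage(ys, means, stds, z_crit=1.959963984540054):
    """Fraction of observations inside the central interval mean +/- z_crit * std.

    The default z_crit corresponds to a 95% interval for a Gaussian forecast.
    """
    hits = sum(1 for y, m, s in zip(ys, means, stds) if abs(y - m) <= z_crit * s)
    return hits / len(ys)


if __name__ == "__main__":
    print(gaussian_crps(0.0, 0.0, 1.0))
    print(empirical_coverage([0.0, 1.0, 5.0, -1.9], [0.0] * 4, [1.0] * 4))  # 0.75
```

A well-calibrated model should show coverage close to the nominal 95% on held-out data, while CRPS rewards forecasts that are both sharp and calibrated.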

Circularity Check

0 steps flagged

No circularity: preservation property is asserted as a theorem to be proven from the primitives

The paper identifies five factor-graph primitives and states that it proves any composition admits closed-form variational message passing because each primitive preserves Gaussian/Gamma message families (with the exponential link handled via MGF). No equations, fitted parameters, or self-citations are shown that would make the claimed closure reduce to a quantity defined by the same inputs. The central claim is a preservation theorem under mean-field factorization rather than a renaming, fit, or self-referential definition; the derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The result rests on the standard mean-field assumption and on the algebraic closure properties of the five chosen primitives; no free parameters or new entities are introduced in the abstract.

axioms (1)

Domain assumption: Mean-field factorization is assumed throughout.
The preservation of Gaussian and Gamma message families is stated to hold under mean-field factorization.

