Composing Non-Conjugate Factor Graphs with Closed-Form Variational Inference
Pith reviewed 2026-06-29 09:12 UTC · model grok-4.3
The pith
Any model composed from five factor-graph primitives admits closed-form variational message passing.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Any model composed from the five primitives admits closed-form variational message passing because each primitive preserves a small set of message families under mean-field factorization: messages on Gaussian variables remain Gaussian, messages on precision variables remain Gamma, and the exponential link remains tractable through the Gaussian moment-generating function and the sufficient statistics of the Gamma family.
What carries the argument
The five factor-graph primitives (bilinear factor, exponential link, Gamma prior, Gaussian likelihood, equality node) that each preserve Gaussian and Gamma message families under mean-field factorization.
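As a minimal sketch of what family preservation means at one primitive, the snippet below implements an equality node for Gaussian and Gamma messages. It assumes the standard convention that an equality node multiplies its incoming messages. The class and function names (`GaussianMsg`, `GammaMsg`, `equality_gaussian`, `equality_gamma`) are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GaussianMsg:
    """Gaussian message in natural parameters: xi = mean * precision, w = precision."""
    xi: float
    w: float

    @property
    def mean(self) -> float:
        return self.xi / self.w

    @property
    def var(self) -> float:
        return 1.0 / self.w


@dataclass(frozen=True)
class GammaMsg:
    """Gamma message in shape-rate form: density proportional to t**(a - 1) * exp(-b * t)."""
    a: float
    b: float


def equality_gaussian(*msgs: GaussianMsg) -> GaussianMsg:
    # Product of Gaussian densities: the natural parameters add, so the result is Gaussian.
    return GaussianMsg(sum(m.xi for m in msgs), sum(m.w for m in msgs))


def equality_gamma(*msgs: GammaMsg) -> GammaMsg:
    # Product of Gamma densities: the exponents (a - 1) add and the rates add,
    # so the result is again Gamma.
    n = len(msgs)
    return GammaMsg(sum(m.a for m in msgs) - (n - 1), sum(m.b for m in msgs))


g = equality_gaussian(GaussianMsg(xi=2.0, w=1.0), GaussianMsg(xi=0.0, w=3.0))
t = equality_gamma(GammaMsg(a=2.0, b=1.0), GammaMsg(a=3.0, b=0.5))
print(g.mean, g.var)  # 0.5 0.25
print(t.a, t.b)       # 4.0 1.5
```

Working in natural parameters makes the closure visible: the outgoing message is obtained by addition alone, with no integration step.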
If this is right
- Stacking routing layers encodes arbitrary decision trees while retaining closed-form inference.
- Universal function approximation is achieved with closed-form variational message passing.
- A Bayesian mixture of experts arises in which gating functions are inferred rather than learned.
- Applied to ensemble time-series forecasting, the approach yields calibrated uncertainty over expert selection on five benchmark datasets.
Where Pith is reading between the lines
- The same preservation of message families might extend to other link functions whose moment-generating functions admit closed-form expectations with Gamma statistics.
- Deeper compositions could be tested to confirm whether the Gaussian and Gamma families remain closed at arbitrary depth.
- The framework offers a route to build deep probabilistic models that avoid sampling while still encoding complex routing.
Load-bearing premise
Under mean-field factorization the only non-conjugate interface is the exponential link and it remains tractable through the Gaussian moment-generating function together with the sufficient statistics of the Gamma family.
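The two ingredients named in this premise are standard closed-form expectations. The sketch below evaluates them and compares each against a Monte Carlo estimate. It assumes a shape-rate parameterization of the Gamma family. The function names and the small `digamma` helper (recurrence plus asymptotic series) are illustrative choices, and the way the paper's exponential link combines these quantities is not reproduced here.

```python
import math
import random


def gaussian_mgf(t: float, mean: float, var: float) -> float:
    """E[exp(t * x)] for x ~ N(mean, var), the Gaussian moment-generating function."""
    return math.exp(t * mean + 0.5 * t * t * var)


def digamma(x: float) -> float:
    """Digamma function for x > 0: shift x up by recurrence, then use an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series


def gamma_expectations(shape: float, rate: float) -> tuple:
    """(E[tau], E[log tau]) for tau ~ Gamma(shape, rate): the expected sufficient statistics."""
    return shape / rate, digamma(shape) - math.log(rate)


# Monte Carlo comparison (assumed test values, chosen only for illustration).
random.seed(0)
n = 100_000
xs = [random.gauss(0.3, 0.8) for _ in range(n)]            # std 0.8, so var = 0.64
mc_mgf = sum(math.exp(x) for x in xs) / n
taus = [random.gammavariate(3.0, 1.0 / 2.0) for _ in range(n)]  # scale = 1 / rate
mc_tau = sum(taus) / n
mc_log_tau = sum(math.log(t) for t in taus) / n

print(gaussian_mgf(1.0, 0.3, 0.64), mc_mgf)
print(gamma_expectations(3.0, 2.0), (mc_tau, mc_log_tau))
```

If either expectation had no closed form, the corresponding message update would need quadrature or sampling, which is the failure mode the premise rules out.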
What would settle it
A concrete counter-example model built only from the five primitives in which at least one variational message update requires numerical integration or approximation outside the claimed Gaussian and Gamma families.
Original abstract
Stacking probabilistic building blocks into deeper architectures typically breaks closed-form inference. We show that closed-form inference can be preserved. We identify five factor-graph primitives: a bilinear factor, an exponential link, a Gamma prior, a Gaussian likelihood, and an equality node, and prove that any model composed from them admits closed-form variational message passing. The construction works because each primitive preserves a small set of message families: under mean-field factorization, messages on Gaussian variables remain Gaussian and messages on precision variables remain Gamma, while the only non-conjugate interface, the exponential link, remains tractable through the Gaussian moment-generating function and the sufficient statistics of the Gamma family. We demonstrate composition at increasing depth, from static ensembles through input-dependent gating to split-branch routing, and show that stacking routing layers encodes arbitrary decision trees, establishing universal function approximation with closed-form inference. Applied to ensemble time-series forecasting, the framework yields a Bayesian mixture of experts in which gating functions are inferred rather than learned, providing calibrated uncertainty over expert selection across five benchmark datasets.
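For orientation, the following sketch shows closed-form mean-field variational message passing on the smallest model that uses two of the primitives: a Gaussian likelihood with unknown mean and a Gamma prior on its precision. This is the textbook conjugate case, not the paper's composed architecture. The model, the prior values, and the function name `vmp_gaussian_gamma` are assumptions made for illustration.

```python
import random


def vmp_gaussian_gamma(y, m0=0.0, v0=100.0, a0=1.0, b0=1.0, iters=50):
    """Mean-field VMP for y_i ~ N(mu, 1/tau), mu ~ N(m0, v0), tau ~ Gamma(a0, b0).

    Returns (m, v, a, b) with q(mu) = N(m, v) and q(tau) = Gamma(a, b) in shape-rate form.
    """
    n = len(y)
    sum_y = sum(y)
    a = a0 + 0.5 * n   # the shape update does not depend on q(mu)
    b = b0             # initial rate, refined in the loop
    m, v = m0, v0
    for _ in range(iters):
        # Update q(mu): a Gaussian whose precision uses E[tau] = a / b.
        e_tau = a / b
        v = 1.0 / (1.0 / v0 + n * e_tau)
        m = v * (m0 / v0 + e_tau * sum_y)
        # Update q(tau): a Gamma whose rate uses E[(y_i - mu)^2] = (y_i - m)^2 + v.
        b = b0 + 0.5 * sum((yi - m) ** 2 + v for yi in y)
    return m, v, a, b


random.seed(1)
data = [random.gauss(2.0, 0.5) for _ in range(500)]  # true mean 2.0, true precision 4.0
m, v, a, b = vmp_gaussian_gamma(data)
print("posterior mean of mu:", m)
print("posterior mean of tau:", a / b)
```

Every update is an algebraic expression in the other factor's expected sufficient statistics, which is the property the abstract claims survives composition.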
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript identifies five factor-graph primitives (bilinear factor, exponential link, Gamma prior, Gaussian likelihood, equality node) and claims that any model composed from them admits closed-form variational message passing. It asserts that, under mean-field factorization, each primitive keeps messages on Gaussian variables Gaussian and messages on precision variables Gamma, with the sole non-conjugate interface (the exponential link) remaining tractable via the Gaussian moment-generating function and the Gamma sufficient statistics. The work demonstrates compositions of increasing depth (static ensembles, input-dependent gating, split-branch routing), shows that stacked routing layers encode arbitrary decision trees and hence give universal function approximation, and applies the framework to a Bayesian mixture-of-experts model for ensemble time-series forecasting with inferred gating on five benchmark datasets.
Significance. If the central preservation result holds under arbitrary compositions, the framework would enable scalable closed-form variational inference for deep non-conjugate architectures while retaining calibrated uncertainty, a meaningful advance for Bayesian deep learning. The explicit construction of decision-tree routing with tractable messages and the forecasting application are concrete strengths.
major comments (2)
- [Section on primitives and message-family preservation] The central claim requires that local family preservation composes globally. The manuscript must supply an explicit lemma or inductive argument (in the section presenting the five primitives and their message updates) showing that cavity distributions seen by the exponential link remain exactly Gaussian when the link receives messages routed through bilinear factors or equality nodes in multi-branch wirings; mean-field factorization alone does not guarantee this closure without additional propagation rules.
- [Experimental evaluation on time-series forecasting] Table or figure reporting the forecasting results: the claim of 'calibrated uncertainty over expert selection' requires quantitative verification (e.g., proper scoring rules or coverage of predictive intervals) that isolates the benefit of closed-form gating inference versus learned alternatives; without these metrics the application does not yet substantiate the broader methodological contribution.
minor comments (1)
- Notation for message natural parameters and the precise definition of the exponential-link update should be introduced with a single consistent table or appendix to aid readability across the composition examples.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major point below and will incorporate the suggested changes in the revision.
Referee: [Section on primitives and message-family preservation] The central claim requires that local family preservation composes globally. The manuscript must supply an explicit lemma or inductive argument (in the section presenting the five primitives and their message updates) showing that cavity distributions seen by the exponential link remain exactly Gaussian when the link receives messages routed through bilinear factors or equality nodes in multi-branch wirings; mean-field factorization alone does not guarantee this closure without additional propagation rules.
Authors: We agree that an explicit inductive argument would strengthen the presentation of the central claim. In the revised manuscript we will insert a new lemma in the section on the five primitives. The lemma will prove by induction on composition depth that, under mean-field factorization, cavity distributions arriving at any exponential link remain exactly Gaussian even when messages are routed through arbitrary wirings of bilinear factors and equality nodes. The base case covers the local updates already stated; the inductive step shows that the message families are closed under the additional propagation rules induced by equality nodes and bilinear factors. revision: yes
Referee: [Experimental evaluation on time-series forecasting] Table or figure reporting the forecasting results: the claim of 'calibrated uncertainty over expert selection' requires quantitative verification (e.g., proper scoring rules or coverage of predictive intervals) that isolates the benefit of closed-form gating inference versus learned alternatives; without these metrics the application does not yet substantiate the broader methodological contribution.
Authors: We acknowledge that the current experimental section would benefit from additional quantitative verification of calibration. In the revision we will expand the forecasting results to include the Continuous Ranked Probability Score (CRPS) and the empirical coverage of 95% predictive intervals. These metrics will be reported for the closed-form variational gating model and compared against learned-gating baselines on the same five benchmarks, thereby isolating the contribution of the closed-form inference procedure. revision: yes
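The two calibration metrics named in the response can be computed as in the sketch below. It assumes each forecast is summarized by a Gaussian predictive distribution, which may not match the paper's mixture-of-experts predictive. The function names are hypothetical, and no results from the paper are reproduced.

```python
import math


def gaussian_crps(y: float, mean: float, std: float) -> float:
    """Closed-form CRPS of a Gaussian forecast N(mean, std**2) at observation y (lower is better)."""
    z = (y - mean) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return std * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


def coverage_95(ys, means, stds) -> float:
    """Fraction of observations inside the central 95% Gaussian predictive interval."""
    half_width = 1.959963984540054  # 97.5% quantile of the standard normal
    hits = sum(abs(y - m) <= half_width * s for y, m, s in zip(ys, means, stds))
    return hits / len(ys)


# Toy values, made up for illustration only.
ys = [0.0, 1.0, 3.0]
means = [0.0, 0.0, 0.0]
stds = [1.0, 1.0, 1.0]
mean_crps = sum(gaussian_crps(y, m, s) for y, m, s in zip(ys, means, stds)) / len(ys)
print("mean CRPS:", mean_crps)
print("95% coverage:", coverage_95(ys, means, stds))
```

A well-calibrated model should show empirical coverage near 0.95 on held-out data. CRPS additionally rewards sharpness, so the two metrics are usually read together.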
Circularity Check
No circularity: preservation property is asserted as a theorem to be proven from the primitives
The paper identifies five factor-graph primitives and states that it proves any composition admits closed-form variational message passing because each primitive preserves Gaussian/Gamma message families (with the exponential link handled via MGF). No equations, fitted parameters, or self-citations are shown that would make the claimed closure reduce to a quantity defined by the same inputs. The central claim is a preservation theorem under mean-field factorization rather than a renaming, fit, or self-referential definition; the derivation is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: mean-field factorization is assumed throughout.