Learning Markov Processes as Sum-of-Square Forms for Analytical Belief Propagation

Morteza Lahijanian; Peter Amorese

arxiv: 2604.07525 · v1 · submitted 2026-04-08 · 💻 cs.LG · cs.SY· eess.SY

Learning Markov Processes as Sum-of-Square Forms for Analytical Belief Propagation

Peter Amorese , Morteza Lahijanian This is my paper

Pith reviewed 2026-05-10 17:40 UTC · model grok-4.3

classification 💻 cs.LG cs.SYeess.SY

keywords Markov processesbelief propagationsum-of-squares polynomialsconditional density estimationanalytical propagationmachine learningdensity constraints

0 comments

The pith

A novel sum-of-squares form for conditional densities enables analytical belief propagation through Markov process models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a modeling framework that uses sparse sum-of-squares representations to estimate valid conditional densities in Markov models. Standard sum-of-squares forms face theoretical restrictions when applied to conditional densities, so the authors develop a new functional form that removes these limits while supporting simultaneous learning of basis functions and coefficients. The resulting models allow exact analytical propagation of beliefs without sampling or approximation, maintain non-negativity and normalization by construction during training, and demonstrate strong scaling behavior.

Core claim

By replacing standard sum-of-squares polynomials with a novel functional form designed for conditional densities, it becomes possible to learn Markov process models that admit closed-form belief propagation. The architecture learns both the functional bases and their coefficients jointly, enforces exact normalization and non-negativity through the training procedure, and produces predictions whose accuracy matches existing methods while using far less memory in low dimensions and continuing to function in state spaces up to 12 dimensions.

What carries the argument

The novel functional form for sum-of-squares conditional densities, which overcomes the theoretical restrictions that prevent standard SoS polynomials from representing valid conditional distributions while still permitting analytical propagation.

If this is right

Belief propagation becomes an exact, closed-form operation for any model learned in this representation.
Memory footprint remains low enough to handle problems that defeat sampling or grid-based methods beyond two dimensions.
Training guarantees exact satisfaction of the density axioms without post-hoc renormalization.
Simultaneous optimization of bases and coefficients becomes feasible without sacrificing the analytical propagation property.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar functional forms could be adapted for other positivity-constrained learning tasks such as policy optimization or generative modeling.
The scaling advantage suggests the approach may enable online filtering and prediction in robotics or autonomous systems with moderately high-dimensional continuous states.
Hybrid architectures that combine the sum-of-squares layer with neural feature extractors might extend the method to even higher dimensions.

Load-bearing premise

The new functional form truly removes the theoretical barriers of standard sum-of-squares representations for conditional densities without introducing any undetected violations of positivity or normalization.

What would settle it

A low-dimensional example in which the learned conditional density becomes negative somewhere or fails to integrate to one, or in which belief propagation through the model produces incorrect marginals compared to exact numerical integration.

Figures

Figures reproduced from arXiv: 2604.07525 by Morteza Lahijanian, Peter Amorese.

**Figure 2.** Figure 2: Visual Comparison for 6D Quadcopter system. (Left) Learned model. (Right) Monte Carlo simulation (ground [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗

**Figure 3.** Figure 3: Visual Comparison for 12D Quadcopter system. (Left) Learned model. (Right) Monte Carlo simulation (ground [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: Visual comparison for the 6D quadcopter system across timesteps [PITH_FULL_IMAGE:figures/full_fig_p012_4.png] view at source ↗

**Figure 5.** Figure 5: Visual comparison for the 12D stabilizing quadcopter system across timesteps [PITH_FULL_IMAGE:figures/full_fig_p013_5.png] view at source ↗

read the original abstract

Harnessing the predictive capability of Markov process models requires propagating probability density functions (beliefs) through the model. For many existing models however, belief propagation is analytically infeasible, requiring approximation or sampling to generate predictions. This paper proposes a functional modeling framework leveraging sparse Sum-of-Squares (SoS) forms for valid (conditional) density estimation. We study the theoretical restrictions of modeling conditional densities using the SoS form, and propose a novel functional form for addressing such limitations. The proposed architecture enables generalized simultaneous learning of basis functions and coefficients, while preserving analytical belief propagation. In addition, we propose a training method that allows for exact adherence to the normalization and non-negativity constraints. Our results show that the proposed method achieves accuracy comparable to state-of-the-art approaches while requiring significantly less memory in low-dimensional spaces, and it further scales to 12D systems when existing methods fail beyond 2D.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper introduces a modified sum-of-squares form for conditional densities that aims to enable exact analytical belief propagation in Markov models up to 12D, but the abstract provides almost no derivation or experimental detail to back the central guarantees.

read the letter

The main takeaway is a new functional form for sum-of-squares representations of conditional densities, paired with a training procedure that claims to enforce normalization and non-negativity exactly rather than approximately. This is positioned as a way to keep belief propagation closed-form in Markov processes where standard methods break down beyond low dimensions.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a functional modeling framework that uses sparse Sum-of-Squares (SoS) polynomial forms to represent conditional densities in Markov processes. It analyzes theoretical restrictions of standard SoS representations for conditionals, introduces a novel functional form that permits simultaneous learning of basis functions and coefficients while preserving analytical belief propagation, and presents a training procedure claimed to enforce exact normalization and non-negativity. The authors assert that the resulting models achieve accuracy comparable to state-of-the-art methods with substantially lower memory in low dimensions and scale to 12D systems where prior approaches fail beyond 2D.

Significance. If the novel SoS form and training procedure truly guarantee valid conditional densities p(y|x) (non-negative and exactly normalized for every x) that support closed-form propagation, the work would enable memory-efficient, exact inference in higher-dimensional Markov models without sampling or approximation. This would be a meaningful contribution to probabilistic modeling, with potential impact in control, robotics, and sequential decision-making. The explicit use of established SoS theory together with a constraint-enforcing training method is a constructive element; however, the absence of detailed experimental protocols, baselines, and verification of the normalization property limits the ability to gauge the result's robustness.

major comments (2)

[Abstract and §3] Abstract and §3 (Novel Functional Form): The central claim that the novel form plus training procedure produces conditional densities that integrate exactly to 1 over y for all x (not merely at training points) is load-bearing for both analytical propagation and the 12D scaling result, yet the manuscript provides no derivation, theorem, or post-training numerical verification (e.g., Monte-Carlo integration of the learned density at held-out x values) demonstrating that the x-dependent normalization constant remains identically 1 outside the training distribution.
[§5 and Table 1] §5 (Experiments) and Table 1: The statements of 'accuracy comparable to state-of-the-art' and successful scaling to 12D lack any reported baselines, error metrics, number of independent trials, or error bars. Without these, it is impossible to assess whether the memory savings and high-dimensional performance are statistically meaningful or whether the method actually outperforms or matches existing approaches on the claimed tasks.

minor comments (2)

[§3] Notation for the novel SoS form (e.g., how the conditioning variable x enters the polynomial coefficients) should be made fully explicit with an equation reference so that readers can verify the claimed closure under multiplication and marginalization.
[§4] The abstract mentions 'sparse' SoS forms; the manuscript should clarify the precise sparsity-inducing mechanism (regularization term, basis selection, etc.) and its effect on the number of terms retained in the 12D experiments.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. The comments highlight important aspects of our theoretical claims and experimental reporting that we will address in the revision. We provide point-by-point responses below.

read point-by-point responses

Referee: [Abstract and §3] Abstract and §3 (Novel Functional Form): The central claim that the novel form plus training procedure produces conditional densities that integrate exactly to 1 over y for all x (not merely at training points) is load-bearing for both analytical propagation and the 12D scaling result, yet the manuscript provides no derivation, theorem, or post-training numerical verification (e.g., Monte-Carlo integration of the learned density at held-out x values) demonstrating that the x-dependent normalization constant remains identically 1 outside the training distribution.

Authors: We appreciate the referee highlighting the need for explicit justification of this central property. The training procedure in Section 3 is constructed to enforce exact normalization and non-negativity identically for all x through the SoS functional form and the constrained optimization, rather than only at sampled points. However, we acknowledge that the manuscript lacks a dedicated theorem and numerical verification on held-out data. In the revised version we will add a formal derivation proving that the learned conditional integrates to one for every x, together with Monte-Carlo integration checks at held-out x values to confirm the property holds outside the training distribution. revision: yes
Referee: [§5 and Table 1] §5 (Experiments) and Table 1: The statements of 'accuracy comparable to state-of-the-art' and successful scaling to 12D lack any reported baselines, error metrics, number of independent trials, or error bars. Without these, it is impossible to assess whether the memory savings and high-dimensional performance are statistically meaningful or whether the method actually outperforms or matches existing approaches on the claimed tasks.

Authors: We agree that the experimental presentation requires additional detail for proper evaluation. The revised manuscript will expand Section 5 and Table 1 to include direct comparisons against the relevant state-of-the-art baselines, specify the quantitative error metrics used, report the number of independent trials performed, and include error bars or standard deviations. These additions will allow readers to assess the statistical significance of the accuracy and scaling results. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained on external SoS theory

full rationale

The paper builds on established Sum-of-Squares theory for density estimation, introduces a novel functional form to address conditional-density restrictions, and uses a training procedure to enforce normalization and non-negativity. No load-bearing steps reduce by construction to fitted inputs, self-citations, or ansatzes imported from the authors' prior work. The abstract and description present the validity constraints as independently enforced by training rather than equated to the target result, and the scaling claims rest on empirical comparison rather than definitional equivalence. This is the common honest outcome for papers that extend an external mathematical framework with a new parameterization whose properties are verified outside the fitted values.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Review performed on abstract only; full derivations, assumptions, and experimental protocols are unavailable. The ledger therefore lists only the background elements explicitly referenced in the abstract.

axioms (1)

standard math Sum-of-Squares polynomials are non-negative and can represent valid densities under suitable normalization.
Invoked when the paper states it studies theoretical restrictions of SoS for conditional densities.

invented entities (1)

novel functional form for SoS conditional densities no independent evidence
purpose: Overcome limitations of standard SoS when representing conditional densities while preserving analytical propagation.
Introduced in the paper as the core architectural contribution.

pith-pipeline@v0.9.0 · 5458 in / 1294 out tokens · 96822 ms · 2026-05-10T17:40:41.229426+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

4 extracted references · 4 canonical work pages

[1]

Akhiezer, N. I. (2020).The classical moment problem and some related questions in analysis. SIAM. Alspach, D. and Sorenson, H. (2003). Nonlinear bayesian estimation using gaussian sum approximations.IEEE transactions on automatic control, 17(4):439–448. Amorese, P. (2026). Bernsteinflow. https://github.com/ peteramorese/BernsteinFlow/tree/dev-anony-factor...

work page arXiv 2020
[2]

Papamakarios, G., Pavlakou, T., and Murray, I. (2017). Masked autoregressive flow for density estimation.Ad- vances in neural information processing systems,

work page 2017
[3]

Parrilo, P. A. (2003). Semidefinite programming relaxations for semialgebraic problems.Mathematical programming, 96(2):293–320. Parzen, E. (1962). On estimation of a probability density function and mode.The annals of mathematical statistics, 33(3):1065–1076. Putinar, M. (1993). Positive polynomials on compact semi- algebraic sets.Indiana University Mathe...

work page 2003
[4]

Beliefs are shown with corresponding Monte Carlo ground truth below

9 . Beliefs are shown with corresponding Monte Carlo ground truth below. Lahijanian (2026). The full state-space dynamics equations can be found in the supplementary code in Amorese (2026), with the used parameters. Regularization loss was applied in the form Lreg =c reg X i X m α2 i,m +β 2 i,m .(30) A regularization weight ofc reg = 10−4 withα, β∈[0.4,40...

work page 2026

[1] [1]

Akhiezer, N. I. (2020).The classical moment problem and some related questions in analysis. SIAM. Alspach, D. and Sorenson, H. (2003). Nonlinear bayesian estimation using gaussian sum approximations.IEEE transactions on automatic control, 17(4):439–448. Amorese, P. (2026). Bernsteinflow. https://github.com/ peteramorese/BernsteinFlow/tree/dev-anony-factor...

work page arXiv 2020

[2] [2]

Papamakarios, G., Pavlakou, T., and Murray, I. (2017). Masked autoregressive flow for density estimation.Ad- vances in neural information processing systems,

work page 2017

[3] [3]

Parrilo, P. A. (2003). Semidefinite programming relaxations for semialgebraic problems.Mathematical programming, 96(2):293–320. Parzen, E. (1962). On estimation of a probability density function and mode.The annals of mathematical statistics, 33(3):1065–1076. Putinar, M. (1993). Positive polynomials on compact semi- algebraic sets.Indiana University Mathe...

work page 2003

[4] [4]

Beliefs are shown with corresponding Monte Carlo ground truth below

9 . Beliefs are shown with corresponding Monte Carlo ground truth below. Lahijanian (2026). The full state-space dynamics equations can be found in the supplementary code in Amorese (2026), with the used parameters. Regularization loss was applied in the form Lreg =c reg X i X m α2 i,m +β 2 i,m .(30) A regularization weight ofc reg = 10−4 withα, β∈[0.4,40...

work page 2026