On additive averaging kernels for finite Markov chains
Pith reviewed 2026-05-10 14:42 UTC · model grok-4.3
The pith
Additive averaging kernels A_alpha = alpha P + (1-alpha) G accelerate Markov chain mixing via optimized partitions, with explicit trace formulas for Frobenius distance and convexity bounds for KL divergence.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
suitable choice of both the partition and the parameter alpha can significantly accelerate convergence in total variation distance. We observe a consistent trade-off between local exploration and global averaging, with intermediate values of alpha achieving the best performance across regimes.
Load-bearing premise
The Gibbs kernel induced by a partition of the state space can be effectively combined with the baseline sampler P to improve mixing without introducing new biases or invalidating the geometric decay rates governed by the spectral gap of P.
read the original abstract
We study additive mixtures of Markov kernels of the form $A_\alpha = \alpha P + (1-\alpha)G$, where $\alpha \in [0,1]$, $P$ is a baseline sampler and $G$ is a Gibbs kernel induced by a partition of the state space. We first motivate the study of $A_\alpha$, which can be interpreted as the projection of a lifted Markov chain. We then consider the minimisation of distance to stationarity under two objectives: the squared Frobenius norm and the Kullback-Leibler (KL) divergence. For the Frobenius objective, we derive explicit trace formulas and identify a Cheeger-type functional that characterises optimal two-block partitions. This yields a structured combinatorial optimisation problem admitting a difference-of-submodular decomposition, enabling efficient approximation via majorisation-minimisation. We also obtain geometric decay rates governed by the absolute spectral gap of $P$. For the KL divergence, we establish convexity-based bounds showing that the divergence of $A_\alpha$ is controlled by those of both $P$ and $G$, thereby reducing partition selection to the Gibbs component. Numerical experiments on the Curie-Weiss model demonstrate that suitable choice of both the partition and the parameter $\alpha$ can significantly accelerate convergence in total variation distance. We observe a consistent trade-off between local exploration and global averaging, with intermediate values of $\alpha$ achieving the best performance across regimes.
Editorial analysis
A structured set of objections, weighed in public.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math Markov kernels are row-stochastic matrices admitting a unique stationary distribution.
- standard math KL divergence is convex in the kernel.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.