On additive averaging kernels for finite Markov chains

Michael C.H. Choi; Ryan J.Y. Lim

arxiv: 2604.12334 · v1 · submitted 2026-04-14 · 🧮 math.PR · cs.IT· math.CO· math.IT· math.OC· stat.CO

On additive averaging kernels for finite Markov chains

Ryan J.Y. Lim , Michael C.H. Choi This is my paper

Pith reviewed 2026-05-10 14:42 UTC · model grok-4.3

classification 🧮 math.PR cs.ITmath.COmath.ITmath.OCstat.CO

keywords alphadivergencemarkovpartitionadditiveaveragingdistancefrobenius

0 comments

The pith

Additive averaging kernels A_alpha = alpha P + (1-alpha) G accelerate Markov chain mixing via optimized partitions, with explicit trace formulas for Frobenius distance and convexity bounds for KL divergence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Markov chains are random processes used to sample from hard-to-reach distributions in statistics, physics, and machine learning. The authors mix a standard transition rule P with a special averaging rule G that treats groups of states as blocks defined by a partition. By tuning how much to mix them with a weight alpha and choosing good blocks, the chain can reach its target distribution faster. They give closed-form expressions using matrix traces to find the best two-block partitions for one distance measure and show that the choice reduces to the averaging part for another measure using convexity. Experiments on a simple spin model reveal that medium mixing values balance detailed local steps with broad jumps across blocks, cutting the time to equilibrium in total variation distance.

Core claim

suitable choice of both the partition and the parameter alpha can significantly accelerate convergence in total variation distance. We observe a consistent trade-off between local exploration and global averaging, with intermediate values of alpha achieving the best performance across regimes.

Load-bearing premise

The Gibbs kernel induced by a partition of the state space can be effectively combined with the baseline sampler P to improve mixing without introducing new biases or invalidating the geometric decay rates governed by the spectral gap of P.

read the original abstract

We study additive mixtures of Markov kernels of the form $A_\alpha = \alpha P + (1-\alpha)G$, where $\alpha \in [0,1]$, $P$ is a baseline sampler and $G$ is a Gibbs kernel induced by a partition of the state space. We first motivate the study of $A_\alpha$, which can be interpreted as the projection of a lifted Markov chain. We then consider the minimisation of distance to stationarity under two objectives: the squared Frobenius norm and the Kullback-Leibler (KL) divergence. For the Frobenius objective, we derive explicit trace formulas and identify a Cheeger-type functional that characterises optimal two-block partitions. This yields a structured combinatorial optimisation problem admitting a difference-of-submodular decomposition, enabling efficient approximation via majorisation-minimisation. We also obtain geometric decay rates governed by the absolute spectral gap of $P$. For the KL divergence, we establish convexity-based bounds showing that the divergence of $A_\alpha$ is controlled by those of both $P$ and $G$, thereby reducing partition selection to the Gibbs component. Numerical experiments on the Curie-Weiss model demonstrate that suitable choice of both the partition and the parameter $\alpha$ can significantly accelerate convergence in total variation distance. We observe a consistent trade-off between local exploration and global averaging, with intermediate values of $\alpha$ achieving the best performance across regimes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Based solely on the abstract, the central claims rest on standard Markov chain theory and convexity of KL divergence; no free parameters or invented entities are explicitly introduced.

axioms (2)

standard math Markov kernels are row-stochastic matrices admitting a unique stationary distribution.
Invoked implicitly when discussing distance to stationarity and spectral gaps.
standard math KL divergence is convex in the kernel.
Used to establish bounds controlling the divergence of A_alpha by those of P and G.

pith-pipeline@v0.9.0 · 5566 in / 1255 out tokens · 43720 ms · 2026-05-10T14:42:57.149831+00:00 · methodology

On additive averaging kernels for finite Markov chains

Core claim

Load-bearing premise

discussion (0)