Symmetrizing Bregman Divergence on the Cone of Positive Definite Matrices: Which Mean to Use and Why
Pith reviewed 2026-05-14 21:06 UTC · model grok-4.3
The pith
For forward symmetrization of Bregman divergences on positive definite matrices, the arithmetic mean over the primal space is the canonical choice for any mirror map.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Computing the canonical means for symmetrizing Bregman divergences on positive definite matrices reduces to minimizing the symmetrized divergence over axiomatically defined mean functionals. For forward symmetrization this minimum is attained at the primal arithmetic mean for any mirror map. For reverse symmetrization the minimum is attained at the dual arithmetic mean pulled back to the primal space. Applied to common mirror maps, the reverse case yields the arithmetic, log-Euclidean, and harmonic means.
What carries the argument
Axiomatic mean functionals serving as the search space for minimization of the symmetrized Bregman divergence.
If this is right
- Forward symmetrization selects the primal arithmetic mean for every mirror map.
- Reverse symmetrization selects the dual arithmetic mean pulled back to the primal space for every mirror map.
- For the three common mirror maps examined, reverse symmetrization produces the arithmetic, log-Euclidean, and harmonic means respectively.
- Existing symmetrization choices appearing in the literature are recovered as direct consequences of the variational principle.
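The reverse-symmetrization consequence above can be illustrated numerically. The following is a minimal sketch, not the paper's code. This page does not name the three mirror maps, so the choices below (squared Frobenius norm, von Neumann entropy, and negative log-determinant) are assumptions, as are the names `sym_fun`, `MIRROR_MAPS`, and `pulled_back_dual_mean`. The "dual arithmetic mean pulled back to the primal space" is read here as applying the inverse gradient of the mirror map to the average of the gradients.

```python
import numpy as np

def sym_fun(S, f):
    """Apply a scalar function f to a symmetric matrix S through its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * f(w)) @ V.T

# Assumed mirror maps, each stored as a (gradient, inverse gradient) pair:
#   frobenius:   psi(X) = 0.5 * ||X||_F^2,    grad psi(X) = X
#   von_neumann: psi(X) = tr(X log X - X),    grad psi(X) = log X
#   log_det:     psi(X) = -log det X,         grad psi(X) = -X^{-1}
MIRROR_MAPS = {
    "frobenius": (lambda X: X, lambda Z: Z),
    "von_neumann": (lambda X: sym_fun(X, np.log), lambda Z: sym_fun(Z, np.exp)),
    "log_det": (lambda X: -np.linalg.inv(X), lambda Z: -np.linalg.inv(Z)),
}

def pulled_back_dual_mean(X, Y, name):
    """Average the gradients (dual space), then map back to the primal space."""
    grad, grad_inv = MIRROR_MAPS[name]
    return grad_inv(0.5 * (grad(X) + grad(Y)))

# Two example positive definite matrices (illustrative values only).
X = np.array([[2.0, 0.5], [0.5, 1.0]])
Y = np.array([[1.0, -0.2], [-0.2, 3.0]])

arithmetic = pulled_back_dual_mean(X, Y, "frobenius")
log_euclidean = pulled_back_dual_mean(X, Y, "von_neumann")
harmonic = pulled_back_dual_mean(X, Y, "log_det")
```

Under these assumed mirror maps, the three results coincide with (X + Y)/2, exp((log X + log Y)/2), and the inverse of (X^{-1} + Y^{-1})/2 respectively.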
Where Pith is reading between the lines
- The same variational construction may be used to select means when symmetrizing divergences on other cones or matrix manifolds.
- In applications such as covariance estimation or optimization on the positive definite cone, the forward or reverse choice now dictates a unique mean without ad-hoc selection.
- The dual-space construction suggests a systematic way to generate new means by varying the mirror map and then pulling back.
Load-bearing premise
Mean functionals are defined by axioms that permit the symmetrization task to be written as a minimization problem over those functionals.
What would settle it
For any fixed mirror map, numerically minimize the forward symmetrized Bregman divergence over candidate means and check whether the minimizer equals the arithmetic mean (X + Y)/2 of the two positive definite matrices.
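The check described above can be sketched for one mirror map. This page does not spell out the forward symmetrized objective, so the sketch assumes it is J(M) = ½ D(X ‖ M) + ½ D(Y ‖ M), with D the Bregman divergence of the negative log-determinant mirror map. It also searches over perturbed matrices rather than over the paper's axiomatic mean functionals. The function names and test matrices are hypothetical.

```python
import numpy as np

def logdet_bregman(A, B):
    """Bregman divergence D(A || B) for psi(X) = -log det X:
    tr(B^{-1} A) - log det(B^{-1} A) - n."""
    n = A.shape[0]
    P = np.linalg.solve(B, A)  # B^{-1} A
    _, logabsdet = np.linalg.slogdet(P)
    return np.trace(P) - logabsdet - n

def forward_objective(M, X, Y):
    """Assumed forward symmetrized objective: both divergences are taken toward M."""
    return 0.5 * (logdet_bregman(X, M) + logdet_bregman(Y, M))

def random_spd(n, rng):
    """Random positive definite matrix with eigenvalues at least n."""
    G = rng.standard_normal((n, n))
    return G @ G.T + n * np.eye(n)

rng = np.random.default_rng(0)
n = 3
X, Y = random_spd(n, rng), random_spd(n, rng)
A = 0.5 * (X + Y)                      # candidate minimizer: arithmetic mean
best = forward_objective(A, X, Y)

# Compare against small symmetric perturbations of A (still positive definite
# here because the perturbations are much smaller than A's eigenvalues).
gaps = []
for _ in range(200):
    S = rng.standard_normal((n, n))
    M = A + 0.05 * (S + S.T)
    gaps.append(forward_objective(M, X, Y) - best)
min_gap = min(gaps)
```

If the claim holds for this mirror map, `min_gap` should be nonnegative up to floating-point error. Swapping in another mirror map only requires replacing `logdet_bregman`.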
Original abstract
This work uncovers variational principles behind symmetrizing the Bregman divergences induced by generic mirror maps over the cone of positive definite matrices. We show that computing the canonical means for this symmetrization can be posed as minimizing the desired symmetrized divergences over a set of mean functionals defined axiomatically to satisfy certain properties. For the forward symmetrization, we prove that the arithmetic mean over the primal space is canonical for any mirror map over the positive definite cone. For the reverse symmetrization, we show that the canonical mean is the arithmetic mean over the dual space, pulled back to the primal space. Applying this result to three common mirror maps used in practice, we show that the canonical means for reverse symmetrization, in those cases, turn out to be the arithmetic, log-Euclidean and harmonic means. Our results improve understanding of existing symmetrization practices in the literature, and can be seen as a navigational chart to help decide which mean to use when.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops variational principles for symmetrizing Bregman divergences induced by generic mirror maps on the cone of positive definite matrices. It formulates the selection of canonical means as a minimization problem over mean functionals defined axiomatically. The central results establish that the forward symmetrizer is minimized by the primal arithmetic mean for any mirror map, while the reverse symmetrizer is minimized by the dual arithmetic mean pulled back to the primal space. These are then specialized to three common mirror maps, recovering the arithmetic, log-Euclidean, and harmonic means for the reverse case.
Significance. If the derivations hold, the work supplies a principled, axiomatic justification for choosing means when symmetrizing Bregman divergences on PD matrices, moving beyond ad-hoc selections common in the literature. The generality across arbitrary mirror maps for the forward case and the explicit recovery of standard means for the reverse case constitute a useful navigational chart for optimization and information-geometric applications. The reduction of symmetrization to a well-posed minimization over axiomatically characterized functionals is a clear strength.
major comments (1)
- [§3, Theorem 3.1] The claim that the primal arithmetic mean minimizes the forward symmetrizer for every mirror map rests on substituting the arithmetic mean into the axiomatic properties (Definition 2.3) and verifying that it attains the minimum. The manuscript states that this holds but does not display the direct substitution step, which is load-bearing for the 'any mirror map' universality result.
minor comments (3)
- [Definition 2.3] The four axiomatic properties of mean functionals are listed but not numbered; cross-references in the proofs of Theorems 3.1 and 4.2 would be easier to follow if the axioms were labeled (A1)–(A4).
- [§4.2] When the dual arithmetic mean is pulled back to the primal space for the reverse symmetrizer, the notation for the pullback operation is introduced inline; an explicit displayed equation defining the pulled-back functional would improve readability.
- [Abstract and §1] The abstract and §1 both refer to 'canonical means' before the axiomatic setup is introduced; a parenthetical forward reference to Definition 2.3 would clarify the term on first use.
Simulated Author's Rebuttal
We thank the referee for the positive evaluation and the constructive comment on Theorem 3.1. We address the point below.
Point-by-point responses
- Referee: [§3, Theorem 3.1] The claim that the primal arithmetic mean minimizes the forward symmetrizer for every mirror map rests on substituting the arithmetic mean into the axiomatic properties (Definition 2.3) and verifying that it attains the minimum. The manuscript states that this holds but does not display the direct substitution step, which is load-bearing for the 'any mirror map' universality result.
  Authors: We agree that the direct substitution step is load-bearing for the universality claim and that its omission reduces transparency. In the revised manuscript we will insert an explicit verification: we substitute the arithmetic mean into each axiom of Definition 2.3, confirm that all axioms are satisfied, and show that the resulting value equals the lower bound of the forward symmetrizer, thereby attaining the minimum for arbitrary mirror maps.
  Revision: yes
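For orientation, the lower-bound step mentioned in this exchange can be sketched with a standard Bregman identity. This is not the manuscript's proof: it minimizes over matrices M rather than over the axiomatic mean functionals of Definition 2.3, and it assumes the forward objective is J(M) = ½ D_ψ(X ‖ M) + ½ D_ψ(Y ‖ M) with ψ strictly convex and differentiable. The symbols J and A are introduced here for illustration only.

```latex
% Assumed notation:
%   D_\psi(P \,\|\, Q) = \psi(P) - \psi(Q) - \langle \nabla\psi(Q),\, P - Q \rangle,
%   J(M) = \tfrac12 D_\psi(X \,\|\, M) + \tfrac12 D_\psi(Y \,\|\, M),
%   A = (X + Y)/2.
\begin{align*}
J(M) &= \tfrac12\psi(X) + \tfrac12\psi(Y) - \psi(M)
        - \langle \nabla\psi(M),\, A - M \rangle \\
     &= \underbrace{\tfrac12\psi(X) + \tfrac12\psi(Y) - \psi(A)}_{=\,J(A)}
        \;+\; \underbrace{\psi(A) - \psi(M)
        - \langle \nabla\psi(M),\, A - M \rangle}_{=\,D_\psi(A \,\|\, M)\;\ge\;0}.
\end{align*}
```

Since D_ψ(A ‖ M) vanishes only at M = A when ψ is strictly convex, J(A) is the lower bound and is attained uniquely at the arithmetic mean, independent of the particular ψ.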
Circularity Check
Derivations rely on external axioms; no internal circularity
Full rationale
The paper defines mean functionals axiomatically and poses symmetrized-Bregman minimization as a variational problem over those functionals. The central claims—that the primal arithmetic mean is canonical for forward symmetrization under any mirror map on the positive-definite cone, and that the dual arithmetic mean (pulled back) is canonical for reverse symmetrization—follow directly from the axioms plus standard convex-analysis properties of Bregman divergences. No derivation step reduces a claimed prediction to a fitted quantity inside the paper, nor does any load-bearing premise collapse to a self-citation chain; the argument remains self-contained once the external axioms are granted.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Mean functionals satisfy certain axiomatic properties that allow symmetrization to be posed as minimization over those functionals.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: Theorem 1: unique minimizer $\overrightarrow{M}_{\mathrm{canonical}} = (X+Y)/2$ for forward symmetrization (6) for any mirror map ψ
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.