pith.

arxiv: 2603.28917 · v2 · submitted 2026-03-30 · 🧮 math.OC · cs.LG · cs.SY · eess.SY · stat.ML

Symmetrizing Bregman Divergence on the Cone of Positive Definite Matrices: Which Mean to Use and Why

Pith reviewed 2026-05-14 21:06 UTC · model grok-4.3

classification: 🧮 math.OC · cs.LG · cs.SY · eess.SY · stat.ML
keywords: Bregman divergence · positive definite matrices · mirror maps · symmetrization · canonical means · arithmetic mean · log-Euclidean mean · harmonic mean

The pith

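For forward symmetrization of Bregman divergences on positive definite matrices, the arithmetic mean over the primal space is the canonical choice for any mirror map; for reverse symmetrization, the canonical mean is the dual-space arithmetic mean pulled back to the primal space.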

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper establishes variational principles for symmetrizing Bregman divergences generated by generic mirror maps over the cone of positive definite matrices. It shows that finding the canonical mean can be posed as a minimization problem over mean functionals that satisfy certain axioms. For forward symmetrization the minimum is always attained at the arithmetic mean computed directly in the primal space, regardless of the mirror map. For reverse symmetrization the minimum is attained at the arithmetic mean taken in the dual space and then pulled back to the primal space. When the result is applied to three standard mirror maps, the reverse-symmetrization means recover the arithmetic, log-Euclidean, and harmonic means, clarifying which mean is appropriate for a given symmetrization task.

Core claim

Computing the canonical means for symmetrizing Bregman divergences on positive definite matrices reduces to minimizing the symmetrized divergence over axiomatically defined mean functionals. For forward symmetrization this minimum is attained at the primal arithmetic mean for any mirror map. For reverse symmetrization the minimum is attained at the dual arithmetic mean pulled back to the primal space. Applied to common mirror maps, the reverse case yields the arithmetic, log-Euclidean, and harmonic means.
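Both claims line up with the standard Bregman centroid identities, sketched below under stated assumptions. The sketch assumes that forward symmetrization means minimizing D_φ(P, M) + D_φ(Q, M) over M, that reverse symmetrization means minimizing D_φ(M, P) + D_φ(M, Q), and that φ is strictly convex and of Legendre type, so that ∇φ is invertible. Here D_φ(X, Y) = φ(X) - φ(Y) - ⟨∇φ(Y), X - Y⟩ with the trace inner product, A = (P + Q)/2, and G = (∇φ)^{-1}(½(∇φ(P) + ∇φ(Q))). This optimizes over all positive definite M and is not necessarily the paper's own proof, which works over axiomatically defined mean functionals.

```latex
\begin{aligned}
D_\phi(P,M) + D_\phi(Q,M)
  &= \phi(P) + \phi(Q) - 2\phi(M) - 2\langle \nabla\phi(M),\, A - M \rangle \\
  &= D_\phi(P,A) + D_\phi(Q,A) + 2\,D_\phi(A,M), \\[4pt]
D_\phi(M,P) + D_\phi(M,Q)
  &= D_\phi(G,P) + D_\phi(G,Q) + 2\,D_\phi(M,G),
  \qquad \nabla\phi(G) = \tfrac12\big(\nabla\phi(P) + \nabla\phi(Q)\big).
\end{aligned}
```

D_φ is nonnegative and vanishes only when its arguments coincide. The first sum is therefore minimized exactly at M = A for every mirror map, and the second exactly at M = G.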

What carries the argument

Axiomatic mean functionals serving as the search space for minimization of the symmetrized Bregman divergence.

If this is right

  • Forward symmetrization selects the primal arithmetic mean for every mirror map.
  • Reverse symmetrization selects the dual arithmetic mean pulled back to the primal space for every mirror map.
  • For the three common mirror maps examined, reverse symmetrization produces the arithmetic, log-Euclidean, and harmonic means respectively.
  • Existing symmetrization choices appearing in the literature are recovered as direct consequences of the variational principle.
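The reverse-symmetrization mean can be illustrated with a minimal NumPy sketch. It averages in the dual space and then pulls the result back through the inverse mirror map. The three mirror maps are an assumption: they are the standard choices that produce the arithmetic, log-Euclidean, and harmonic means, namely φ(X) = ½‖X‖_F², the negative von Neumann entropy φ(X) = tr(X log X - X), and φ(X) = -log det X. The helpers `sym_fun`, `MIRROR_MAPS`, and `dual_mean` are hypothetical names, not taken from the paper.

```python
import numpy as np


def sym_fun(X, f):
    """Apply a scalar function f to a symmetric matrix X via its eigendecomposition."""
    w, V = np.linalg.eigh(X)
    return (V * f(w)) @ V.T


# Assumed mirror maps: (grad phi, inverse of grad phi) on the positive definite cone.
MIRROR_MAPS = {
    # phi(X) = 0.5 * ||X||_F^2, grad phi(X) = X
    "frobenius": (lambda X: X, lambda Y: Y),
    # phi(X) = tr(X log X - X), grad phi(X) = log X, inverse is the matrix exponential
    "von_neumann": (lambda X: sym_fun(X, np.log), lambda Y: sym_fun(Y, np.exp)),
    # phi(X) = -log det X, grad phi(X) = -X^{-1}, inverse maps Y to (-Y)^{-1}
    "log_det": (lambda X: -np.linalg.inv(X), lambda Y: np.linalg.inv(-Y)),
}


def dual_mean(P, Q, grad, grad_inv):
    """Average P and Q in the dual space, then pull the result back to the primal space."""
    return grad_inv(0.5 * (grad(P) + grad(Q)))


P = np.diag([1.0, 4.0])
Q = np.diag([4.0, 9.0])
for name, (grad, grad_inv) in MIRROR_MAPS.items():
    print(name, np.round(np.diag(dual_mean(P, Q, grad, grad_inv)), 4))
```

For commuting (here diagonal) inputs, the three results reduce to the entrywise arithmetic (2.5, 6.5), geometric (2, 6), and harmonic (1.6, about 5.54) means of the diagonals. For non-commuting inputs, the von Neumann case gives the log-Euclidean mean exp((log P + log Q)/2).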

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same variational construction may be used to select means when symmetrizing divergences on other cones or matrix manifolds.
  • In applications such as covariance estimation or optimization on the positive definite cone, the choice between forward and reverse symmetrization would then dictate a unique mean, with no ad hoc selection needed.
  • The dual-space construction suggests a systematic way to generate new means by varying the mirror map and then pulling back.
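The following is a hedged illustration of this extension. It is our own example, not taken from the paper. For r ≠ 0, take the hypothetical mirror map ∇φ_r(X) = X^r / r. For r ∉ {0, -1}, this is the gradient of the strictly convex φ_r(X) = tr(X^{r+1}) / (r(r+1)). At r = -1, it is the gradient -X^{-1} of φ(X) = -log det X. Pulling back the dual arithmetic mean gives the matrix power mean ((P^r + Q^r)/2)^{1/r}. This equals the arithmetic mean at r = 1 and the harmonic mean at r = -1, and it approaches the log-Euclidean mean as r → 0. The helpers `power_mirror` and `pulled_back_mean` are hypothetical.

```python
import numpy as np


def sym_fun(X, f):
    """Apply a scalar function f to a symmetric matrix X via its eigendecomposition."""
    w, V = np.linalg.eigh(X)
    return (V * f(w)) @ V.T


def power_mirror(r):
    """Hypothetical mirror map grad phi_r(X) = X^r / r and its inverse (requires r != 0)."""

    def grad(X):
        return sym_fun(X, lambda w: w**r) / r

    def grad_inv(Y):
        # r * Y is positive definite whenever Y lies in the dual domain of grad.
        return sym_fun(r * Y, lambda w: w ** (1.0 / r))

    return grad, grad_inv


def pulled_back_mean(P, Q, r):
    """Dual arithmetic mean under grad phi_r, pulled back: ((P^r + Q^r) / 2)^(1/r)."""
    grad, grad_inv = power_mirror(r)
    return grad_inv(0.5 * (grad(P) + grad(Q)))


P = np.array([[2.0, 0.5], [0.5, 1.0]])
Q = np.array([[1.0, -0.3], [-0.3, 3.0]])
for r in (1.0, 0.5, 1e-6, -1.0):
    print(r, np.round(pulled_back_mean(P, Q, r), 4))
```

Sweeping r in this way moves continuously between the three means listed earlier, which is one concrete reading of "varying the mirror map and then pulling back".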

Load-bearing premise

Mean functionals are defined by axioms that permit the symmetrization task to be written as a minimization problem over those functionals.

What would settle it

For any fixed mirror map, numerically minimize the forward symmetrized Bregman divergence over candidate means and check whether the minimizing mean equals the component-wise arithmetic mean of the two positive definite matrices.
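A minimal version of this test can be written as a NumPy sketch under stated assumptions. The mirror map is fixed to φ(X) = -log det X. "Forward symmetrization" is taken to mean D_φ(P, M) + D_φ(Q, M). The candidate set is three classical means rather than every mean functional satisfying the paper's axioms. All helper names are hypothetical.

```python
import numpy as np


def sym_fun(X, f):
    """Apply a scalar function f to a symmetric matrix X via its eigendecomposition."""
    w, V = np.linalg.eigh(X)
    return (V * f(w)) @ V.T


# Assumed mirror map for this check: phi(X) = -log det X, grad phi(X) = -X^{-1}.
def phi(X):
    return -np.linalg.slogdet(X)[1]


def grad_phi(X):
    return -np.linalg.inv(X)


def bregman(X, Y):
    """D_phi(X, Y) = phi(X) - phi(Y) - <grad phi(Y), X - Y> with the trace inner product."""
    return phi(X) - phi(Y) - np.sum(grad_phi(Y) * (X - Y))


def forward_sym(P, Q, M):
    """Assumed forward symmetrizer: D_phi(P, M) + D_phi(Q, M)."""
    return bregman(P, M) + bregman(Q, M)


def random_pd(rng, n):
    """Random, well-conditioned positive definite test matrix."""
    B = rng.standard_normal((n, n))
    return B @ B.T + n * np.eye(n)


rng = np.random.default_rng(0)
P, Q = random_pd(rng, 3), random_pd(rng, 3)

A = 0.5 * (P + Q)  # entrywise (primal) arithmetic mean
candidates = {
    "arithmetic": A,
    "log-Euclidean": sym_fun(0.5 * (sym_fun(P, np.log) + sym_fun(Q, np.log)), np.exp),
    "harmonic": np.linalg.inv(0.5 * (np.linalg.inv(P) + np.linalg.inv(Q))),
}
values = {name: forward_sym(P, Q, M) for name, M in candidates.items()}
print(min(values, key=values.get))  # prints: arithmetic
```

You can swap in another strictly convex φ together with its gradient. The winner should not change, because for any mirror map the gap forward_sym(P, Q, M) - forward_sym(P, Q, A) equals 2 D_φ(A, M) ≥ 0.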

Original abstract

This work uncovers variational principles behind symmetrizing the Bregman divergences induced by generic mirror maps over the cone of positive definite matrices. We show that computing the canonical means for this symmetrization can be posed as minimizing the desired symmetrized divergences over a set of mean functionals defined axiomatically to satisfy certain properties. For the forward symmetrization, we prove that the arithmetic mean over the primal space is canonical for any mirror map over the positive definite cone. For the reverse symmetrization, we show that the canonical mean is the arithmetic mean over the dual space, pulled back to the primal space. Applying this result to three common mirror maps used in practice, we show that the canonical means for reverse symmetrization, in those cases, turn out to be the arithmetic, log-Euclidean and harmonic means. Our results improve understanding of existing symmetrization practices in the literature, and can be seen as a navigational chart to help decide which mean to use when.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this section is the friction.

Referee Report

1 major / 3 minor

Summary. The paper develops variational principles for symmetrizing Bregman divergences induced by generic mirror maps on the cone of positive definite matrices. It formulates the selection of canonical means as a minimization problem over mean functionals defined axiomatically. The central results establish that the forward symmetrizer is minimized by the primal arithmetic mean for any mirror map, while the reverse symmetrizer is minimized by the dual arithmetic mean pulled back to the primal space. These are then specialized to three common mirror maps, recovering the arithmetic, log-Euclidean, and harmonic means for the reverse case.

Significance. If the derivations hold, the work supplies a principled, axiomatic justification for choosing means when symmetrizing Bregman divergences on PD matrices, moving beyond ad-hoc selections common in the literature. The generality across arbitrary mirror maps for the forward case and the explicit recovery of standard means for the reverse case constitute a useful navigational chart for optimization and information-geometric applications. The reduction of symmetrization to a well-posed minimization over axiomatically characterized functionals is a clear strength.

major comments (1)
  1. [§3, Theorem 3.1] The claim that the primal arithmetic mean minimizes the forward symmetrizer for every mirror map rests on substituting the arithmetic mean into the axiomatic properties (Definition 2.3) and verifying that it attains the minimum. The manuscript states this holds but does not display the direct substitution step, which is load-bearing for the "any mirror map" universality result.
minor comments (3)
  1. [Definition 2.3] The four axiomatic properties of mean functionals are listed but not numbered; cross-references in the proofs of Theorems 3.1 and 4.2 would be easier to follow if the axioms were labeled (A1)–(A4).
  2. [§4.2] When the dual arithmetic mean is pulled back to the primal space for the reverse symmetrizer, the notation for the pullback operation is introduced inline; an explicit displayed equation defining the pulled-back functional would improve readability.
  3. [Abstract and §1] Both the abstract and §1 refer to "canonical means" before the axiomatic setup is introduced; a parenthetical forward reference to Definition 2.3 would clarify the term on first use.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive evaluation and the constructive comment on Theorem 3.1. We address the point below.

Point-by-point responses
  1. Referee: [§3, Theorem 3.1] The claim that the primal arithmetic mean minimizes the forward symmetrizer for every mirror map rests on substituting the arithmetic mean into the axiomatic properties (Definition 2.3) and verifying that it attains the minimum. The manuscript states this holds but does not display the direct substitution step, which is load-bearing for the "any mirror map" universality result.

    Authors: We agree that the direct substitution step is load-bearing for the universality claim and that its omission reduces transparency. In the revised manuscript we will insert an explicit verification: we substitute the arithmetic mean into each axiom of Definition 2.3, confirm that all axioms are satisfied, and show that the resulting value equals the lower bound of the forward symmetrizer, thereby attaining the minimum for arbitrary mirror maps.

    Revision: yes

Circularity Check

0 steps flagged

Derivations rely on external axioms; no internal circularity

Full rationale

The paper defines mean functionals axiomatically and poses symmetrized-Bregman minimization as a variational problem over those functionals. The central claims—that the primal arithmetic mean is canonical for forward symmetrization under any mirror map on the positive-definite cone, and that the dual arithmetic mean (pulled back) is canonical for reverse symmetrization—follow directly from the axioms plus standard convex-analysis properties of Bregman divergences. No derivation step reduces a claimed prediction to a fitted quantity inside the paper, nor does any load-bearing premise collapse to a self-citation chain; the argument remains self-contained once the external axioms are granted.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on standard convex-analysis constructions for mirror maps and an axiomatic definition of mean functionals; no new free parameters or invented entities are introduced.

axioms (1)
  • domain assumption: mean functionals satisfy certain axiomatic properties that allow symmetrization to be posed as a minimization over those functionals.
    Invoked to define the feasible set for the variational problem that yields the canonical means.

pith-pipeline@v0.9.0 · 5487 in / 1190 out tokens · 37863 ms · 2026-05-14T21:06:28.502915+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What the tags mean:
  • matches: the paper's claim is directly supported by a theorem in the formal canon.
  • supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: the paper appears to rely on the theorem as machinery.
  • contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
