Bregman proximal gradient method for linear optimization under entropic constraints
Pith reviewed 2026-05-19 09:29 UTC · model grok-4.3
The pith
The Bregman proximal gradient method with entropic Legendre functions achieves O(1/n) convergence for linear optimization under entropic constraints.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish a convergence rate of O(1/n) in objective values for the Bregman proximal gradient method with entropic Legendre functions applied to linear optimization under entropic constraints. For a specific cost structure the framework provides a theoretical justification for the Blahut-Arimoto algorithm and the uniqueness of the Lagrange multiplier associated with the entropic constraint. In the active constraint setting, a bisection procedure approximates the strictly positive Lagrange multiplier.
What carries the argument
Bregman proximal gradient method with entropic Legendre functions that computes the proximal operator separately for active and inactive constraints.
If this is right
- The method converges at O(1/n) in objective values for general linear optimization under entropic constraints.
- For specific cost structures it justifies the Blahut-Arimoto algorithm and proves uniqueness of the Lagrange multiplier.
- A bisection procedure finds an approximation to the strictly positive Lagrange multiplier when the constraint is active.
- Numerical comparisons on game-theory examples confirm practical efficiency over standard solvers, including in higher dimensions.
Where Pith is reading between the lines
- The same proximal framework might yield similar rates for optimization problems with other convex constraints common in information theory.
- The uniqueness result for the multiplier could guide analysis of related iterative algorithms used in communication systems.
- Higher-dimensional implementations may scale better for large game-theory instances once the active-set handling is automated.
Load-bearing premise
The entropic Legendre function satisfies the smoothness and strong convexity properties needed for the Bregman proximal operator to be well-defined and single-valued at every iterate.
What would settle it
Numerical tests on a linear optimization problem with entropic constraints that show objective values converging slower than O(1/n) or a proximal operator that becomes multi-valued would disprove the claimed rate.
Figures
read the original abstract
In this paper, we present an efficient algorithm for solving a linear optimization problem with entropic constraints, a class of problems that arises in game theory and information theory. Our analysis distinguishes between the cases of active and inactive constraints, addressing each using a Bregman proximal gradient method with entropic Legendre functions, for which we establish a convergence rate of $O(1/n)$ in objective values. For a specific cost structure, our framework provides a theoretical justification for the well-known Blahut-Arimoto algorithm and the uniqueness of the Lagrange multiplier associated with the entropic constraint. In the active constraint setting, we include a bisection procedure to approximate the strictly positive Lagrange multiplier. The efficiency of the proposed method is illustrated through comparisons with standard optimization solvers on a representative example from game theory, including extensions to higher-dimensional settings.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a Bregman proximal gradient method using entropic Legendre functions to solve linear optimization problems subject to entropic constraints. It separates the analysis into inactive-constraint and active-constraint regimes, proves an O(1/n) convergence rate in objective values for both, recovers the Blahut-Arimoto algorithm together with uniqueness of the associated Lagrange multiplier under a specific cost structure, and augments the active-constraint case with a bisection routine that approximates the strictly positive multiplier. Numerical comparisons against standard solvers on a game-theoretic example, including higher-dimensional instances, are reported.
Significance. If the convergence analysis and the handling of the active-constraint multiplier are made fully rigorous, the work supplies a theoretically justified first-order method for a practically relevant class of problems arising in information theory and game theory. The explicit O(1/n) rate and the recovery of Blahut-Arimoto as a special case constitute concrete contributions; the numerical illustrations further support practical utility.
major comments (2)
- [Active-constraint case] Active-constraint case (description of the bisection procedure): the claimed O(1/n) rate is stated for the implementable algorithm that employs bisection to approximate the Lagrange multiplier. No propagation bound is supplied showing how a fixed bisection tolerance affects the objective error, nor is it required that the tolerance decrease with the outer iteration index. Because an additive perturbation of the multiplier produces a persistent bias in the linear term of the proximal step, the stated rate may hold only for the idealized exact-multiplier algorithm rather than the procedure actually described.
- [Convergence analysis] Convergence analysis (assumptions on the entropic Legendre function): the proofs rely on the mirror map satisfying the smoothness and strong-convexity conditions that guarantee the Bregman proximal operator is well-defined and single-valued at every iterate. An explicit verification or citation establishing these properties under the entropic constraint (rather than leaving them implicit) is needed to confirm that the standard Bregman proximal-gradient theory applies directly.
minor comments (2)
- The abstract states that the method extends to higher-dimensional settings; a brief discussion of how the per-iteration cost scales with dimension would clarify practical applicability.
- [Numerical experiments] In the numerical section, the precise dimensions and constraint parameters used in the comparisons with off-the-shelf solvers should be stated explicitly so that the reported speed-ups can be reproduced.
Simulated Author's Rebuttal
We thank the referee for the thorough review and valuable suggestions that help improve the rigor of the manuscript. We address each major comment in detail below, indicating the planned revisions.
read point-by-point responses
-
Referee: [Active-constraint case] Active-constraint case (description of the bisection procedure): the claimed O(1/n) rate is stated for the implementable algorithm that employs bisection to approximate the Lagrange multiplier. No propagation bound is supplied showing how a fixed bisection tolerance affects the objective error, nor is it required that the tolerance decrease with the outer iteration index. Because an additive perturbation of the multiplier produces a persistent bias in the linear term of the proximal step, the stated rate may hold only for the idealized exact-multiplier algorithm rather than the procedure actually described.
Authors: We agree that the current presentation does not fully rigorize the rate for the approximate-multiplier procedure. The analysis in the manuscript establishes O(1/n) convergence assuming an exact positive Lagrange multiplier is available. For the bisection-based implementation, a fixed tolerance indeed introduces a persistent bias in the linear term of the proximal mapping, which prevents the objective error from vanishing at the full rate. In the revision we will add an explicit error-propagation lemma that bounds the additional objective error in terms of the multiplier approximation tolerance. We will further specify that the bisection tolerance must be driven to zero at a suitable rate (e.g., O(1/n)) to preserve the overall O(1/n) guarantee, and we will update the algorithm statement and convergence theorem accordingly. Numerical experiments already indicate that a small fixed tolerance yields near-optimal practical performance; the new analysis will quantify the theoretical trade-off. revision: yes
-
Referee: [Convergence analysis] Convergence analysis (assumptions on the entropic Legendre function): the proofs rely on the mirror map satisfying the smoothness and strong-convexity conditions that guarantee the Bregman proximal operator is well-defined and single-valued at every iterate. An explicit verification or citation establishing these properties under the entropic constraint (rather than leaving them implicit) is needed to confirm that the standard Bregman proximal-gradient theory applies directly.
Authors: We acknowledge that the manuscript leaves the requisite properties of the entropic Legendre function implicit. The function in question is the negative entropy restricted to the probability simplex, which is known to be 1-strongly convex with respect to the L1 norm on the relative interior and to induce the Kullback-Leibler divergence as its Bregman distance. In the revised version we will insert a short dedicated subsection (or appendix) that explicitly verifies (i) strong convexity and smoothness constants, (ii) the domain on which the Bregman proximal operator is single-valued and well-defined, and (iii) the fact that the standard Bregman proximal-gradient convergence theory therefore applies verbatim. We will also add a reference to the literature on entropic mirror maps to support these claims. revision: yes
Circularity Check
No circularity: standard proximal-gradient analysis applied to entropic problem
full rationale
The paper applies the established Bregman proximal gradient framework with entropic Legendre functions to a linear program under entropic constraints, deriving the O(1/n) rate directly from the smoothness/strong-convexity properties of the mirror map and standard convex analysis (no fitted parameters renamed as predictions). The bisection procedure for the active-constraint Lagrange multiplier is introduced as a practical approximation without altering the core convergence claim, which remains independent of the numerical examples or self-citations. The Blahut-Arimoto justification arises by specializing the cost structure inside the same framework rather than redefining the algorithm via its own outputs. All load-bearing steps rely on external properties of Bregman divergences and proximal operators, making the derivation self-contained.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The entropic Legendre function generates a Bregman divergence that is smooth and strongly convex on the relevant domain, allowing the proximal operator to be single-valued.
- domain assumption For the specific cost structure considered, the Lagrange multiplier associated with the entropic constraint is unique and strictly positive when active.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel echoes?
echoesECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
Bregman proximal gradient method with entropic Legendre functions … ergodic convergence rate of O(1/n) in objective values … recovers the Blahut-Arimoto algorithm
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
IEEE Transactions on Information Theory 18(1), 14–20 (1972)
Arimoto, S.: An algorithm for computing the capacity of arbitrary discrete memoryless channels. IEEE Transactions on Information Theory 18(1), 14–20 (1972)
work page 1972
-
[2]
Bauschke, H.H., Bolte, J., Teboulle, M.: A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications. Math. Oper. Res. 42(2), 330–348 (2017)
work page 2017
-
[3]
Springer International Publishing AG (2017)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, second edn. Springer International Publishing AG (2017)
work page 2017
-
[4]
Operations Research Letters 31(3), 167–175 (2003)
Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters 31(3), 167–175 (2003)
work page 2003
-
[5]
IEEE Transactions on Information Theory 18(4), 460–473 (1972)
Blahut, R.: Computation of channel capacity and rate-distortion functions. IEEE Transactions on Information Theory 18(4), 460–473 (1972)
work page 1972
-
[6]
Cover, T.M., Thomas, J.A.: Elements of information theory. 2nd. Ed., Wiley-Interscience, New York (2006)
work page 2006
-
[7]
IEEE Transactions on Information Theory 20(1), 122–124 (1974)
Csiszar, I.: On the computation of rate-distortion functions (corresp.). IEEE Transactions on Information Theory 20(1), 122–124 (1974)
work page 1974
-
[8]
IEEE Transactions on Information Theory 56(9), 4181–4206 (2010)
Cuff, P., Permuter, H., Cover, T.: Coordination capacity. IEEE Transactions on Information Theory 56(9), 4181–4206 (2010)
work page 2010
-
[9]
IEEE Information Theory Workshop (ITW), 467–471 (2011)
Cuff, P., Zhao, L.: Coordination using implicit communication. IEEE Information Theory Workshop (ITW), 467–471 (2011)
work page 2011
-
[10]
Mathematics of Operations Research 49(1), 78–106 (2024)
Doval, L., Skreta, V.: Constrained information design. Mathematics of Operations Research 49(1), 78–106 (2024)
work page 2024
-
[11]
Cambridge University Press (2011)
El Gamal, A., Kim, Y.H.: Network Information Theory. Cambridge University Press (2011)
work page 2011
-
[12]
Econometrica 74(6), 1603–1636 (2006)
Gossner, O., Hernandez, P., Neyman, A.: Optimal use of communication resources. Econometrica 74(6), 1603–1636 (2006)
work page 2006
-
[13]
Gossner, O., Laraki, R., Tomala, T.: Informationally optimal correlation. Mathematical Programming 116(1-2), 147– 172 (2009) Bregman proximal gradient method for linear optimization under entropic constraints 17
work page 2009
-
[14]
Mathematics of Operation Research 31(1), 13–30 (2006)
Gossner, O., Tomala, T.: Empirical distributions of beliefs under imperfect observation. Mathematics of Operation Research 31(1), 13–30 (2006)
work page 2006
-
[15]
Mathematics of Operation Research 32(2), 413–424 (2007)
Gossner, O., Tomala, T.: Secret correlation in repeated games with imperfect monitoring. Mathematics of Operation Research 32(2), 413–424 (2007)
work page 2007
-
[16]
Gossner, O., Vieille, N.: How to play with a biased coin? Games and Economic Behavior 41(2), 206–226 (2002)
work page 2002
-
[17]
Springer Berlin, Heidelberg (1993)
Hiriart-Urruty, J.B., Lemar´ echal, C.: Convex analysis and minimization algorithms I: Fundamentals. Springer Berlin, Heidelberg (1993)
work page 1993
-
[18]
IEEE Transactions on Information Theory 63(8), 5087–5114 (2017)
Le Treust, M.: Joint empirical coordination of source and channel. IEEE Transactions on Information Theory 63(8), 5087–5114 (2017)
work page 2017
-
[19]
Journal of Economic Theory 184, 104,940 (2019)
Le Treust, M., Tomala, T.: Persuasion with limited communication capacity. Journal of Economic Theory 184, 104,940 (2019)
work page 2019
-
[20]
USSR Computational Mathematics and Mathematical Physics 6(5), 1–50 (1966)
Levitin, E.S., Polyak, B.T.: Constrained minimization methods. USSR Computational Mathematics and Mathematical Physics 6(5), 1–50 (1966)
work page 1966
-
[21]
Journal of Economic Literature 61(1), 226–73 (2023)
Ma´ ckowiak, B., Matˇ ejka, F., Wiederholt, M.: Rational inattention: A review. Journal of Economic Literature 61(1), 226–73 (2023)
work page 2023
-
[22]
American Economic Review 105(1), 272–98 (2015)
Matˇ ejka, F., McKay, A.: Rational inattention to discrete choices: A new foundation for the multinomial logit model. American Economic Review 105(1), 272–98 (2015)
work page 2015
-
[23]
Games and Economic Behavior 29(1–2), 191–223 (1999)
Neyman, A., Okada, D.: Strategic entropy and complexity in repeated games. Games and Economic Behavior 29(1–2), 191–223 (1999)
work page 1999
-
[24]
Games and Economic Behavior 30(2), 228–247 (2000)
Neyman, A., Okada, D.: Repeated games with bounded entropy. Games and Economic Behavior 30(2), 228–247 (2000)
work page 2000
-
[25]
IRE National Convention Record, Part 4 pp
Shannon, C.: Coding theorems for a discrete source with a fidelity criterion. IRE National Convention Record, Part 4 pp. 142–163 (1959)
work page 1959
-
[26]
Journal of Monetary Economics 50(3), 665–690 (2003)
Sims, C.: Implication of rational inattention. Journal of Monetary Economics 50(3), 665–690 (2003)
work page 2003
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.