Bregman proximal gradient method for linear optimization under entropic constraints

Luis M. Brice\~no-Arias; Ma\"el Le Treust

arxiv: 2506.10849 · v4 · submitted 2025-06-12 · 🧮 math.OC

Bregman proximal gradient method for linear optimization under entropic constraints

Luis M. Brice\~no-Arias , Ma\"el Le Treust This is my paper

Pith reviewed 2026-05-19 09:29 UTC · model grok-4.3

classification 🧮 math.OC

keywords Bregman proximal gradiententropic constraintslinear optimizationO(1/n) convergenceBlahut-Arimoto algorithmLagrange multipliergame theoryinformation theory

0 comments

The pith

The Bregman proximal gradient method with entropic Legendre functions achieves O(1/n) convergence for linear optimization under entropic constraints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an algorithm for linear optimization problems that incorporate entropic constraints, which arise in game theory and information theory. It applies a Bregman proximal gradient method using entropic Legendre functions and proves an O(1/n) rate of convergence in the objective values. The method treats active and inactive constraints differently, and for particular cost structures it justifies the Blahut-Arimoto algorithm along with the uniqueness of the associated Lagrange multiplier. A bisection procedure helps approximate the multiplier when the constraint is active. This provides an efficient alternative to standard solvers, as shown in numerical examples from game theory.

Core claim

The authors establish a convergence rate of O(1/n) in objective values for the Bregman proximal gradient method with entropic Legendre functions applied to linear optimization under entropic constraints. For a specific cost structure the framework provides a theoretical justification for the Blahut-Arimoto algorithm and the uniqueness of the Lagrange multiplier associated with the entropic constraint. In the active constraint setting, a bisection procedure approximates the strictly positive Lagrange multiplier.

What carries the argument

Bregman proximal gradient method with entropic Legendre functions that computes the proximal operator separately for active and inactive constraints.

If this is right

The method converges at O(1/n) in objective values for general linear optimization under entropic constraints.
For specific cost structures it justifies the Blahut-Arimoto algorithm and proves uniqueness of the Lagrange multiplier.
A bisection procedure finds an approximation to the strictly positive Lagrange multiplier when the constraint is active.
Numerical comparisons on game-theory examples confirm practical efficiency over standard solvers, including in higher dimensions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same proximal framework might yield similar rates for optimization problems with other convex constraints common in information theory.
The uniqueness result for the multiplier could guide analysis of related iterative algorithms used in communication systems.
Higher-dimensional implementations may scale better for large game-theory instances once the active-set handling is automated.

Load-bearing premise

The entropic Legendre function satisfies the smoothness and strong convexity properties needed for the Bregman proximal operator to be well-defined and single-valued at every iterate.

What would settle it

Numerical tests on a linear optimization problem with entropic constraints that show objective values converging slower than O(1/n) or a proximal operator that becomes multi-valued would disprove the claimed rate.

Figures

Figures reproduced from arXiv: 2506.10849 by Luis M. Brice\~no-Arias, Ma\"el Le Treust.

**Figure 2.** Figure 2: Extended version of the cost function of [12]. [PITH_FULL_IMAGE:figures/full_fig_p014_2.png] view at source ↗

**Figure 3.** Figure 3: Comparison of the performances of Algorithm 1 and of fmincon. [PITH_FULL_IMAGE:figures/full_fig_p014_3.png] view at source ↗

**Figure 4.** Figure 4: Convergence time of Algorithm 1 for 10 randomly selected costs functions. [PITH_FULL_IMAGE:figures/full_fig_p016_4.png] view at source ↗

read the original abstract

In this paper, we present an efficient algorithm for solving a linear optimization problem with entropic constraints, a class of problems that arises in game theory and information theory. Our analysis distinguishes between the cases of active and inactive constraints, addressing each using a Bregman proximal gradient method with entropic Legendre functions, for which we establish a convergence rate of $O(1/n)$ in objective values. For a specific cost structure, our framework provides a theoretical justification for the well-known Blahut-Arimoto algorithm and the uniqueness of the Lagrange multiplier associated with the entropic constraint. In the active constraint setting, we include a bisection procedure to approximate the strictly positive Lagrange multiplier. The efficiency of the proposed method is illustrated through comparisons with standard optimization solvers on a representative example from game theory, including extensions to higher-dimensional settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a Bregman proximal gradient scheme with entropic mirror map for linear programs under entropy constraints, plus a bisection step for the active multiplier and a recovery of Blahut-Arimoto in a special case.

read the letter

The paper develops a Bregman proximal gradient method that uses the entropic Legendre function as the mirror map for linear optimization under entropic constraints. It separates the active and inactive constraint cases, states an O(1/n) rate on the objective, and shows that the framework recovers the Blahut-Arimoto algorithm together with uniqueness of the multiplier for one particular cost structure. A bisection procedure is added to approximate the positive Lagrange multiplier when the constraint is active, and the abstract reports numerical checks against standard solvers on a game-theory example, including higher-dimensional versions.

Referee Report

2 major / 2 minor

Summary. The paper proposes a Bregman proximal gradient method using entropic Legendre functions to solve linear optimization problems subject to entropic constraints. It separates the analysis into inactive-constraint and active-constraint regimes, proves an O(1/n) convergence rate in objective values for both, recovers the Blahut-Arimoto algorithm together with uniqueness of the associated Lagrange multiplier under a specific cost structure, and augments the active-constraint case with a bisection routine that approximates the strictly positive multiplier. Numerical comparisons against standard solvers on a game-theoretic example, including higher-dimensional instances, are reported.

Significance. If the convergence analysis and the handling of the active-constraint multiplier are made fully rigorous, the work supplies a theoretically justified first-order method for a practically relevant class of problems arising in information theory and game theory. The explicit O(1/n) rate and the recovery of Blahut-Arimoto as a special case constitute concrete contributions; the numerical illustrations further support practical utility.

major comments (2)

[Active-constraint case] Active-constraint case (description of the bisection procedure): the claimed O(1/n) rate is stated for the implementable algorithm that employs bisection to approximate the Lagrange multiplier. No propagation bound is supplied showing how a fixed bisection tolerance affects the objective error, nor is it required that the tolerance decrease with the outer iteration index. Because an additive perturbation of the multiplier produces a persistent bias in the linear term of the proximal step, the stated rate may hold only for the idealized exact-multiplier algorithm rather than the procedure actually described.
[Convergence analysis] Convergence analysis (assumptions on the entropic Legendre function): the proofs rely on the mirror map satisfying the smoothness and strong-convexity conditions that guarantee the Bregman proximal operator is well-defined and single-valued at every iterate. An explicit verification or citation establishing these properties under the entropic constraint (rather than leaving them implicit) is needed to confirm that the standard Bregman proximal-gradient theory applies directly.

minor comments (2)

The abstract states that the method extends to higher-dimensional settings; a brief discussion of how the per-iteration cost scales with dimension would clarify practical applicability.
[Numerical experiments] In the numerical section, the precise dimensions and constraint parameters used in the comparisons with off-the-shelf solvers should be stated explicitly so that the reported speed-ups can be reproduced.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough review and valuable suggestions that help improve the rigor of the manuscript. We address each major comment in detail below, indicating the planned revisions.

read point-by-point responses

Referee: [Active-constraint case] Active-constraint case (description of the bisection procedure): the claimed O(1/n) rate is stated for the implementable algorithm that employs bisection to approximate the Lagrange multiplier. No propagation bound is supplied showing how a fixed bisection tolerance affects the objective error, nor is it required that the tolerance decrease with the outer iteration index. Because an additive perturbation of the multiplier produces a persistent bias in the linear term of the proximal step, the stated rate may hold only for the idealized exact-multiplier algorithm rather than the procedure actually described.

Authors: We agree that the current presentation does not fully rigorize the rate for the approximate-multiplier procedure. The analysis in the manuscript establishes O(1/n) convergence assuming an exact positive Lagrange multiplier is available. For the bisection-based implementation, a fixed tolerance indeed introduces a persistent bias in the linear term of the proximal mapping, which prevents the objective error from vanishing at the full rate. In the revision we will add an explicit error-propagation lemma that bounds the additional objective error in terms of the multiplier approximation tolerance. We will further specify that the bisection tolerance must be driven to zero at a suitable rate (e.g., O(1/n)) to preserve the overall O(1/n) guarantee, and we will update the algorithm statement and convergence theorem accordingly. Numerical experiments already indicate that a small fixed tolerance yields near-optimal practical performance; the new analysis will quantify the theoretical trade-off. revision: yes
Referee: [Convergence analysis] Convergence analysis (assumptions on the entropic Legendre function): the proofs rely on the mirror map satisfying the smoothness and strong-convexity conditions that guarantee the Bregman proximal operator is well-defined and single-valued at every iterate. An explicit verification or citation establishing these properties under the entropic constraint (rather than leaving them implicit) is needed to confirm that the standard Bregman proximal-gradient theory applies directly.

Authors: We acknowledge that the manuscript leaves the requisite properties of the entropic Legendre function implicit. The function in question is the negative entropy restricted to the probability simplex, which is known to be 1-strongly convex with respect to the L1 norm on the relative interior and to induce the Kullback-Leibler divergence as its Bregman distance. In the revised version we will insert a short dedicated subsection (or appendix) that explicitly verifies (i) strong convexity and smoothness constants, (ii) the domain on which the Bregman proximal operator is single-valued and well-defined, and (iii) the fact that the standard Bregman proximal-gradient convergence theory therefore applies verbatim. We will also add a reference to the literature on entropic mirror maps to support these claims. revision: yes

Circularity Check

0 steps flagged

No circularity: standard proximal-gradient analysis applied to entropic problem

full rationale

The paper applies the established Bregman proximal gradient framework with entropic Legendre functions to a linear program under entropic constraints, deriving the O(1/n) rate directly from the smoothness/strong-convexity properties of the mirror map and standard convex analysis (no fitted parameters renamed as predictions). The bisection procedure for the active-constraint Lagrange multiplier is introduced as a practical approximation without altering the core convergence claim, which remains independent of the numerical examples or self-citations. The Blahut-Arimoto justification arises by specializing the cost structure inside the same framework rather than redefining the algorithm via its own outputs. All load-bearing steps rely on external properties of Bregman divergences and proximal operators, making the derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard properties of Bregman divergences generated by entropic Legendre functions and on the existence of a unique positive Lagrange multiplier when the constraint is active; no new entities are postulated and no parameters appear to be fitted to data.

axioms (2)

domain assumption The entropic Legendre function generates a Bregman divergence that is smooth and strongly convex on the relevant domain, allowing the proximal operator to be single-valued.
Invoked to guarantee well-defined iterates and the stated convergence rate.
domain assumption For the specific cost structure considered, the Lagrange multiplier associated with the entropic constraint is unique and strictly positive when active.
Required for the bisection procedure and the Blahut-Arimoto recovery.

pith-pipeline@v0.9.0 · 5672 in / 1537 out tokens · 39352 ms · 2026-05-19T09:29:53.190527+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel echoes

?

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

Bregman proximal gradient method with entropic Legendre functions … ergodic convergence rate of O(1/n) in objective values … recovers the Blahut-Arimoto algorithm

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages

[1]

IEEE Transactions on Information Theory 18(1), 14–20 (1972)

Arimoto, S.: An algorithm for computing the capacity of arbitrary discrete memoryless channels. IEEE Transactions on Information Theory 18(1), 14–20 (1972)

work page 1972
[2]

Bauschke, H.H., Bolte, J., Teboulle, M.: A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications. Math. Oper. Res. 42(2), 330–348 (2017)

work page 2017
[3]

Springer International Publishing AG (2017)

Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, second edn. Springer International Publishing AG (2017)

work page 2017
[4]

Operations Research Letters 31(3), 167–175 (2003)

Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters 31(3), 167–175 (2003)

work page 2003
[5]

IEEE Transactions on Information Theory 18(4), 460–473 (1972)

Blahut, R.: Computation of channel capacity and rate-distortion functions. IEEE Transactions on Information Theory 18(4), 460–473 (1972)

work page 1972
[6]

Cover, T.M., Thomas, J.A.: Elements of information theory. 2nd. Ed., Wiley-Interscience, New York (2006)

work page 2006
[7]

IEEE Transactions on Information Theory 20(1), 122–124 (1974)

Csiszar, I.: On the computation of rate-distortion functions (corresp.). IEEE Transactions on Information Theory 20(1), 122–124 (1974)

work page 1974
[8]

IEEE Transactions on Information Theory 56(9), 4181–4206 (2010)

Cuff, P., Permuter, H., Cover, T.: Coordination capacity. IEEE Transactions on Information Theory 56(9), 4181–4206 (2010)

work page 2010
[9]

IEEE Information Theory Workshop (ITW), 467–471 (2011)

Cuff, P., Zhao, L.: Coordination using implicit communication. IEEE Information Theory Workshop (ITW), 467–471 (2011)

work page 2011
[10]

Mathematics of Operations Research 49(1), 78–106 (2024)

Doval, L., Skreta, V.: Constrained information design. Mathematics of Operations Research 49(1), 78–106 (2024)

work page 2024
[11]

Cambridge University Press (2011)

El Gamal, A., Kim, Y.H.: Network Information Theory. Cambridge University Press (2011)

work page 2011
[12]

Econometrica 74(6), 1603–1636 (2006)

Gossner, O., Hernandez, P., Neyman, A.: Optimal use of communication resources. Econometrica 74(6), 1603–1636 (2006)

work page 2006
[13]

Mathematical Programming 116(1-2), 147– 172 (2009) Bregman proximal gradient method for linear optimization under entropic constraints 17

Gossner, O., Laraki, R., Tomala, T.: Informationally optimal correlation. Mathematical Programming 116(1-2), 147– 172 (2009) Bregman proximal gradient method for linear optimization under entropic constraints 17

work page 2009
[14]

Mathematics of Operation Research 31(1), 13–30 (2006)

Gossner, O., Tomala, T.: Empirical distributions of beliefs under imperfect observation. Mathematics of Operation Research 31(1), 13–30 (2006)

work page 2006
[15]

Mathematics of Operation Research 32(2), 413–424 (2007)

Gossner, O., Tomala, T.: Secret correlation in repeated games with imperfect monitoring. Mathematics of Operation Research 32(2), 413–424 (2007)

work page 2007
[16]

Gossner, O., Vieille, N.: How to play with a biased coin? Games and Economic Behavior 41(2), 206–226 (2002)

work page 2002
[17]

Springer Berlin, Heidelberg (1993)

Hiriart-Urruty, J.B., Lemar´ echal, C.: Convex analysis and minimization algorithms I: Fundamentals. Springer Berlin, Heidelberg (1993)

work page 1993
[18]

IEEE Transactions on Information Theory 63(8), 5087–5114 (2017)

Le Treust, M.: Joint empirical coordination of source and channel. IEEE Transactions on Information Theory 63(8), 5087–5114 (2017)

work page 2017
[19]

Journal of Economic Theory 184, 104,940 (2019)

Le Treust, M., Tomala, T.: Persuasion with limited communication capacity. Journal of Economic Theory 184, 104,940 (2019)

work page 2019
[20]

USSR Computational Mathematics and Mathematical Physics 6(5), 1–50 (1966)

Levitin, E.S., Polyak, B.T.: Constrained minimization methods. USSR Computational Mathematics and Mathematical Physics 6(5), 1–50 (1966)

work page 1966
[21]

Journal of Economic Literature 61(1), 226–73 (2023)

Ma´ ckowiak, B., Matˇ ejka, F., Wiederholt, M.: Rational inattention: A review. Journal of Economic Literature 61(1), 226–73 (2023)

work page 2023
[22]

American Economic Review 105(1), 272–98 (2015)

Matˇ ejka, F., McKay, A.: Rational inattention to discrete choices: A new foundation for the multinomial logit model. American Economic Review 105(1), 272–98 (2015)

work page 2015
[23]

Games and Economic Behavior 29(1–2), 191–223 (1999)

Neyman, A., Okada, D.: Strategic entropy and complexity in repeated games. Games and Economic Behavior 29(1–2), 191–223 (1999)

work page 1999
[24]

Games and Economic Behavior 30(2), 228–247 (2000)

Neyman, A., Okada, D.: Repeated games with bounded entropy. Games and Economic Behavior 30(2), 228–247 (2000)

work page 2000
[25]

IRE National Convention Record, Part 4 pp

Shannon, C.: Coding theorems for a discrete source with a fidelity criterion. IRE National Convention Record, Part 4 pp. 142–163 (1959)

work page 1959
[26]

Journal of Monetary Economics 50(3), 665–690 (2003)

Sims, C.: Implication of rational inattention. Journal of Monetary Economics 50(3), 665–690 (2003)

work page 2003

[1] [1]

IEEE Transactions on Information Theory 18(1), 14–20 (1972)

Arimoto, S.: An algorithm for computing the capacity of arbitrary discrete memoryless channels. IEEE Transactions on Information Theory 18(1), 14–20 (1972)

work page 1972

[2] [2]

Bauschke, H.H., Bolte, J., Teboulle, M.: A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications. Math. Oper. Res. 42(2), 330–348 (2017)

work page 2017

[3] [3]

Springer International Publishing AG (2017)

Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, second edn. Springer International Publishing AG (2017)

work page 2017

[4] [4]

Operations Research Letters 31(3), 167–175 (2003)

Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters 31(3), 167–175 (2003)

work page 2003

[5] [5]

IEEE Transactions on Information Theory 18(4), 460–473 (1972)

Blahut, R.: Computation of channel capacity and rate-distortion functions. IEEE Transactions on Information Theory 18(4), 460–473 (1972)

work page 1972

[6] [6]

Cover, T.M., Thomas, J.A.: Elements of information theory. 2nd. Ed., Wiley-Interscience, New York (2006)

work page 2006

[7] [7]

IEEE Transactions on Information Theory 20(1), 122–124 (1974)

Csiszar, I.: On the computation of rate-distortion functions (corresp.). IEEE Transactions on Information Theory 20(1), 122–124 (1974)

work page 1974

[8] [8]

IEEE Transactions on Information Theory 56(9), 4181–4206 (2010)

Cuff, P., Permuter, H., Cover, T.: Coordination capacity. IEEE Transactions on Information Theory 56(9), 4181–4206 (2010)

work page 2010

[9] [9]

IEEE Information Theory Workshop (ITW), 467–471 (2011)

Cuff, P., Zhao, L.: Coordination using implicit communication. IEEE Information Theory Workshop (ITW), 467–471 (2011)

work page 2011

[10] [10]

Mathematics of Operations Research 49(1), 78–106 (2024)

Doval, L., Skreta, V.: Constrained information design. Mathematics of Operations Research 49(1), 78–106 (2024)

work page 2024

[11] [11]

Cambridge University Press (2011)

El Gamal, A., Kim, Y.H.: Network Information Theory. Cambridge University Press (2011)

work page 2011

[12] [12]

Econometrica 74(6), 1603–1636 (2006)

Gossner, O., Hernandez, P., Neyman, A.: Optimal use of communication resources. Econometrica 74(6), 1603–1636 (2006)

work page 2006

[13] [13]

Mathematical Programming 116(1-2), 147– 172 (2009) Bregman proximal gradient method for linear optimization under entropic constraints 17

Gossner, O., Laraki, R., Tomala, T.: Informationally optimal correlation. Mathematical Programming 116(1-2), 147– 172 (2009) Bregman proximal gradient method for linear optimization under entropic constraints 17

work page 2009

[14] [14]

Mathematics of Operation Research 31(1), 13–30 (2006)

Gossner, O., Tomala, T.: Empirical distributions of beliefs under imperfect observation. Mathematics of Operation Research 31(1), 13–30 (2006)

work page 2006

[15] [15]

Mathematics of Operation Research 32(2), 413–424 (2007)

Gossner, O., Tomala, T.: Secret correlation in repeated games with imperfect monitoring. Mathematics of Operation Research 32(2), 413–424 (2007)

work page 2007

[16] [16]

Gossner, O., Vieille, N.: How to play with a biased coin? Games and Economic Behavior 41(2), 206–226 (2002)

work page 2002

[17] [17]

Springer Berlin, Heidelberg (1993)

Hiriart-Urruty, J.B., Lemar´ echal, C.: Convex analysis and minimization algorithms I: Fundamentals. Springer Berlin, Heidelberg (1993)

work page 1993

[18] [18]

IEEE Transactions on Information Theory 63(8), 5087–5114 (2017)

Le Treust, M.: Joint empirical coordination of source and channel. IEEE Transactions on Information Theory 63(8), 5087–5114 (2017)

work page 2017

[19] [19]

Journal of Economic Theory 184, 104,940 (2019)

Le Treust, M., Tomala, T.: Persuasion with limited communication capacity. Journal of Economic Theory 184, 104,940 (2019)

work page 2019

[20] [20]

USSR Computational Mathematics and Mathematical Physics 6(5), 1–50 (1966)

Levitin, E.S., Polyak, B.T.: Constrained minimization methods. USSR Computational Mathematics and Mathematical Physics 6(5), 1–50 (1966)

work page 1966

[21] [21]

Journal of Economic Literature 61(1), 226–73 (2023)

Ma´ ckowiak, B., Matˇ ejka, F., Wiederholt, M.: Rational inattention: A review. Journal of Economic Literature 61(1), 226–73 (2023)

work page 2023

[22] [22]

American Economic Review 105(1), 272–98 (2015)

Matˇ ejka, F., McKay, A.: Rational inattention to discrete choices: A new foundation for the multinomial logit model. American Economic Review 105(1), 272–98 (2015)

work page 2015

[23] [23]

Games and Economic Behavior 29(1–2), 191–223 (1999)

Neyman, A., Okada, D.: Strategic entropy and complexity in repeated games. Games and Economic Behavior 29(1–2), 191–223 (1999)

work page 1999

[24] [24]

Games and Economic Behavior 30(2), 228–247 (2000)

Neyman, A., Okada, D.: Repeated games with bounded entropy. Games and Economic Behavior 30(2), 228–247 (2000)

work page 2000

[25] [25]

IRE National Convention Record, Part 4 pp

Shannon, C.: Coding theorems for a discrete source with a fidelity criterion. IRE National Convention Record, Part 4 pp. 142–163 (1959)

work page 1959

[26] [26]

Journal of Monetary Economics 50(3), 665–690 (2003)

Sims, C.: Implication of rational inattention. Journal of Monetary Economics 50(3), 665–690 (2003)

work page 2003