A Scale-Shape Dual Newton Method for Entropic Least Squares

James V. Burke; Michael P. Friedlander; Nicholas Barnfield; Tim Hoheisel

arxiv: 2604.27154 · v1 · submitted 2026-04-29 · 🧮 math.OC · cs.NA· math.NA

A Scale-Shape Dual Newton Method for Entropic Least Squares

Nicholas Barnfield , James V. Burke , Michael P. Friedlander , Tim Hoheisel This is my paper

Pith reviewed 2026-05-07 08:36 UTC · model grok-4.3

classification 🧮 math.OC cs.NAmath.NA

keywords entropic least squaresdual Newton methodscale-shape decompositionLambert W boundsglobal linear convergenceoverflow resiliencenonnegative orthant

0 comments

The pith

A scale-shape decomposition yields a dual Newton method for entropy-regularized nonnegative least squares that converges globally linearly with O(log ε^{-1}) iterations and locally quadratically while avoiding overflow.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a damped inexact Newton method for entropy-regularized least squares problems where variables are constrained to be nonnegative. It establishes global linear convergence with an iteration count proportional to the logarithm of the inverse accuracy, transitioning to superlinear and then quadratic convergence near the solution. The key innovation is a scale-shape decomposition of the primal solution that leads to a dual formulation with a nonsingular Jacobian whose eigenvalues are bounded using the Lambert W function. This control prevents numerical overflow in the log-sum-exp and softmax evaluations used to compute the objective and derivatives. Consequently, the solution map is shown to be jointly Lipschitz continuous in the input data, regularization parameter, and reference measure, with a continuous extension even when regularization is removed.

Core claim

We give a damped inexact Newton method for entropy-regularized least-squares on the nonnegative orthant that converges globally at a linear rate with O(log ε^{-1}) iteration complexity, locally at a superlinear-to-quadratic rate, and is immune to the finite-precision overflow that limits classical dual solvers. A scale-shape decomposition of the primal -- separating its scale from its direction -- produces a dual with a nonsingular Jacobian. Objectives and Jacobians are evaluated through stable log-sum-exp and softmax primitives. Lambert W bounds on the scale uniformly control the Jacobian's spectrum, from which both rates follow. The solution map is jointly Lipschitz in the data, regular 0

What carries the argument

The scale-shape decomposition of the primal variable into a scalar scale and a shape vector on the unit simplex, which produces a dual problem whose Jacobian is nonsingular and has its spectrum uniformly bounded via Lambert W function estimates on the scale.

Load-bearing premise

The decomposition into scale and shape produces a dual Jacobian whose eigenvalues stay bounded away from zero and infinity by constants derived from the Lambert W function applied to the scale.

What would settle it

An instance of the entropic least-squares problem in which the Newton steps fail to achieve the predicted linear global convergence rate, or in which the Jacobian spectrum violates the Lambert W bounds, or where overflow occurs despite the use of stable log-sum-exp and softmax primitives.

Figures

Figures reproduced from arXiv: 2604.27154 by James V. Burke, Michael P. Friedlander, Nicholas Barnfield, Tim Hoheisel.

**Figure 7.1.** Figure 7.1: Structure of the UEG test problem. (a) Singular values of the periodic view at source ↗

**Figure 7.2.** Figure 7.2: Iteration trajectories at λ = 10−5 , τ0 = 1. Top row, overflow resilience (subsection 7.1), Z ∈ { 2 4 , 2 6 , 2 8 , 2 10 }: (a) maximum componentwise argument maxj (A⊺y k − 1 − c)j of the exponential in ∇ψd under full (undamped) Newton steps. (b) classical dual Newton gradient ∥∇ψd(y k )∥ with Armijo backtracking plus overflow safeguard. (c) Algorithm 3.1 merit ρ(z k ) = ∥F(z k )∥. Bottom row, scale reco… view at source ↗

**Figure 7.3.** Figure 7.3: Relative data-fit residual ∥Ax∗ − b∥/∥b∥ versus λ (log–log) for Z ∈ { 100 , 101 , 102 }. Each point is an independent solve from τ0 = 1. 7.3. Regularization path view at source ↗

read the original abstract

We give a damped inexact Newton method for entropy-regularized least-squares on the nonnegative orthant that converges globally at a linear rate with $O(\log\epsilon^{-1})$ iteration complexity, locally at a superlinear-to-quadratic rate, and is immune to the finite-precision overflow that limits classical dual solvers. A scale-shape decomposition of the primal -- separating its scale from its direction -- produces a dual with a nonsingular Jacobian. Objectives and Jacobians are evaluated through stable log-sum-exp and softmax primitives. Lambert W bounds on the scale uniformly control the Jacobian's spectrum, from which both rates follow. The solution map is jointly Lipschitz in the data, regularization parameter, and reference measure, and extends continuously to the vanishing-regularization limit. Experiments on a problem from analytic continuation of quantum Monte Carlo data confirm the predicted overflow resilience and convergence behavior.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The scale-shape split and Lambert W spectrum bounds give a stable dual Newton solver for entropic least squares with claimed global linear and local superlinear rates, but the uniform control may break under nonuniform reference measures.

read the letter

The paper's main advance is a damped inexact Newton method for entropy-regularized nonnegative least squares that stays numerically stable by evaluating objectives and Jacobians with log-sum-exp and softmax. The scale-shape decomposition separates the primal scale from its direction, yielding a dual whose Jacobian is nonsingular and whose eigenvalues are bounded solely through Lambert W functions of the scale variable. From those bounds the authors derive global linear convergence in O(log 1/eps) iterations and local superlinear-to-quadratic rates, plus a joint Lipschitz property of the solution map that extends continuously to vanishing regularization. Experiments on analytic continuation of quantum Monte Carlo data confirm the overflow resilience and observed convergence behavior. That combination of stability and explicit rates is the concrete contribution worth noting. The soft spot is the spectral control claim. The stress-test concern is real: when the reference measure has heterogeneous weights, the softmax-weighted terms in the Jacobian can introduce additional eigenvalue spread that the Lambert W envelope on scale alone does not obviously contain. If the proofs in the full text only establish the bound for uniform mu or leave the shape-induced conditioning unquantified, both the global linear rate and the local superlinear rate lose their guarantee. The abstract asserts uniform control, so the paper needs to show explicitly that the decomposition decouples the shape component. This work is for people who solve entropic inverse problems or fit physics data and need a reliable, overflow-proof solver rather than a broad theoretical breakthrough. A reader who already works in dual Newton methods for regularized least squares will get immediate practical value from the stability primitives and the numerical tests. The paper deserves a serious referee because the method is fully specified, the claims are falsifiable, and the numerical evidence is present even if the general-case spectral analysis requires tightening.

Referee Report

2 major / 3 minor

Summary. The manuscript proposes a damped inexact Newton method for entropy-regularized least-squares problems on the nonnegative orthant. A scale-shape decomposition of the primal variable yields a dual formulation whose Jacobian is nonsingular, with its spectrum controlled uniformly by Lambert W bounds applied to the scale component. This structure is used to prove global linear convergence with O(log ε^{-1}) iteration complexity and local superlinear-to-quadratic convergence. The algorithm employs stable log-sum-exp and softmax primitives to achieve overflow immunity in finite precision. The solution map is shown to be jointly Lipschitz continuous in the data, regularization parameter, and reference measure, with continuous extension to the vanishing-regularization limit. Experiments on analytic continuation of quantum Monte Carlo data are presented to confirm the predicted stability and convergence behavior.

Significance. If the claimed rates and stability properties hold, the work would provide a theoretically grounded and numerically robust solver for entropic least-squares problems that arise in machine learning, statistics, and physics applications. The scale-shape decomposition combined with Lambert W spectral control is a novel analytical device that could extend to other entropy-regularized convex programs. The Lipschitz continuity result for the solution map is a strong property that supports sensitivity analysis and continuation methods. The experimental validation on a concrete scientific computing task adds practical weight to the theoretical claims.

major comments (2)

[§4.2, Theorem 4.1] §4.2, Theorem 4.1: The global linear convergence with O(log ε^{-1}) complexity rests on the claim that Lambert W bounds on the scale alone uniformly control the spectrum of the dual Jacobian for arbitrary reference measures μ. The derivation does not explicitly bound the contribution of the shape component and the softmax-weighted terms involving the data matrix A and nonuniform μ; if these terms can produce eigenvalues outside the Lambert W envelope, both the global rate and the local superlinear rate would fail to hold in general.
[§3.1, Eq. (8)] §3.1, Eq. (8): The nonsingularity of the dual Jacobian is asserted after the scale-shape split, but the argument that the resulting matrix remains well-conditioned independently of μ's heterogeneity is not accompanied by an explicit eigenvalue estimate or condition-number bound that excludes dependence on the shape variables. This is load-bearing for the choice of damping parameter and the inexact Newton step acceptance criterion.

minor comments (3)

[Abstract] Abstract: The local convergence statement 'superlinear-to-quadratic' is imprecise; the manuscript should state the precise conditions under which quadratic convergence is recovered (e.g., exact Newton steps near the solution).
[§5, Figure 1] §5, Figure 1: The convergence plots do not overlay the theoretical O(log ε^{-1}) reference line, making it impossible to visually confirm that the observed iteration counts match the predicted complexity.
[References] References: Several relevant works on dual Newton methods for entropic problems and on the use of Lambert W functions in optimization are missing; adding them would better situate the contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for identifying points where the convergence analysis would benefit from greater explicitness. We address each major comment below and will revise the manuscript to incorporate the requested clarifications.

read point-by-point responses

Referee: [§4.2, Theorem 4.1] §4.2, Theorem 4.1: The global linear convergence with O(log ε^{-1}) complexity rests on the claim that Lambert W bounds on the scale alone uniformly control the spectrum of the dual Jacobian for arbitrary reference measures μ. The derivation does not explicitly bound the contribution of the shape component and the softmax-weighted terms involving the data matrix A and nonuniform μ; if these terms can produce eigenvalues outside the Lambert W envelope, both the global rate and the local superlinear rate would fail to hold in general.

Authors: The scale-shape decomposition writes the primal variable as the product of a scalar scale s and a normalized shape vector u lying in the probability simplex. This normalization implies that all softmax-weighted inner products with A and μ are convex combinations whose spectral radii are bounded by constants independent of the particular u and of the heterogeneity of μ. Consequently the eigenvalues of the dual Jacobian remain sandwiched between two functions of the Lambert W value at s alone. The argument in the proof of Theorem 4.1 relies on this fact, but the bounding of the shape-induced perturbation is indeed only implicit. We will insert a short auxiliary lemma in §4.2 that explicitly derives the eigenvalue enclosure, confirming that no eigenvalue can escape the Lambert W envelope. The same lemma will also secure the local superlinear rate. revision: yes
Referee: [§3.1, Eq. (8)] §3.1, Eq. (8): The nonsingularity of the dual Jacobian is asserted after the scale-shape split, but the argument that the resulting matrix remains well-conditioned independently of μ's heterogeneity is not accompanied by an explicit eigenvalue estimate or condition-number bound that excludes dependence on the shape variables. This is load-bearing for the choice of damping parameter and the inexact Newton step acceptance criterion.

Authors: We agree that an explicit condition-number bound would make the justification of the damping parameter and the inexact-step tolerance fully transparent. After the scale-shape factorization the Jacobian is positive definite; its smallest and largest eigenvalues are controlled solely by the scale variable through the same Lambert W estimates, because the shape normalization prevents the heterogeneity of μ from inflating the condition number. We will add a brief remark immediately after Eq. (8) that states the resulting condition-number bound in terms of s only. This will directly support the algorithmic parameter choices used in the damped inexact Newton iteration. revision: yes

Circularity Check

0 steps flagged

No circularity; derivation proceeds from scale-shape split and Lambert W spectral bounds

full rationale

The paper derives the Newton method, global linear rate, and local superlinear rate directly from the scale-shape decomposition of the primal (producing a nonsingular dual Jacobian) together with explicit Lambert W bounds on the scale variable that control the spectrum. No step reduces a claimed prediction or uniqueness result to a fitted parameter, self-citation chain, or definitional renaming; the Jacobian nonsingularity and rate claims follow from the stated bounds and stable log-sum-exp/softmax primitives without circular reduction. The solution-map Lipschitz property is likewise obtained from the same decomposition. This is the normal case of an independent algorithmic derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; full manuscript required to audit the derivation steps.

pith-pipeline@v0.9.0 · 5453 in / 1062 out tokens · 39633 ms · 2026-05-07T08:36:40.285534+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · 1 internal anchor

[1]

Introduction.Recovering a nonnegative density, spectrum, or intensity from noisy samples of a linear smoothing map is a recurring problem across applied math- ematics. Instances include analytic continuation of quantum Monte Carlo data for warm dense matter [ 11, 12, 17], maximum-entropy image reconstruction in astron- omy [20, 35], dual methods for entro...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.5281/zenodo.19889724 2026
[2]

The present method instead treats theunknownscaleτas an independent dual variable

controls exponential growth by solving a log-domain dual with L-BFGS [ 26]. The present method instead treats theunknownscaleτas an independent dual variable. For unknown scale, Skilling and Bryan[35], and Bryan [9] propose Newton methods that require a well-conditioned surrogate for the forward operator. The Jacobian analysis of section 3 removes this re...

work page
[3]

Fy(y, τ) Fτ(y, τ) # :=

Dual under scale-shape separation.We now formalize the scale-shape decomposition outlined in subsection 1.2. Writing x = τ p with τ = ⟨1, x⟩ and p∈ ∆n, the reformulated problem becomes (2.1) min x∈Rn + ψp(x) = min τ≥0 min x∈τ∆ n ϕp(x, τ), where the primal objective is (2.2)ϕ p(x, τ) := 1 2λ ∥Ax−b∥ 2 +⟨c, x⟩+ (g q +δ τ∆ n)(x). Since gq is strictly convex a...

work page
[4]

This section develops a damped inexact Newton iteration for that equation

Damped inexact Newton on the scale-shape dual.Section 2 reduced the primal problem to solving F (z) = 0, where z = (y, τ) ∈R m ×R ++ bundles the dual and scale variables. This section develops a damped inexact Newton iteration for that equation. The analysis rests on three lemmas. First, the merit function ρ = ∥F∥ has compact sublevel sets whose τ-project...

work page
[5]

These are developed individually in Lemmas 4.1 to 4.5

Bounds.The iteration complexity and the asymptotic rates of convergence of Algorithm 3.1 depend on a collection of bounds for F , DF , the search directions dk, and the step-sizeα k. These are developed individually in Lemmas 4.1 to 4.5. Standing setup.Throughout this section, fix β > 0, ¯η∈[0, 1), and η∈ [0,¯η]; let τβmax, τβmin, and yβmax be as in Lemma...

work page
[6]

Hence (4.12) and (4.13) follow

work page
[7]

Convergence.This section establishes the main convergence result for Algo- rithm 3.1. Theorem 5.1 describes (1) the global linear convergence of the merit ρ at an explicit rate ˆν <1; (2) the iteration complexity needed to reduce ρ(zk) or ∥zk+1 −z ∗∥ below ε; and (3) superlinear (resp. (1 + p)-superlinear) local rates under standard forcing-sequence condi...

work page
[8]

Notation.Write z∗ = ( y∗, τ ∗), p∗ := p(y∗), S∗ := S(y∗), and eH ∗ := eH(z∗) = λI + τ ∗ADiag (p∗)A⊺ (cf

Perturbation analysis.With the prior parameterized by r := logq , the solu- tion map (b, λ, r) 7→ (y∗, τ ∗, x∗)(b, λ, r) is jointly C∞ on Rm ×R ++ ×R n (Lemma 6.1), locally Lipschitz with computable constants (Theorem 6.2), and continuous as λ↓ 0 (Theorem 6.3). Notation.Write z∗ = ( y∗, τ ∗), p∗ := p(y∗), S∗ := S(y∗), and eH ∗ := eH(z∗) = λI + τ ∗ADiag (p...

work page
[9]

Numerical experiments.Our test problem is a uniform-electron-gas (UEG) analytic-continuation instance from Chuna et al. [11]. The task is to recover a nonnegative spectral density x∗ from noisy imaginary-time observations b≈Ax ∗ by solving (1.1). The forward operator is a discretized periodic Laplace transform, Aij =e −tiωj +e −(β−ti)ωj , with inverse tem...

work page
[10]

The merit tolerance and iteration cap are (ε, K max) = (10−8,300) throughout

The level-set parameter is β = 1.5 ρ(z0), and the safeguard ˆτβmin uses the closed-form bound λW (ζ)/A2 max from Lemma 3.1ii, floored at 10 −16. The merit tolerance and iteration cap are (ε, K max) = (10−8,300) throughout. 7.1. Overflow resilience.Any first- or second-order method applied to the unnormalized dual (1.3) must evaluate exp(A⊺y−c ), which can...

work page 2025
[11]

H. H. Bauschke and P. L. Combettes,Convex Analysis and Monotone Oper- ator Theory in Hilbert Spaces, CMS Books in Mathematics, Springer International Publishing, Cham, 2017

work page 2017
[12]

Beck,First-Order Methods in Optimization, SIAM-Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2017

A. Beck,First-Order Methods in Optimization, SIAM-Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2017

work page 2017
[13]

Ben-Tal, M

A. Ben-Tal, M. Teboulle, and A. Charnes,The role of duality in opti- mization problems involving entropy functionals with applications to information theory, J. Optim. Theory Appl., 58 (1988), pp. 209–223

work page 1988
[14]

Blanchard, D

P. Blanchard, D. J. Higham, and N. J. Higham,Accurately computing the log-sum-exp and softmax functions, IMA J. Numer. Anal., 41 (2021), pp. 2311– 2330

work page 2021
[15]

Borwein and A

J. Borwein and A. S. Lewis,Convergence of Best Entropy Estimates, SIAM J. Optim., 1 (1991), pp. 191–205

work page 1991
[16]

J. M. Borwein and A. S. Lewis,Partially finite convex programming, part i: Quasi relative interiors and duality theory, Math. Program., 57 (1992), pp. 15–48

work page 1992
[17]

J. M. Borwein and A. S. Lewis,Convex Analysis and Nonlinear Optimization: Theory and Examples, vol. 3, Springer Verlag, New York, 2nd ed., 2006

work page 2006
[18]

Boyd and L

S. Boyd and L. Vandenberghe,Convex optimization, Cambridge University Press, 2004

work page 2004
[19]

R. K. Bryan,Maximum entropy analysis of oversampled data problems, Eur. Biophys. J., 18 (1990), pp. 165–174

work page 1990
[20]

J. V. Burke,Second order necessary and sufficient conditions for convex com- posite NDO, Math. Program., 38 (1987), pp. 287–302

work page 1987
[21]

Chuna, N

T. Chuna, N. Barnfield, T. Dornheim, M. P. Friedlander, and T. Ho- heisel,Dual formulation of the maximum entropy method applied to analytic continuation of quantum monte carlo data, Jan. 2025

work page 2025
[22]

Chuna, N

T. Chuna, N. Barnfield, J. Vorberger, M. P. Friedlander, T. Hoheisel, and T. Dornheim,Estimates of the dynamic structure factor for the finite temperature electron liquid via analytic continuation of path integral monte carlo data, Phys. Rev. B, 112 (2025), p. 125112

work page 2025
[23]

Corless, G

R. Corless, G. Gonnet, D. Hare, D. Jeffrey, and D. Knuth,On the Lambert w function, Adv. Comput. Math., 5 (1996), pp. 329–359

work page 1996
[24]

T. M. Cover and J. A. Thomas,Elements of Information Theory, Wiley- Interscience, 2nd ed., 2006

work page 2006
[25]

Cuturi and G

M. Cuturi and G. Peyr ´e,Semi-dual Regularized Optimal Transport, SIAM Rev., 60 (2018), pp. 941–965

work page 2018
[26]

Decarreau, D

A. Decarreau, D. Hilhorst, C. Lemar´echal, and J. Navaza,Dual methods in entropy maximization. Application to some problems in crystallography, SIAM J. Optim., 2 (1992), pp. 173–197

work page 1992
[27]

Dornheim, M

T. Dornheim, M. B ¨ohme, D. Chapman, D. Kraus, T. Preston, Z. Mold- abekov, N. Schl ¨unzen, A. Cangi, T. D ¨oppner, and J. Vorberger, Imaginary-time correlation function thermometry: A new, high-accuracy and model-free temperature analysis technique for x-ray Thomson scattering data, Phys. Plasmas, 30 (2023), p. 042707

work page 2023
[28]

H. W. Engl, M. Hanke, and A. Neubauer,Regularization of Inverse Problems, vol. 375 of Mathematics and Its Applications, Kluwer Academic Publishers, Dordrecht, 1996

work page 1996
[29]

Eriksson,A note on solution of large sparse maximum entropy problems with linear equality constraints, Math

J. Eriksson,A note on solution of large sparse maximum entropy problems with linear equality constraints, Math. Program., 18 (1980), pp. 146–154. 28N. BARNFIELD, J. V. BURKE, M. P. FRIEDLANDER, AND T. HOHEISEL

work page 1980
[30]

Event Horizon Telescope Collaboration,First M87 event horizon telescope results. IV. Imaging the central supermassive black hole, The Astrophysical Journal Letters, 875 (2019), p. L4

work page 2019
[31]

Giuliani and G

G. Giuliani and G. Vignale,Quantum theory of the electron liquid, Cambridge university press, 2008

work page 2008
[32]

E. V. Haynsworth,Determination of the inertia of a partitioned hermitian matrix, Linear Algebra Appl., 1 (1968), pp. 73–81

work page 1968
[33]

R. A. Horn and C. R. Johnson,Matrix Analysis, Cambridge University Press, New York, NY, second edition, corrected reprint ed., 2017

work page 2017
[34]

E. T. Jaynes,Information Theory and Statistical Mechanics, Phys. Rev., 106 (1957), pp. 620–630

work page 1957
[35]

C. T. Kelley,Iterative methods for linear and nonlinear equations, SIAM, 1995

work page 1995
[36]

D. C. Liu and J. Nocedal,On the limited memory BFGS method for large scale optimization, Math. Program., 45 (1989), pp. 503–528

work page 1989
[37]

Lu and S

Z. Lu and S. Mei,Accelerated first-order methods for convex optimization with locally Lipschitz continuous gradient, Apr. 2023

work page 2023
[38]

Malitsky and K

Y. Malitsky and K. Mishchenko,Adaptive Gradient Descent without Descent, Aug. 2020

work page 2020
[39]

Nesterov,Lectures on Convex Optimization, vol

Y. Nesterov,Lectures on Convex Optimization, vol. 137 of Springer Optimiza- tion and Its Applications, Springer International Publishing, Cham, 2018

work page 2018
[40]

Ortega and Rheinbolt,Iterative Solutions of Nonlinear Equations in Several Variables, Academic Press, Series on Computer Science and Applied Mathematics, 1970

work page 1970
[41]

Rockafellar and R

R. Rockafellar and R. J.-B. Wets,Variational Analysis, Springer Verlag, Heidelberg, Berlin, New York, 1998

work page 1998
[42]

R. T. Rockafellar,Convex analysis, Princeton Mathematical Series, Princeton University Press, Princeton, N. J., 1970

work page 1970
[43]

R. T. Rockafellar and L. A. Dontchev,Implicit Functions and Solution Mappings: A view from variational analysis, Springer Monographs in Mathematics, Springer, 2014

work page 2014
[44]

Shore and R

J. Shore and R. Johnson,Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy, IEEE Trans. Inform. Theory, 26 (1980), pp. 26–37

work page 1980
[45]

Skilling and R

J. Skilling and R. K. Bryan,Maximum entropy image reconstruction: General algorithm, Mon. Not. R. Astron. Soc., 211 (1984), pp. 111–124

work page 1984

[1] [1]

Introduction.Recovering a nonnegative density, spectrum, or intensity from noisy samples of a linear smoothing map is a recurring problem across applied math- ematics. Instances include analytic continuation of quantum Monte Carlo data for warm dense matter [ 11, 12, 17], maximum-entropy image reconstruction in astron- omy [20, 35], dual methods for entro...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.5281/zenodo.19889724 2026

[2] [2]

The present method instead treats theunknownscaleτas an independent dual variable

controls exponential growth by solving a log-domain dual with L-BFGS [ 26]. The present method instead treats theunknownscaleτas an independent dual variable. For unknown scale, Skilling and Bryan[35], and Bryan [9] propose Newton methods that require a well-conditioned surrogate for the forward operator. The Jacobian analysis of section 3 removes this re...

work page

[3] [3]

Fy(y, τ) Fτ(y, τ) # :=

Dual under scale-shape separation.We now formalize the scale-shape decomposition outlined in subsection 1.2. Writing x = τ p with τ = ⟨1, x⟩ and p∈ ∆n, the reformulated problem becomes (2.1) min x∈Rn + ψp(x) = min τ≥0 min x∈τ∆ n ϕp(x, τ), where the primal objective is (2.2)ϕ p(x, τ) := 1 2λ ∥Ax−b∥ 2 +⟨c, x⟩+ (g q +δ τ∆ n)(x). Since gq is strictly convex a...

work page

[4] [4]

This section develops a damped inexact Newton iteration for that equation

Damped inexact Newton on the scale-shape dual.Section 2 reduced the primal problem to solving F (z) = 0, where z = (y, τ) ∈R m ×R ++ bundles the dual and scale variables. This section develops a damped inexact Newton iteration for that equation. The analysis rests on three lemmas. First, the merit function ρ = ∥F∥ has compact sublevel sets whose τ-project...

work page

[5] [5]

These are developed individually in Lemmas 4.1 to 4.5

Bounds.The iteration complexity and the asymptotic rates of convergence of Algorithm 3.1 depend on a collection of bounds for F , DF , the search directions dk, and the step-sizeα k. These are developed individually in Lemmas 4.1 to 4.5. Standing setup.Throughout this section, fix β > 0, ¯η∈[0, 1), and η∈ [0,¯η]; let τβmax, τβmin, and yβmax be as in Lemma...

work page

[6] [6]

Hence (4.12) and (4.13) follow

work page

[7] [7]

Convergence.This section establishes the main convergence result for Algo- rithm 3.1. Theorem 5.1 describes (1) the global linear convergence of the merit ρ at an explicit rate ˆν <1; (2) the iteration complexity needed to reduce ρ(zk) or ∥zk+1 −z ∗∥ below ε; and (3) superlinear (resp. (1 + p)-superlinear) local rates under standard forcing-sequence condi...

work page

[8] [8]

Notation.Write z∗ = ( y∗, τ ∗), p∗ := p(y∗), S∗ := S(y∗), and eH ∗ := eH(z∗) = λI + τ ∗ADiag (p∗)A⊺ (cf

Perturbation analysis.With the prior parameterized by r := logq , the solu- tion map (b, λ, r) 7→ (y∗, τ ∗, x∗)(b, λ, r) is jointly C∞ on Rm ×R ++ ×R n (Lemma 6.1), locally Lipschitz with computable constants (Theorem 6.2), and continuous as λ↓ 0 (Theorem 6.3). Notation.Write z∗ = ( y∗, τ ∗), p∗ := p(y∗), S∗ := S(y∗), and eH ∗ := eH(z∗) = λI + τ ∗ADiag (p...

work page

[9] [9]

Numerical experiments.Our test problem is a uniform-electron-gas (UEG) analytic-continuation instance from Chuna et al. [11]. The task is to recover a nonnegative spectral density x∗ from noisy imaginary-time observations b≈Ax ∗ by solving (1.1). The forward operator is a discretized periodic Laplace transform, Aij =e −tiωj +e −(β−ti)ωj , with inverse tem...

work page

[10] [10]

The merit tolerance and iteration cap are (ε, K max) = (10−8,300) throughout

The level-set parameter is β = 1.5 ρ(z0), and the safeguard ˆτβmin uses the closed-form bound λW (ζ)/A2 max from Lemma 3.1ii, floored at 10 −16. The merit tolerance and iteration cap are (ε, K max) = (10−8,300) throughout. 7.1. Overflow resilience.Any first- or second-order method applied to the unnormalized dual (1.3) must evaluate exp(A⊺y−c ), which can...

work page 2025

[11] [11]

H. H. Bauschke and P. L. Combettes,Convex Analysis and Monotone Oper- ator Theory in Hilbert Spaces, CMS Books in Mathematics, Springer International Publishing, Cham, 2017

work page 2017

[12] [12]

Beck,First-Order Methods in Optimization, SIAM-Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2017

A. Beck,First-Order Methods in Optimization, SIAM-Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2017

work page 2017

[13] [13]

Ben-Tal, M

A. Ben-Tal, M. Teboulle, and A. Charnes,The role of duality in opti- mization problems involving entropy functionals with applications to information theory, J. Optim. Theory Appl., 58 (1988), pp. 209–223

work page 1988

[14] [14]

Blanchard, D

P. Blanchard, D. J. Higham, and N. J. Higham,Accurately computing the log-sum-exp and softmax functions, IMA J. Numer. Anal., 41 (2021), pp. 2311– 2330

work page 2021

[15] [15]

Borwein and A

J. Borwein and A. S. Lewis,Convergence of Best Entropy Estimates, SIAM J. Optim., 1 (1991), pp. 191–205

work page 1991

[16] [16]

J. M. Borwein and A. S. Lewis,Partially finite convex programming, part i: Quasi relative interiors and duality theory, Math. Program., 57 (1992), pp. 15–48

work page 1992

[17] [17]

J. M. Borwein and A. S. Lewis,Convex Analysis and Nonlinear Optimization: Theory and Examples, vol. 3, Springer Verlag, New York, 2nd ed., 2006

work page 2006

[18] [18]

Boyd and L

S. Boyd and L. Vandenberghe,Convex optimization, Cambridge University Press, 2004

work page 2004

[19] [19]

R. K. Bryan,Maximum entropy analysis of oversampled data problems, Eur. Biophys. J., 18 (1990), pp. 165–174

work page 1990

[20] [20]

J. V. Burke,Second order necessary and sufficient conditions for convex com- posite NDO, Math. Program., 38 (1987), pp. 287–302

work page 1987

[21] [21]

Chuna, N

T. Chuna, N. Barnfield, T. Dornheim, M. P. Friedlander, and T. Ho- heisel,Dual formulation of the maximum entropy method applied to analytic continuation of quantum monte carlo data, Jan. 2025

work page 2025

[22] [22]

Chuna, N

T. Chuna, N. Barnfield, J. Vorberger, M. P. Friedlander, T. Hoheisel, and T. Dornheim,Estimates of the dynamic structure factor for the finite temperature electron liquid via analytic continuation of path integral monte carlo data, Phys. Rev. B, 112 (2025), p. 125112

work page 2025

[23] [23]

Corless, G

R. Corless, G. Gonnet, D. Hare, D. Jeffrey, and D. Knuth,On the Lambert w function, Adv. Comput. Math., 5 (1996), pp. 329–359

work page 1996

[24] [24]

T. M. Cover and J. A. Thomas,Elements of Information Theory, Wiley- Interscience, 2nd ed., 2006

work page 2006

[25] [25]

Cuturi and G

M. Cuturi and G. Peyr ´e,Semi-dual Regularized Optimal Transport, SIAM Rev., 60 (2018), pp. 941–965

work page 2018

[26] [26]

Decarreau, D

A. Decarreau, D. Hilhorst, C. Lemar´echal, and J. Navaza,Dual methods in entropy maximization. Application to some problems in crystallography, SIAM J. Optim., 2 (1992), pp. 173–197

work page 1992

[27] [27]

Dornheim, M

T. Dornheim, M. B ¨ohme, D. Chapman, D. Kraus, T. Preston, Z. Mold- abekov, N. Schl ¨unzen, A. Cangi, T. D ¨oppner, and J. Vorberger, Imaginary-time correlation function thermometry: A new, high-accuracy and model-free temperature analysis technique for x-ray Thomson scattering data, Phys. Plasmas, 30 (2023), p. 042707

work page 2023

[28] [28]

H. W. Engl, M. Hanke, and A. Neubauer,Regularization of Inverse Problems, vol. 375 of Mathematics and Its Applications, Kluwer Academic Publishers, Dordrecht, 1996

work page 1996

[29] [29]

Eriksson,A note on solution of large sparse maximum entropy problems with linear equality constraints, Math

J. Eriksson,A note on solution of large sparse maximum entropy problems with linear equality constraints, Math. Program., 18 (1980), pp. 146–154. 28N. BARNFIELD, J. V. BURKE, M. P. FRIEDLANDER, AND T. HOHEISEL

work page 1980

[30] [30]

Event Horizon Telescope Collaboration,First M87 event horizon telescope results. IV. Imaging the central supermassive black hole, The Astrophysical Journal Letters, 875 (2019), p. L4

work page 2019

[31] [31]

Giuliani and G

G. Giuliani and G. Vignale,Quantum theory of the electron liquid, Cambridge university press, 2008

work page 2008

[32] [32]

E. V. Haynsworth,Determination of the inertia of a partitioned hermitian matrix, Linear Algebra Appl., 1 (1968), pp. 73–81

work page 1968

[33] [33]

R. A. Horn and C. R. Johnson,Matrix Analysis, Cambridge University Press, New York, NY, second edition, corrected reprint ed., 2017

work page 2017

[34] [34]

E. T. Jaynes,Information Theory and Statistical Mechanics, Phys. Rev., 106 (1957), pp. 620–630

work page 1957

[35] [35]

C. T. Kelley,Iterative methods for linear and nonlinear equations, SIAM, 1995

work page 1995

[36] [36]

D. C. Liu and J. Nocedal,On the limited memory BFGS method for large scale optimization, Math. Program., 45 (1989), pp. 503–528

work page 1989

[37] [37]

Lu and S

Z. Lu and S. Mei,Accelerated first-order methods for convex optimization with locally Lipschitz continuous gradient, Apr. 2023

work page 2023

[38] [38]

Malitsky and K

Y. Malitsky and K. Mishchenko,Adaptive Gradient Descent without Descent, Aug. 2020

work page 2020

[39] [39]

Nesterov,Lectures on Convex Optimization, vol

Y. Nesterov,Lectures on Convex Optimization, vol. 137 of Springer Optimiza- tion and Its Applications, Springer International Publishing, Cham, 2018

work page 2018

[40] [40]

Ortega and Rheinbolt,Iterative Solutions of Nonlinear Equations in Several Variables, Academic Press, Series on Computer Science and Applied Mathematics, 1970

work page 1970

[41] [41]

Rockafellar and R

R. Rockafellar and R. J.-B. Wets,Variational Analysis, Springer Verlag, Heidelberg, Berlin, New York, 1998

work page 1998

[42] [42]

R. T. Rockafellar,Convex analysis, Princeton Mathematical Series, Princeton University Press, Princeton, N. J., 1970

work page 1970

[43] [43]

R. T. Rockafellar and L. A. Dontchev,Implicit Functions and Solution Mappings: A view from variational analysis, Springer Monographs in Mathematics, Springer, 2014

work page 2014

[44] [44]

Shore and R

J. Shore and R. Johnson,Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy, IEEE Trans. Inform. Theory, 26 (1980), pp. 26–37

work page 1980

[45] [45]

Skilling and R

J. Skilling and R. K. Bryan,Maximum entropy image reconstruction: General algorithm, Mon. Not. R. Astron. Soc., 211 (1984), pp. 111–124

work page 1984