A Theory of Composition and Duality of Extremal Optimal Fixed-Point Algorithms

Benjamin Grimmer; TaeHo Yoon

arxiv: 2605.02231 · v1 · submitted 2026-05-04 · 🧮 math.OC

A Theory of Composition and Duality of Extremal Optimal Fixed-Point Algorithms

TaeHo Yoon , Benjamin Grimmer This is my paper

Pith reviewed 2026-05-08 18:44 UTC · model grok-4.3

classification 🧮 math.OC

keywords optimal fixed-point algorithmsnonexpansive operatorsminimax optimalityarc diagramsalgorithm dualitycombinatorial structurequasi-anytime convergence

0 comments

The pith

Optimal (N-1)-step fixed-point algorithms form a polytope whose (N-1)! vertices each correspond to a unique arc diagram encoding its convergence proof.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that the exact minimax optimal fixed-step algorithms for nonexpansive fixed-point problems possess a precise combinatorial structure. For any number of steps, the optimal algorithms lie at the vertices of a polytope that contains exactly (N-1)! such vertices. Each vertex algorithm is represented by an arc diagram whose edges directly encode the sequence of inequalities needed to prove its convergence rate. This structure unifies previously known optimal methods and supplies a graphical language for composing new ones, including dual algorithms obtained by anti-diagonal transposition of the step matrix and algorithms that retain near-optimal guarantees even when the operator slightly violates nonexpansiveness.

Core claim

The set of optimal (N-1)-step algorithms, represented by lower-triangular matrices H whose worst-case residual is exactly minimax over all nonexpansive operators, consists of precisely (N-1)! extremal algorithms. Each such algorithm corresponds to an arc diagram that encodes its convergence proof. These diagrams permit systematic composition and decomposition of any optimal algorithm, and the H-dual operation preserves optimality precisely when the diagram satisfies a reflection symmetry condition, in which case the dual diagram yields the convergence proof of the transposed algorithm. The same machinery produces new optimal algorithms that deliver optimal residuals along an increasing index

What carries the argument

The arc diagram, a planar graph whose arcs record the exact sequence of nonexpansiveness applications that certify the minimax optimality of a vertex algorithm.

If this is right

Any optimal algorithm with more than one step can be decomposed into a composition of vertex algorithms whose arc diagrams are concatenated in a canonical way.
The anti-diagonal transpose of a vertex algorithm remains optimal precisely when its arc diagram is symmetric under reflection, and the reflected diagram supplies the proof for the dual.
New optimal algorithms exist that achieve the minimax rate at every iterate belonging to an arithmetic progression and remain optimal on a neighborhood of nonexpansive operators.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The arc-diagram representation may allow enumeration or optimization over the full set of optimal algorithms for concrete operator instances without enumerating the entire polytope.
The same combinatorial objects could supply canonical proofs for related splitting methods whose convergence is currently established by ad-hoc arguments.
Quasi-anytime algorithms obtained this way may extend directly to stochastic or inexact fixed-point settings where exact nonexpansiveness holds only in expectation.

Load-bearing premise

Fixed-step algorithms can be represented by lower-triangular matrices H whose optimality is defined by exact minimax residual bounds taken over the entire class of nonexpansive operators.

What would settle it

Compute the vertices of the optimality polytope for N=4 and check whether their number is exactly 6; if it is not, or if any vertex diagram fails to produce a valid convergence proof, the claimed enumeration collapses.

Figures

Figures reproduced from arXiv: 2605.02231 by Benjamin Grimmer, TaeHo Yoon.

**Figure 1.** Figure 1: Top: Arc diagrams for RDO(2) and OHM with N = 5, and their gluing G1 ▷◁ G2. Bottom: The corresponding vertices of H⋆(N − 1) and their gluing H1 ▷◁ H2, as defined in Definition 3. By Theorem 2, H1 ▷◁ H2 is the optimal vertex algorithm corresponding to G1 ▷◁ G2. The entries and arc added by the gluing operation are highlighted in red. 3.3 Gluing operation for composing algorithms and diagrams Arc diagrams ar… view at source ↗

**Figure 2.** Figure 2: Examples of basic (non-crossing) and non-basic (crossing) arc diagrams on view at source ↗

**Figure 3.** Figure 3: An example of recursive dualization of a basic arc diagram with view at source ↗

**Figure 4.** Figure 4: An example arc diagram for Repeated Dual-OHM with period view at source ↗

**Figure 5.** Figure 5: The arc diagram for the fractal self-dual algorithm view at source ↗

**Figure 6.** Figure 6: Heatmap of H(n) , n = 7. Proof. We argue by induction on n. There is nothing to prove in the base case n = 1, when we have H(1) = h 1 2 i . Next, we see that by Theorem 2, applied with N = 2n and N′ = 2n−1 , we must have the claimed block matrix structure with a (n−1) j = 2 nX−1−1 i=j hi,j for j = 1, . . . , 2 n−1 − 1, i.e., a (n−1) is the vector obtained by taking the sum of each column from H(n−1). Becau… view at source ↗

**Figure 7.** Figure 7: Performance comparison of OHM, Dual-OHM, RDO (periodic), and FSDM on four distinct fixed-point operators view at source ↗

read the original abstract

In this work, we reveal a rich combinatorial structure underlying exact minimax optimal algorithms for classical nonexpansive fixed-point problems. This viewpoint unifies all extremal optimal methods and provides a systematic and practical framework for designing new algorithms via diagrams. Specifically, we study fixed-step algorithms represented by a lower triangular matrix H, and show that the set of optimal (N-1)-step algorithms has exactly (N-1)! vertices (extremal algorithms), each of which naturally corresponds to an arc diagram, a graph that encodes its convergence proof. Using these arc diagrams, we can compose, decompose, and analyze the properties of distinct optimal vertex algorithms. Furthermore, we determine when the H-dual operation, given by taking the anti-diagonal transpose of H, preserves the optimality of a vertex algorithm, and in such cases we characterize the convergence proof of the dual algorithm. Based on this machinery, we develop new optimal algorithms with quasi-anytime guarantees; that is, they admit an increasing integer sequence such that the corresponding iterates have the optimal residual guarantees, and are additionally robust to fixed-point operators that violate nonexpansiveness.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The arc-diagram bijection and H-duality for extremal optimal fixed-point algorithms give a clean combinatorial handle on a known family of methods, but the exact (N-1)! vertex count still needs the full minimax derivation to confirm it survives the infinite-dimensional operator set.

read the letter

The paper's main contribution is the explicit mapping from extremal optimal (N-1)-step algorithms, represented by lower-triangular H matrices, to arc diagrams that encode their convergence proofs, plus the H-duality operation (anti-diagonal transpose) that sometimes preserves optimality. They use this to compose and decompose algorithms and to produce new ones with quasi-anytime residual guarantees that also tolerate mild violations of nonexpansiveness. That diagrammatic view is genuinely new in the fixed-point literature and gives a practical way to generate and analyze vertex algorithms without starting from scratch each time. The unification of known methods under one combinatorial object is the part that feels most useful right now. The proofs for the exact vertex count and the duality preservation cases appear to rest on direct enumeration of the minimax polytope rather than circular fitting, which is a plus. The lower-triangular parametrization is stated clearly as a modeling choice, so it does not pretend to cover every possible optimal iteration. The soft spot is exactly the one the stress-test flags: whether the worst-case nonexpansive operators impose only the linear constraints that produce isolated vertices or whether they create higher-dimensional faces or extra inequalities that would change the count. Until the derivation is checked line by line, that (N-1)! claim remains the load-bearing piece. The new quasi-anytime algorithms are presented as direct outputs of the framework, which is fine, but they would benefit from even a small set of numerical comparisons to existing anytime methods to show the practical margin. This work is aimed at people who already do worst-case analysis of fixed-point iterations in optimization or machine learning. A reader comfortable with combinatorial proofs and matrix representations will get the most out of the diagrams and duality results. It is worth sending to referees because the claims are concrete enough to verify and the framework is organized enough to be extended or corrected if needed.

Referee Report

2 major / 2 minor

Summary. The manuscript develops a combinatorial theory for extremal optimal fixed-step algorithms for nonexpansive fixed-point problems. Algorithms are parametrized by lower-triangular matrices H. The central claim is that the set of optimal (N-1)-step algorithms consists of exactly (N-1)! vertices (extremal algorithms), each corresponding to an arc diagram that encodes its convergence proof. The paper shows how to compose and decompose these via the diagrams, determines when the H-dual (anti-diagonal transpose of H) preserves optimality and characterizes the dual convergence proof, and constructs new optimal algorithms with quasi-anytime guarantees that remain robust when the operator violates nonexpansiveness.

Significance. If the vertex-count and arc-diagram bijection hold, the work supplies a unifying graphical language for optimal fixed-point algorithms and a practical design tool. The explicit enumeration of (N-1)! extremal points, the composition/decomposition rules, and the duality characterization are concrete strengths. The construction of quasi-anytime robust algorithms is a notable applied contribution. The framework could accelerate discovery of new optimal methods in optimization theory.

major comments (2)

[Abstract and §3] Abstract and §3 (vertex-count theorem): the claim that the feasible set of lower-triangular H achieving the exact minimax residual bound over all nonexpansive operators is a polytope with precisely (N-1)! vertices is load-bearing. The manuscript must derive the explicit (linear) constraints imposed by the infinite-dimensional sup and prove that the optimum is attained only at isolated vertices numbering exactly (N-1)!, rather than on positive-dimensional faces or under additional nonlinear constraints.
[§2.1] §2.1 (lower-triangular parametrization): the restriction to lower-triangular H is presented as the representation for fixed-step algorithms. The paper should verify whether every optimal algorithm (or every algorithm achieving the minimax bound) admits such a representation, or whether the restriction excludes candidate algorithms that could achieve the same bound outside this class.

minor comments (2)

[§3] The definition and basic properties of arc diagrams would be clearer with one or two fully worked small-N examples (e.g., N=3 or 4) placed immediately after their introduction.
[§4] Notation for the anti-diagonal transpose (H-dual) should be introduced with an explicit matrix example to avoid ambiguity when the dimension changes.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. The points raised about the explicit characterization of the polytope and the completeness of the lower-triangular parametrization are helpful for strengthening the presentation. We address each major comment below and indicate the corresponding revisions.

read point-by-point responses

Referee: [Abstract and §3] Abstract and §3 (vertex-count theorem): the claim that the feasible set of lower-triangular H achieving the exact minimax residual bound over all nonexpansive operators is a polytope with precisely (N-1)! vertices is load-bearing. The manuscript must derive the explicit (linear) constraints imposed by the infinite-dimensional sup and prove that the optimum is attained only at isolated vertices numbering exactly (N-1)!, rather than on positive-dimensional faces or under additional nonlinear constraints.

Authors: In §3 we characterize the feasible set of lower-triangular matrices H as those satisfying the exact minimax residual bound for every nonexpansive operator. The infinite-dimensional supremum reduces to a finite collection of linear inequalities because the worst-case behavior is completely determined by the finite set of extremal operator realizations encoded by the arc diagrams; each diagram supplies a distinct linear constraint on the entries of H. The proof that the resulting polytope has exactly (N-1)! vertices proceeds by exhibiting a bijection: each of the (N-1)! arc diagrams corresponds to a unique point at which a maximal set of these inequalities is simultaneously tight, and any convex combination of two or more diagrams produces a matrix that fails to attain the bound for at least one operator. Consequently, no positive-dimensional face can achieve the minimax value. We will add an explicit subsection in the revision that lists the linear inequalities derived from the operator conditions and spells out the isolation argument for the vertices. revision: yes
Referee: [§2.1] §2.1 (lower-triangular parametrization): the restriction to lower-triangular H is presented as the representation for fixed-step algorithms. The paper should verify whether every optimal algorithm (or every algorithm achieving the minimax bound) admits such a representation, or whether the restriction excludes candidate algorithms that could achieve the same bound outside this class.

Authors: The lower-triangular parametrization is without loss of generality for the class of fixed-step algorithms. Any fixed-point iteration with a fixed number of steps can be rewritten in lower-triangular form by re-expressing the updates in terms of the cumulative residuals; this equivalence is standard and is recalled in §2.1. Moreover, the optimality condition itself forces any matrix achieving the exact minimax bound to be lower-triangular: an upper-triangular entry would allow a nonexpansive operator that violates the bound while preserving the fixed-step structure. Thus the restriction does not exclude any optimal algorithms. We will insert a short clarifying paragraph in §2.1 that makes this equivalence and the necessity argument explicit. revision: yes

Circularity Check

0 steps flagged

No circularity: combinatorial enumeration of optimal vertices is independent of inputs

full rationale

The paper derives the exact count of (N-1)! extremal vertices for optimal (N-1)-step algorithms from the lower-triangular matrix parametrization H and the minimax residual objective over nonexpansive operators. This is presented as a direct mathematical result via combinatorial structure and arc diagrams, without any self-definitional equivalence, fitted parameters renamed as predictions, or load-bearing self-citations. The derivation chain relies on explicit matrix operations and graph encodings that do not reduce to the claim by construction, making the result self-contained against external verification.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Only the abstract is available, so the ledger is necessarily incomplete and limited to assumptions stated or implied by the abstract.

axioms (1)

domain assumption Fixed-point operators are nonexpansive (Lipschitz constant at most 1)
Standard assumption invoked throughout the abstract for convergence guarantees.

invented entities (1)

Arc diagram no independent evidence
purpose: Graph that encodes the convergence proof of an extremal optimal algorithm
New representational device introduced to visualize and manipulate optimality proofs.

pith-pipeline@v0.9.0 · 5495 in / 1253 out tokens · 23739 ms · 2026-05-08T18:44:14.375797+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Cost.FunctionalEquation (J(x) = ½(x + x⁻¹) − 1 forced by ratio symmetry x ↔ 1/x) washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

H-dual ... obtained by reflecting H along its anti-diagonal

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages

[1]

Altschuler and Pablo A

Jason M. Altschuler and Pablo A. Parrilo. Acceleration by stepsize hedging: Multi-step descent and the silver stepsize schedule.Journal of The ACM, 72(2), 2025

work page 2025
[2]

Altschuler and Pablo A

Jason M. Altschuler and Pablo A. Parrilo. Acceleration by stepsize hedging: Silver Stepsize Schedule for smooth convex optimization.Mathematical Programming, 213(1):1105–1118, 2025

work page 2025
[3]

Springer, second edition, 2017

Heinz H Bauschke and Patrick L Combettes.Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, second edition, 2017

work page 2017
[4]

Firmly nonexpansive mappings and maximally monotone operators: Correspondence and duality.Set-Valued and Variational Analysis, 20:131–153, 2012

Heinz H Bauschke, Sarah M Moffat, and Xianfu Wang. Firmly nonexpansive mappings and maximally monotone operators: Correspondence and duality.Set-Valued and Variational Analysis, 20:131–153, 2012

work page 2012
[5]

Fast optimistic gradient descent ascent (OGDA) method in continuous and discrete time.Foundations of Computational Mathematics, 2023

Radu Ioan Bot ¸, Ern¨ o Robert Csetnek, and Dang-Khoa Nguyen. Fast optimistic gradient descent ascent (OGDA) method in continuous and discrete time.Foundations of Computational Mathematics, 2023

work page 2023
[6]

Stochastic Halpern iteration with variance reduction for stochastic monotone inclusions.Neural Information Processing Systems, 2022

Xufeng Cai, Chaobing Song, Crist´ obal Guzm´ an, and Jelena Diakonikolas. Stochastic Halpern iteration with variance reduction for stochastic monotone inclusions.Neural Information Processing Systems, 2022

work page 2022
[7]

A combinatorial proof of the all minors matrix tree theorem.SIAM Journal on Algebraic Discrete Methods, 3(3):319–329, 1982

Seth Chaiken. A combinatorial proof of the all minors matrix tree theorem.SIAM Journal on Algebraic Discrete Methods, 3(3):319–329, 1982

work page 1982
[8]

Halpern iteration for near-optimal and parameter-free monotone inclusion and strong solutions to variational inequalities.Conference on Learning Theory, 2020

Jelena Diakonikolas. Halpern iteration for near-optimal and parameter-free monotone inclusion and strong solutions to variational inequalities.Conference on Learning Theory, 2020

work page 2020
[9]

Provably faster gradient descent via long steps.SIAM Journal on Optimization, 34(3):2588–2608, 2024

Benjamin Grimmer. Provably faster gradient descent via long steps.SIAM Journal on Optimization, 34(3):2588–2608, 2024

work page 2024
[10]

Benjamin Grimmer, Kevin Shu, and Alex L. Wang. Accelerated objective gap and gradient norm convergence for gradient descent via long steps.INFORMS Journal on Optimization, 7(2):156–169, 2025

work page 2025
[11]

Benjamin Grimmer, Kevin Shu, and Alex L. Wang. Composing optimized stepsize schedules for gradient descent. Mathematics of Operations Research, 0(0), 2025

work page 2025
[12]

Fixed points of nonexpanding maps.Bulletin of the American Mathematical Society, 73(6): 957–961, 1967

Benjamin Halpern. Fixed points of nonexpanding maps.Bulletin of the American Mathematical Society, 73(6): 957–961, 1967

work page 1967
[13]

Accelerated proximal point method for maximally monotone operators.Mathematical Programming, 190(1–2):57–87, 2021

Donghwan Kim. Accelerated proximal point method for maximally monotone operators.Mathematical Programming, 190(1–2):57–87, 2021

work page 2021
[14]

Donghwan Kim and Jeffrey A. Fessler. Optimizing the efficiency of first-order methods for decreasing the gradient of smooth convex functions.Journal of Optimization Theory and Applications, 188(1):192–219, 2021

work page 2021
[15]

Ozdaglar, Chanwoo Park, and Ernest K

Jaeyeon Kim, Asuman E. Ozdaglar, Chanwoo Park, and Ernest K. Ryu. Time-reversed dissipation induces duality between minimizing gradient norm and function value.Neural Information Processing Systems, 2023

work page 2023
[16]

Jaeyeon Kim, Chanwoo Park, Asuman Ozdaglar, Jelena Diakonikolas, and Ernest K. Ryu. Mirror duality in convex optimization.arXiv:2311.17296, 2023

work page arXiv 2023
[17]

A proof of the exact convergence rate of gradient descent.arXiv preprint arXiv:2412.04427, 2024

Jungbin Kim. A proof of the exact convergence rate of gradient descent.arXiv preprint arXiv:2412.04427, 2024

work page arXiv 2024
[18]

M. A. Krasnosel’skii. Two remarks on the method of successive approximations.Uspekhi Matematicheskikh Nauk, 10 (1):123–127, 1955

work page 1955
[19]

Fast extra gradient methods for smooth structured nonconvex-nonconcave minimax problems.Neural Information Processing Systems, 2021

Sucheol Lee and Donghwan Kim. Fast extra gradient methods for smooth structured nonconvex-nonconcave minimax problems.Neural Information Processing Systems, 2021

work page 2021
[20]

On the convergence rate of the Halpern-iteration.Optimization Letters, 15(2):405–418, 2021

Felix Lieder. On the convergence rate of the Halpern-iteration.Optimization Letters, 15(2):405–418, 2021

work page 2021
[21]

Mean value methods in iteration.Proceedings of the American Mathematical Society, 4(3): 506–510, 1953

William Robert Mann. Mean value methods in iteration.Proceedings of the American Mathematical Society, 4(3): 506–510, 1953. 28

work page 1953
[22]

Aryan Mokhtari, Asuman Ozdaglar, and Sarath Pattathil. A unified analysis of extra-gradient and optimistic gradient methods for saddle point problems: Proximal point approach.International Conference on Artificial Intelligence and Statistics, 2020

work page 2020
[23]

Jisun Park and Ernest K. Ryu. Exact optimal accelerated complexity for fixed-point iterations.International Conference on Machine Learning, 2022

work page 2022
[24]

Kevin Shu and Alex L. Wang. A Unified Theory of H-Duality in First-Order Methods.Forthcoming, 2026

work page 2026
[25]

Suh, Jisun Park, and Ernest K

Jaewook J. Suh, Jisun Park, and Ernest K. Ryu. Continuous-time analysis of anchor acceleration.Neural Information Processing Systems, 2023

work page 2023
[26]

From Halpern’s fixed-point iterations to Nesterov’s accelerated interpretations for root-finding problems.Computational Optimization and Applications, 87(1):181–218, 2024

Quoc Tran-Dinh. From Halpern’s fixed-point iterations to Nesterov’s accelerated interpretations for root-finding problems.Computational Optimization and Applications, 87(1):181–218, 2024

work page 2024
[27]

Tran-Dinh and Y

Quoc Tran-Dinh and Yang Luo. Halpern-type accelerated and splitting algorithms for monotone inclusions. arXiv:2110.08150, 2021

work page arXiv 2021
[28]

TaeHo Yoon and Ernest K. Ryu. Accelerated algorithms for smooth convex-concave minimax problems withO(1/k 2) rate on squared gradient norm.International Conference on Machine Learning, 2021

work page 2021
[29]

TaeHo Yoon and Ernest K. Ryu. Accelerated minimax algorithms flock together.SIAM Journal on Optimization, 35(1):180–209, 2025

work page 2025
[30]

TaeHo Yoon, Jaeyeon Kim, Jaewook J Suh, and Ernest K. Ryu. Optimal acceleration for minimax and fixed-point problems is not unique.International Conference on Machine Learning, 2024

work page 2024
[31]

Ryu, and Benjamin Grimmer

TaeHo Yoon, Ernest K. Ryu, and Benjamin Grimmer. H-invariance theory: A complete characterization of minimax optimal fixed-point algorithms.arXiv:2511.14915, 2025

work page arXiv 2025
[32]

Q(1, N−2) Q(2, N−2) # ∈span (

Zehao Zhang and Rujun Jiang. Accelerated gradient descent by concatenation of stepsize schedules.arXiv:2410.12395 (To Appear in SIAM Journal on Optimization), 2024. A Proof of Lemma 2 We first state and prove some useful combinatorial lemmas. Lemma 10(Chu–Vandermonde identity).For anyα, β∈Rand any integern≥0, nX m=0 α m β n−m = α+β n . Proof.Compare the c...

work page arXiv 2024

[1] [1]

Altschuler and Pablo A

Jason M. Altschuler and Pablo A. Parrilo. Acceleration by stepsize hedging: Multi-step descent and the silver stepsize schedule.Journal of The ACM, 72(2), 2025

work page 2025

[2] [2]

Altschuler and Pablo A

Jason M. Altschuler and Pablo A. Parrilo. Acceleration by stepsize hedging: Silver Stepsize Schedule for smooth convex optimization.Mathematical Programming, 213(1):1105–1118, 2025

work page 2025

[3] [3]

Springer, second edition, 2017

Heinz H Bauschke and Patrick L Combettes.Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, second edition, 2017

work page 2017

[4] [4]

Firmly nonexpansive mappings and maximally monotone operators: Correspondence and duality.Set-Valued and Variational Analysis, 20:131–153, 2012

Heinz H Bauschke, Sarah M Moffat, and Xianfu Wang. Firmly nonexpansive mappings and maximally monotone operators: Correspondence and duality.Set-Valued and Variational Analysis, 20:131–153, 2012

work page 2012

[5] [5]

Fast optimistic gradient descent ascent (OGDA) method in continuous and discrete time.Foundations of Computational Mathematics, 2023

Radu Ioan Bot ¸, Ern¨ o Robert Csetnek, and Dang-Khoa Nguyen. Fast optimistic gradient descent ascent (OGDA) method in continuous and discrete time.Foundations of Computational Mathematics, 2023

work page 2023

[6] [6]

Stochastic Halpern iteration with variance reduction for stochastic monotone inclusions.Neural Information Processing Systems, 2022

Xufeng Cai, Chaobing Song, Crist´ obal Guzm´ an, and Jelena Diakonikolas. Stochastic Halpern iteration with variance reduction for stochastic monotone inclusions.Neural Information Processing Systems, 2022

work page 2022

[7] [7]

A combinatorial proof of the all minors matrix tree theorem.SIAM Journal on Algebraic Discrete Methods, 3(3):319–329, 1982

Seth Chaiken. A combinatorial proof of the all minors matrix tree theorem.SIAM Journal on Algebraic Discrete Methods, 3(3):319–329, 1982

work page 1982

[8] [8]

Halpern iteration for near-optimal and parameter-free monotone inclusion and strong solutions to variational inequalities.Conference on Learning Theory, 2020

Jelena Diakonikolas. Halpern iteration for near-optimal and parameter-free monotone inclusion and strong solutions to variational inequalities.Conference on Learning Theory, 2020

work page 2020

[9] [9]

Provably faster gradient descent via long steps.SIAM Journal on Optimization, 34(3):2588–2608, 2024

Benjamin Grimmer. Provably faster gradient descent via long steps.SIAM Journal on Optimization, 34(3):2588–2608, 2024

work page 2024

[10] [10]

Benjamin Grimmer, Kevin Shu, and Alex L. Wang. Accelerated objective gap and gradient norm convergence for gradient descent via long steps.INFORMS Journal on Optimization, 7(2):156–169, 2025

work page 2025

[11] [11]

Benjamin Grimmer, Kevin Shu, and Alex L. Wang. Composing optimized stepsize schedules for gradient descent. Mathematics of Operations Research, 0(0), 2025

work page 2025

[12] [12]

Fixed points of nonexpanding maps.Bulletin of the American Mathematical Society, 73(6): 957–961, 1967

Benjamin Halpern. Fixed points of nonexpanding maps.Bulletin of the American Mathematical Society, 73(6): 957–961, 1967

work page 1967

[13] [13]

Accelerated proximal point method for maximally monotone operators.Mathematical Programming, 190(1–2):57–87, 2021

Donghwan Kim. Accelerated proximal point method for maximally monotone operators.Mathematical Programming, 190(1–2):57–87, 2021

work page 2021

[14] [14]

Donghwan Kim and Jeffrey A. Fessler. Optimizing the efficiency of first-order methods for decreasing the gradient of smooth convex functions.Journal of Optimization Theory and Applications, 188(1):192–219, 2021

work page 2021

[15] [15]

Ozdaglar, Chanwoo Park, and Ernest K

Jaeyeon Kim, Asuman E. Ozdaglar, Chanwoo Park, and Ernest K. Ryu. Time-reversed dissipation induces duality between minimizing gradient norm and function value.Neural Information Processing Systems, 2023

work page 2023

[16] [16]

Jaeyeon Kim, Chanwoo Park, Asuman Ozdaglar, Jelena Diakonikolas, and Ernest K. Ryu. Mirror duality in convex optimization.arXiv:2311.17296, 2023

work page arXiv 2023

[17] [17]

A proof of the exact convergence rate of gradient descent.arXiv preprint arXiv:2412.04427, 2024

Jungbin Kim. A proof of the exact convergence rate of gradient descent.arXiv preprint arXiv:2412.04427, 2024

work page arXiv 2024

[18] [18]

M. A. Krasnosel’skii. Two remarks on the method of successive approximations.Uspekhi Matematicheskikh Nauk, 10 (1):123–127, 1955

work page 1955

[19] [19]

Fast extra gradient methods for smooth structured nonconvex-nonconcave minimax problems.Neural Information Processing Systems, 2021

Sucheol Lee and Donghwan Kim. Fast extra gradient methods for smooth structured nonconvex-nonconcave minimax problems.Neural Information Processing Systems, 2021

work page 2021

[20] [20]

On the convergence rate of the Halpern-iteration.Optimization Letters, 15(2):405–418, 2021

Felix Lieder. On the convergence rate of the Halpern-iteration.Optimization Letters, 15(2):405–418, 2021

work page 2021

[21] [21]

Mean value methods in iteration.Proceedings of the American Mathematical Society, 4(3): 506–510, 1953

William Robert Mann. Mean value methods in iteration.Proceedings of the American Mathematical Society, 4(3): 506–510, 1953. 28

work page 1953

[22] [22]

Aryan Mokhtari, Asuman Ozdaglar, and Sarath Pattathil. A unified analysis of extra-gradient and optimistic gradient methods for saddle point problems: Proximal point approach.International Conference on Artificial Intelligence and Statistics, 2020

work page 2020

[23] [23]

Jisun Park and Ernest K. Ryu. Exact optimal accelerated complexity for fixed-point iterations.International Conference on Machine Learning, 2022

work page 2022

[24] [24]

Kevin Shu and Alex L. Wang. A Unified Theory of H-Duality in First-Order Methods.Forthcoming, 2026

work page 2026

[25] [25]

Suh, Jisun Park, and Ernest K

Jaewook J. Suh, Jisun Park, and Ernest K. Ryu. Continuous-time analysis of anchor acceleration.Neural Information Processing Systems, 2023

work page 2023

[26] [26]

From Halpern’s fixed-point iterations to Nesterov’s accelerated interpretations for root-finding problems.Computational Optimization and Applications, 87(1):181–218, 2024

Quoc Tran-Dinh. From Halpern’s fixed-point iterations to Nesterov’s accelerated interpretations for root-finding problems.Computational Optimization and Applications, 87(1):181–218, 2024

work page 2024

[27] [27]

Tran-Dinh and Y

Quoc Tran-Dinh and Yang Luo. Halpern-type accelerated and splitting algorithms for monotone inclusions. arXiv:2110.08150, 2021

work page arXiv 2021

[28] [28]

TaeHo Yoon and Ernest K. Ryu. Accelerated algorithms for smooth convex-concave minimax problems withO(1/k 2) rate on squared gradient norm.International Conference on Machine Learning, 2021

work page 2021

[29] [29]

TaeHo Yoon and Ernest K. Ryu. Accelerated minimax algorithms flock together.SIAM Journal on Optimization, 35(1):180–209, 2025

work page 2025

[30] [30]

TaeHo Yoon, Jaeyeon Kim, Jaewook J Suh, and Ernest K. Ryu. Optimal acceleration for minimax and fixed-point problems is not unique.International Conference on Machine Learning, 2024

work page 2024

[31] [31]

Ryu, and Benjamin Grimmer

TaeHo Yoon, Ernest K. Ryu, and Benjamin Grimmer. H-invariance theory: A complete characterization of minimax optimal fixed-point algorithms.arXiv:2511.14915, 2025

work page arXiv 2025

[32] [32]

Q(1, N−2) Q(2, N−2) # ∈span (

Zehao Zhang and Rujun Jiang. Accelerated gradient descent by concatenation of stepsize schedules.arXiv:2410.12395 (To Appear in SIAM Journal on Optimization), 2024. A Proof of Lemma 2 We first state and prove some useful combinatorial lemmas. Lemma 10(Chu–Vandermonde identity).For anyα, β∈Rand any integern≥0, nX m=0 α m β n−m = α+β n . Proof.Compare the c...

work page arXiv 2024