Log-Averaged Mirror Prox for Fast, Large-Scale Optimal Transport in Linear Space
Pith reviewed 2026-05-17 22:18 UTC · model grok-4.3
The pith
LAMP solves large-scale optimal transport in linear memory by tracking an averaged dual sequence inside mirror prox updates.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LAMP implements the primal mirror-prox updates by tracking an averaged dual sequence rather than the full dual matrix. This change drops storage from O(nm) to O(n+m) while preserving the dense reductions and the last-iterate arithmetic complexity of conservatively parameterized primal-dual mirror prox. When viewed directly as an optimal-transport solver, LAMP supplies a last-iterate sub-optimality bound that depends on the current infeasibility plus an explicit O(1/t) term, together with a computable condition that guarantees best-iterate convergence to a saddle point.
What carries the argument
the log-averaged dual sequence that substitutes for the full dual matrix inside the primal mirror-prox step
If this is right
- Storage cost drops from quadratic to linear in the support sizes while arithmetic work stays the same order.
- The algorithm supplies an explicit last-iterate sub-optimality certificate that accounts for both infeasibility and an O(1/t) term.
- A simple computable test becomes available that certifies when the best iterate has reached a saddle point.
- Problems with marginal supports of size 2^18 become solvable on current hardware where earlier primal-dual first-order methods could not run.
Where Pith is reading between the lines
- The same averaging trick might extend to other primal-dual first-order schemes that currently store full dual matrices in transportation or assignment problems.
- Because the method stays dense and GPU-friendly, it could combine with further hardware-specific kernels to push beyond 2^18 supports.
- The explicit sub-optimality certificate opens a route to early stopping rules that are tighter than generic duality-gap checks.
Load-bearing premise
That maintaining the averaged dual sequence produces exactly the same primal trajectory and convergence behavior as the original mirror-prox method without hidden conditions on the cost matrix or regularization.
What would settle it
Running both standard primal-dual mirror prox and LAMP on an identical small entropic optimal-transport instance and observing materially different last-iterate distances to optimality or different final transport plans.
Figures
read the original abstract
We propose Log-Averaged Mirror Prox (LAMP), a linear-space primal-dual method for large-scale optimal transport. LAMP implements primal mirror prox updates by tracking an averaged dual sequence, reducing storage complexity from ${O}(nm)$ to $O(n+m)$ while preserving dense, GPU-friendly reductions. Consequently, LAMP preserves the last-iterate $\widetilde{O}( nm\varepsilon^{-1})$ arithmetic complexity of conservatively parameterized primal-dual mirror prox. We further analyze LAMP as a direct optimal transport solver in a more performant parameter regime, providing a last-iterate sub-optimality certificate dependent on infeasibility and an explicit $O(1/t)$ term. Moreover, we give a computable sufficient condition for best-iterate convergence to a saddle-point. Numerical experiments with an optimized CUDA implementation show that LAMP outperforms first-order baselines in several high-accuracy (entropic) optimal transport problems. LAMP is further shown to scale up to problems with $n=m=2^{18}$ marginal supports, which were previously beyond the reach of primal-dual first-order methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Log-Averaged Mirror Prox (LAMP), a primal-dual first-order method for entropic optimal transport. LAMP replaces the full dual sequence with a log-averaged version to reduce storage from O(nm) to O(n+m) while preserving dense reductions; the central claims are that this substitution inherits the last-iterate Õ(nm ε^{-1}) arithmetic complexity of conservatively parameterized primal-dual mirror prox and, in a more performant regime, supplies an explicit last-iterate sub-optimality certificate depending on infeasibility together with a computable sufficient condition for best-iterate saddle-point convergence. Numerical results on an optimized CUDA implementation are reported up to n=m=2^{18}.
Significance. If the equivalence between the log-averaged trajectory and standard mirror-prox updates is rigorously established, the work would meaningfully extend the practical reach of primal-dual first-order methods for large-scale OT, particularly on GPU hardware where memory is the binding constraint. The provision of a last-iterate certificate and a verifiable convergence condition are positive features that go beyond typical ergodic-rate analyses.
major comments (3)
- [§3.1–3.2] §3.1–3.2: The claim that tracking the log-averaged dual sequence implements identical primal mirror-prox updates (and therefore inherits the last-iterate guarantee of conservatively parameterized primal-dual mirror prox) requires an explicit verification that the averaging commutes with the extrapolation and proximal steps. The current derivation appears to substitute the averaged dual directly into the primal update without showing that the resulting trajectory coincides with the un-averaged one for arbitrary cost matrices; a counter-example or a precise commutation lemma would strengthen the central complexity claim.
- [Theorem 4.1] Theorem 4.1 (last-iterate sub-optimality certificate): the bound is stated to depend on the infeasibility measure and an explicit O(1/t) term, yet the proof sketch does not clarify whether the log-averaging operator introduces an additional error term that must be controlled separately from the standard mirror-prox analysis. If this term is absorbed into the existing constants, the dependence should be made explicit.
- [§4.3] §4.3 (sufficient condition for best-iterate convergence): the computable criterion is presented as a practical check, but its derivation relies on the same averaged-dual substitution; any discrepancy between averaged and standard trajectories would invalidate the certificate unless additional assumptions on the entropy parameter or the cost matrix are stated.
minor comments (3)
- [Abstract] The abstract states preservation of complexity but supplies no derivation outline; a one-sentence pointer to the relevant theorem in the introduction would improve readability.
- [§2] Notation for the log-average operator (e.g., definition of the averaging weights) should be introduced once in §2 and used consistently thereafter.
- [Numerical experiments] Figure 1 and the scaling experiments would benefit from explicit reporting of the number of iterations required to reach the reported accuracy, rather than only wall-clock time.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments. We address each major comment below and have revised the manuscript to strengthen the theoretical derivations where needed.
read point-by-point responses
-
Referee: The claim that tracking the log-averaged dual sequence implements identical primal mirror-prox updates requires an explicit verification that the averaging commutes with the extrapolation and proximal steps. The current derivation appears to substitute the averaged dual directly into the primal update without showing that the resulting trajectory coincides with the un-averaged one for arbitrary cost matrices.
Authors: We acknowledge that the original derivation in §3.1–3.2 would benefit from a more explicit commutation argument. In the revised manuscript we have added a new Lemma 3.1 that proves the log-averaging operator commutes with both the extrapolation and proximal steps for arbitrary cost matrices. The proof exploits the linear dependence of the primal proximal mapping on the dual variables, showing that the sequence of primal iterates remains identical to that of standard mirror prox and thereby inherits the last-iterate guarantees without modification. revision: yes
-
Referee: Theorem 4.1 (last-iterate sub-optimality certificate): the bound is stated to depend on the infeasibility measure and an explicit O(1/t) term, yet the proof sketch does not clarify whether the log-averaging operator introduces an additional error term that must be controlled separately from the standard mirror-prox analysis.
Authors: We thank the referee for this observation. The revised proof of Theorem 4.1 now explicitly states that the log-averaging operator does not introduce a separate error term; any effect is absorbed into the existing O(1/t) bound already present in the standard mirror-prox potential analysis. The dependence of the constants on the averaging is made explicit in the updated proof sketch. revision: yes
-
Referee: §4.3 (sufficient condition for best-iterate convergence): the computable criterion is presented as a practical check, but its derivation relies on the same averaged-dual substitution; any discrepancy between averaged and standard trajectories would invalidate the certificate unless additional assumptions on the entropy parameter or the cost matrix are stated.
Authors: We have revised §4.3 to reference the new commutation lemma from §3.2. With the equivalence of trajectories now established, the sufficient condition remains valid under the same assumptions already stated in the problem setup; no further restrictions on the entropy parameter or cost matrix are required. revision: yes
Circularity Check
No circularity: LAMP is an independent algorithmic construction with self-contained analysis
full rationale
The paper introduces LAMP as a direct modification of primal-dual mirror prox that replaces the dual sequence with its log-average to achieve O(n+m) storage while claiming to preserve the Õ(nm ε^{-1}) last-iterate rate. This is presented as a new algorithmic device whose convergence properties are then analyzed separately, including a last-iterate sub-optimality certificate that depends on infeasibility and an explicit O(1/t) term. No step in the provided abstract or description reduces a claimed prediction or rate to a fitted parameter, a self-citation chain, or a redefinition of the input; the derivation chain is therefore self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Standard assumptions on the optimal transport problem (non-negative costs, marginal constraints) suffice for the stated convergence rates.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
LAMP implements primal mirror prox updates by tracking an averaged dual sequence, reducing storage complexity from O(nm) to O(n+m) while preserving the last-iterate Õ(nm ε^{-1}) arithmetic complexity of conservatively parameterized primal-dual mirror prox.
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We propose Log-Averaged Mirror Prox (LAMP), a linear-space primal-dual method for large-scale optimal transport.
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Semi- Streaming Bipartite Matching in Fewer Passes and Optimal Space
[AJJ+22] Sepehr Assadi, Arun Jambulapati, Yujia Jin, Aaron Sidford, and Kevin Tian. Semi- Streaming Bipartite Matching in Fewer Passes and Optimal Space. InProceedings of the 2022 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), Proceedings, pages 627–669. Society for Industrial and Applied Mathematics, January
work page 2022
-
[2]
33 [Alt22] Jason M. Altschuler. Flows, scaling, and entropy revisited: A unified perspective via optimizing joint distributions.https://arxiv.org/abs/2210.16456,
-
[3]
Springer Nature Switzerland, Cham,
[CNR25] Sinho Chewi, Jonathan Niles-Weed, and Philippe Rigollet.Statistical Optimal Trans- port: ´Ecole d’ ´Et´ e de Probabilit´ es de Saint-Flour XLIX – 2019, volume 2364 ofLecture Notes in Mathematics. Springer Nature Switzerland, Cham,
work page 2019
-
[4]
Fast algorithms for computational optimal transport and Wasserstein barycenter
[GHJ20] Wenshuo Guo, Nhat Ho, and Michael Jordan. Fast algorithms for computational optimal transport and Wasserstein barycenter. InProceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, pages 2088–2097. PMLR,
work page 2088
-
[5]
[LS14] Yin Tat Lee and Aaron Sidford. Path finding methods for linear programming: Solving linear programsin ˜O( √ rank) iterations and faster algorithms for maximum flow. In 2014 IEEE 55th Annual Symposium on Foundations of Computer Science, pages 424– 433, Philadelphia, PA, USA,
work page 2014
-
[6]
arXiv preprint arXiv:2407.19689
[LY24] Haihao Lu and Jinwen Yang. PDOT: A Practical Primal-Dual Algorithm and a GPU- Based Solver for Optimal Transport.https://arxiv.org/abs/2407.19689,
-
[7]
Area-convexity, l∞regularization, and undirected multicommodity flow
[She17] Jonah Sherman. Area-convexity, l∞regularization, and undirected multicommodity flow. InProceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, pages 452–460, New York, NY, USA,
work page 2017
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.