PINS: Proximal Iterations with Sparse Newton and Sinkhorn for Optimal Transport

Di Wu; Haizhao Yang; Ling Liang

arxiv: 2502.03749 · v2 · submitted 2025-02-06 · 💻 cs.LG · math.OC

PINS: Proximal Iterations with Sparse Newton and Sinkhorn for Optimal Transport

Di Wu , Ling Liang , Haizhao Yang This is my paper

Pith reviewed 2026-05-23 03:23 UTC · model grok-4.3

classification 💻 cs.LG math.OC

keywords optimal transportproximal point methodSinkhorn algorithmNewton methodsparsificationentropic regularizationmachine learning

0 comments

The pith

PINS solves the unregularized optimal transport problem by wrapping Sinkhorn warm-ups inside proximal iterations and refining each subproblem with sparse Newton steps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PINS as a two-loop method that overcomes the accuracy plateau of entropic regularization in optimal transport. An outer proximal-point iteration generates a sequence of shifted entropic problems whose solutions approach the exact unregularized optimum; each of those inner problems is solved first by Sinkhorn and then by a sparse-Newton correction. The authors prove global convergence of the outer sequence and show that the Hessian inside each Newton step admits a sparsification whose nonzero pattern does not depend on the cost matrix. Experiments on synthetic, MNIST, and DOTmark data report substantially lower cost errors than pure Sinkhorn together with speed-ups of 5-73 times at matched accuracy and reduced memory in a streaming implementation.

Core claim

PINS converges globally to an optimal solution of the unregularized OT problem. At every outer iteration the inner Hessian admits a sparsification whose structure is independent of the cost matrix, allowing the inner solver to combine a Sinkhorn warm-up with a sparse-Newton refinement that produces the required accuracy for the outer sequence.

What carries the argument

The two-loop solver: an entropic proximal-point outer iteration that produces a sequence of shifted cost matrices, each solved by an inner loop of Sinkhorn followed by sparse-Newton refinement on a cost-independent sparse Hessian.

If this is right

The outer sequence reaches the true OT plan rather than stopping at an entropic approximation.
Hessian sparsification remains valid at every outer step regardless of the particular cost values.
Memory usage drops 24-54 percent relative to network-simplex LP on large DOTmark instances while staying feasible under tighter per-process budgets.
The same outer-loop structure yields 5-73 times faster run times than Sinkhorn at equal accuracy on the tested instances.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The cost-independent sparsification pattern could be reused across multiple related transport problems without recomputing the sparsity mask.
The proximal-shift mechanism might be adapted to other bias-removal tasks such as unbalanced or semi-discrete transport.
Because the inner Newton step is local to each outer iteration, the method could be combined with warm-starting strategies across successive proximal steps.

Load-bearing premise

The sparse-Newton refinement after each Sinkhorn warm-up must remain accurate enough that approximation errors do not accumulate across outer iterations and block convergence to the exact unregularized optimum.

What would settle it

Solve a small OT instance whose exact optimum is known from a linear-programming solver; if PINS terminates with a relative cost error larger than the LP tolerance, the convergence claim fails.

Figures

Figures reproduced from arXiv: 2502.03749 by Di Wu, Haizhao Yang, Ling Liang.

**Figure 1.** Figure 1: Comparison of PINS and Newton accelerated methods without EPPA loops on synthetic datasets. The vertical axis represents the logarithmic error between the computed cost and the exact cost. The gray dashed line indicates the converging solution of the Newton accelerated methods for clarity. 0.0 0.5 1.0 1.5 Time Cost (s) 5 0 5 Log Error MNIST 1 0 100 200 300 Time Cost (s) 5 0 5 Log Error MNIST 4 0 100 200 3… view at source ↗

**Figure 2.** Figure 2: Comparison of PINS and Newton accelerated methods without EPPA loops on (augmented) MNIST datasets. seconds vs. 425 seconds). These findings align well with our theoretical conclusion in Theorem (4.4). The approximate Hessian matrix’s sparsity, bounded by τ (Hk ) ≤ 3n−1 2n2 , is independent of the cost matrix and holds at every iteration. This property enables efficient solutions to the sparse Hessian sys… view at source ↗

**Figure 3.** Figure 3: The comparison between PINS and Sinkhorn with EPPA on synthetic dataset. 0 25 50 75 Time Cost (s) 7.5 5.0 2.5 Log Error MNIST 1 0 500 1000 Time Cost (s) 7.5 5.0 2.5 Log Error MNIST 4 0 5000 10000 15000 Time Cost (s) 5 0 Log Error MNIST 9 0 20000 40000 60000 Time Cost (s) 5 0 Log Error MNIST 16 Sinkhorn with EPPA PINS [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: The comparison between PINS and Sinkhorn with EPPA on MNIST dataset. 5.4. Regularization Parameter We conducted several experiments to show the performance of PINS with different regularization parameters. The results are shown in [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

read the original abstract

Optimal transport (OT) is a widely used tool in machine learning, but computing high-accuracy solutions for large instances remains costly. Entropic regularization and the Sinkhorn algorithm improve scalability; however, when the regularization parameter is small, Sinkhorn convergence slows, and the iterates approach an entropic solution that remains separated from the true OT plan by an entropic-bias plateau. We introduce PINS (Proximal Iterations with sparse Newton and Sinkhorn), a two-loop solver designed to move beyond this plateau. The outer loop applies an entropic proximal-point method, solving the original OT problem through a sequence of entropic subproblems with shifted cost matrices. Each inner subproblem is then solved by a Sinkhorn warm-up followed by sparse-Newton refinement. We prove that PINS converges globally to an optimal solution of the unregularized OT problem and that the inner Hessian admits a sparsification at every outer iteration with a structure independent of the cost matrix. On synthetic and augmented-MNIST instances, PINS achieves much lower relative cost errors than Sinkhorn-type baselines, which stall at the entropic-bias plateau, and is $5$--$73\times$ faster than Sinkhorn with the same outer loop at matched accuracy. On large-scale DOTmark instances, a streaming implementation reduces peak memory by $24$--$54\%$ compared with the network-simplex linear programming (LP) solver and remains feasible under per-process memory budgets for which the LP solver fails.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

PINS pairs an entropic proximal outer loop with Sinkhorn-plus-sparse-Newton inner solves and claims cost-independent Hessian sparsification plus global convergence to exact OT.

read the letter

The main takeaway is a concrete two-loop solver that runs an outer proximal-point sequence on successively shifted entropic OT problems and solves each inner problem with a Sinkhorn warm-up followed by a sparse Newton refinement. The paper states that the inner Hessian admits a sparsification whose pattern does not depend on the cost matrix and that the whole procedure converges globally to an unregularized optimum. On the reported synthetic, augmented-MNIST, and DOTmark instances it reaches lower relative cost errors than plain Sinkhorn (which plateaus) and delivers 5-73x speedups at matched accuracy, plus 24-54% lower peak memory than network simplex on the largest cases. Those empirical numbers are the part that looks immediately useful. The algorithmic idea of cost-independent sparsification is the clearest point of novelty relative to prior Sinkhorn and proximal OT work. The experiments are run on standard benchmark families and the memory result is quantified under realistic per-process limits, so the practical claims are easy to evaluate. The soft spot is the one the stress-test note flags. Global convergence to the exact unregularized plan requires that the inner approximation errors, including those from sparsification, stay small enough not to accumulate across outer iterations. The abstract asserts the sparsification exists and is cost-independent but does not supply explicit error bounds or show how the tolerance is driven to zero with the outer regularization parameter. If the full proofs control the perturbation uniformly, the claim stands; if the tolerance is fixed, the outer sequence may converge only to a nearby point rather than the true optimum. This is the load-bearing theoretical step. The paper is aimed at researchers who build or use high-accuracy OT solvers inside larger ML pipelines and who already know the entropic-bias limitation of Sinkhorn. It deserves a serious referee because the method is specific, the performance claims are measured on public instances, and the convergence statement is precise enough to be checked.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces PINS, a two-loop solver for unregularized optimal transport that uses an outer entropic proximal-point iteration on shifted cost matrices, with each inner subproblem solved via Sinkhorn warm-up followed by sparse-Newton refinement. It claims to prove global convergence of the outer sequence to an exact OT optimum and that the inner Hessian admits a sparsification whose structure is independent of the cost matrix at every outer iteration. Empirical results on synthetic, augmented-MNIST, and DOTmark instances report substantially lower relative cost errors than Sinkhorn baselines (which plateau due to entropic bias) together with 5–73× speedups at matched accuracy and 24–54% memory reduction in a streaming implementation.

Significance. If the global convergence and cost-independent sparsification hold with controlled inner error, the work would supply a practical route to high-accuracy unregularized OT plans at large scale, directly addressing the bias plateau that limits Sinkhorn-type methods when the regularization parameter is driven small. The combination of a proximal outer loop with a structurally sparse inner Newton step, together with the stated global convergence result, constitutes a substantive algorithmic and theoretical contribution.

major comments (2)

[§3 and convergence theorem (§4)] §3 (inner solver) and the convergence theorem (likely §4): the claim that the sparse-Newton refinement produces iterates accurate enough for the outer proximal sequence to reach the exact unregularized limit requires an explicit bound on the sparsification error that is either driven to zero with the outer regularization parameter or shown to be uniformly controlled across iterations; the abstract states the sparsification exists but supplies no such quantitative guarantee, which is load-bearing for the global convergence assertion.
[convergence theorem] Theorem on global convergence: the proof sketch must verify that the perturbation introduced by the fixed-tolerance sparsification does not accumulate and prevent contraction of the outer proximal error to zero; without this, the reduction from the regularized subproblems to the unregularized OT optimum remains formally open.

minor comments (2)

[§2] Notation for the shifted cost matrices in the outer loop should be introduced once and used consistently; the current presentation occasionally redefines the shift without cross-reference.
[large-scale experiments] The streaming implementation section would benefit from an explicit statement of the per-iteration memory complexity in terms of the support size after sparsification.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for identifying the need for explicit quantitative controls on sparsification error to close the global convergence argument. We address both major comments below and will strengthen the analysis accordingly in revision.

read point-by-point responses

Referee: [§3 and convergence theorem (§4)] §3 (inner solver) and the convergence theorem (likely §4): the claim that the sparse-Newton refinement produces iterates accurate enough for the outer proximal sequence to reach the exact unregularized limit requires an explicit bound on the sparsification error that is either driven to zero with the outer regularization parameter or shown to be uniformly controlled across iterations; the abstract states the sparsification exists but supplies no such quantitative guarantee, which is load-bearing for the global convergence assertion.

Authors: We agree that an explicit bound on the sparsification error is required for rigor. In the revised manuscript we will insert a new lemma (in §3) that quantifies the sparsification error and shows it is O(ε_outer) where ε_outer is the current outer regularization parameter; the bound is independent of the cost matrix by the structural property already proved. This drives the inner error to zero with the proximal sequence and supplies the missing quantitative guarantee. revision: yes
Referee: [convergence theorem] Theorem on global convergence: the proof sketch must verify that the perturbation introduced by the fixed-tolerance sparsification does not accumulate and prevent contraction of the outer proximal error to zero; without this, the reduction from the regularized subproblems to the unregularized OT optimum remains formally open.

Authors: We concur that the current sketch does not explicitly rule out accumulation. The revised proof (expanded in the appendix) will show that the per-iteration perturbation is bounded by a summable series whose sum is controlled by the contraction rate of the proximal-point map; the cost-independent sparsification structure ensures the perturbation remains uniformly bounded rather than growing, so the outer error still contracts to zero. revision: yes

Circularity Check

0 steps flagged

No significant circularity; claims rest on algorithmic construction and external proofs

full rationale

The paper introduces PINS as a two-loop proximal-point algorithm with Sinkhorn warm-up and sparse-Newton inner solves, claiming global convergence to the unregularized OT optimum and cost-matrix-independent Hessian sparsification. These are presented as results of the algorithm's design and stated mathematical proofs rather than quantities defined in terms of themselves, fitted parameters renamed as predictions, or load-bearing self-citations. No equations or steps reduce by construction to the inputs; the derivation chain is self-contained against the stated assumptions and does not invoke prior author work to force uniqueness or smuggle ansatzes. Minor self-citation risk is absent from the provided text.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The method relies on standard convex-optimization facts about proximal-point convergence for entropic OT and on the existence of a sparsifiable Hessian structure; no new free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption The entropic proximal-point iteration converges to the solution of the unregularized OT problem when each subproblem is solved to sufficient accuracy.
Invoked to justify the outer loop; this is a known result in optimization theory but is load-bearing for the global convergence claim.

pith-pipeline@v0.9.0 · 5802 in / 1441 out tokens · 30635 ms · 2026-05-23T03:23:10.737245+00:00 · methodology

discussion (0)

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Beyond Expected Information Gain: Stable Bayesian Optimal Experimental Design with Integral Probability Metrics and Plug-and-Play Extensions
stat.ML 2026-04 unverdicted novelty 7.0

An IPM-based framework for Bayesian optimal experimental design is proposed that replaces KL-based expected information gain with Wasserstein, MMD, and energy distances, delivering stronger stability guarantees and pl...
A Provably Convergent and Practical Algorithm for Gromov--Wasserstein Optimal Transport
cs.LG 2026-05 unverdicted novelty 6.0

An inexact projected-gradient framework for GWOT with a verifiable feasibility-residual condition that proves subsequential and full-sequence convergence to stationary points under a mild tolerance decay.

Reference graph

Works this paper leans on

64 extracted references · 64 canonical work pages · cited by 2 Pith papers · 1 internal anchor

[1]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...

work page
[2]

Aldous, D. J. The (2) limit in the random assignment problem. Random Structures & Algorithms, 18 0 (4): 0 381--418, 2001

work page 2001
[3]

Near-linear time approximation algorithms for optimal transport via S inkhorn iteration

Altschuler, J., Niles-Weed, J., and Rigollet, P. Near-linear time approximation algorithms for optimal transport via S inkhorn iteration. Advances in neural information processing systems, 30, 2017

work page 2017
[4]

Massively scalable S inkhorn distances via the N ystr \"o m method

Altschuler, J., Bach, F., Rudi, A., and Niles-Weed, J. Massively scalable S inkhorn distances via the N ystr \"o m method. Advances in neural information processing systems, 32, 2019

work page 2019
[5]

Wasserstein generative adversarial networks

Arjovsky, M., Chintala, S., and Bottou, L. Wasserstein generative adversarial networks. In International conference on machine learning, pp.\ 214--223. PMLR, 2017

work page 2017
[6]

G., Dabney, W., and Munos, R

Bellemare, M. G., Dabney, W., and Munos, R. A distributional perspective on reinforcement learning. In International conference on machine learning, pp.\ 449--458. PMLR, 2017

work page 2017
[7]

Iterative B regman projections for regularized transportation problems

Benamou, J.-D., Carlier, G., Cuturi, M., Nenna, L., and Peyr \'e , G. Iterative B regman projections for regularized transportation problems. SIAM Journal on Scientific Computing, 37 0 (2): 0 A1111--A1138, 2015

work page 2015
[8]

and Tsitsiklis, J

Bertsimas, D. and Tsitsiklis, J. N. Introduction to linear optimization, volume 6. Athena scientific Belmont, MA, 1997

work page 1997
[9]

Assignment problems: revised reprint

Burkard, R., Dell'Amico, M., and Martello, S. Assignment problems: revised reprint. SIAM, 2012

work page 2012
[10]

and Zenios, S

Censor, Y. and Zenios, S. A. Proximal minimization algorithm with D -functions. Journal of Optimization Theory and Applications, 73 0 (3): 0 451--464, 1992

work page 1992
[11]

Unified optimal transport framework for universal domain adaptation

Chang, W., Shi, Y., Tuan, H., and Wang, J. Unified optimal transport framework for universal domain adaptation. Advances in Neural Information Processing Systems, 35: 0 29512--29524, 2022

work page 2022
[12]

and Teboulle, M

Chen, G. and Teboulle, M. Convergence analysis of a proximal-like minimization algorithm using B regman functions. SIAM Journal on Optimization, 3 0 (3): 0 538--543, 1993

work page 1993
[13]

Graph optimal transport for cross-domain alignment

Chen, L., Gan, Z., Cheng, Y., Li, L., Carin, L., and Liu, J. Graph optimal transport for cross-domain alignment. In International Conference on Machine Learning, pp.\ 1542--1553. PMLR, 2020

work page 2020
[14]

T., Liang, L., Toh, K.-C., and Yang, L

Chu, H. T., Liang, L., Toh, K.-C., and Yang, L. An efficient implementable inexact entropic proximal point algorithm for a class of linear programming problems. Computational Optimization and Applications, 85 0 (1): 0 107--146, 2023

work page 2023
[15]

A regularized interior point method for sparse optimal transport on graphs

Cipolla, S., Gondzio, J., and Zanetti, F. A regularized interior point method for sparse optimal transport on graphs. European Journal of Operational Research, 319 0 (2): 0 413--426, 2024

work page 2024
[16]

Sinkhorn distances: Lightspeed computation of optimal transport

Cuturi, M. Sinkhorn distances: Lightspeed computation of optimal transport. Advances in neural information processing systems, 26, 2013

work page 2013
[17]

Distributional reinforcement learning with quantile regression

Dabney, W., Rowland, M., Bellemare, M., and Munos, R. Distributional reinforcement learning with quantile regression. In Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018

work page 2018
[18]

Diffusion S chr \"o dinger bridge with applications to score-based generative modeling

De Bortoli, V., Thornton, J., Heng, J., and Doucet, A. Diffusion S chr \"o dinger bridge with applications to score-based generative modeling. Advances in Neural Information Processing Systems, 34: 0 17695--17709, 2021

work page 2021
[19]

Power particles: an incompressible fluid solver based on power diagrams

De Goes, F., Wallez, C., Huang, J., Pavlov, D., and Desbrun, M. Power particles: an incompressible fluid solver based on power diagrams. ACM Trans. Graph., 34: 0 50--1, 2015

work page 2015
[20]

and Walsh III, J

Dieci, L. and Walsh III, J. D. The boundary method for semi-discrete optimal transport partitions and wasserstein distance computation. Journal of Computational and Applied Mathematics, 353: 0 318--344, 2019

work page 2019
[21]

Nonlinear proximal point algorithms using B regman functions, with applications to convex programming

Eckstein, J. Nonlinear proximal point algorithms using B regman functions, with applications to convex programming. Mathematics of Operations Research, 18 0 (1): 0 202--226, 1993

work page 1993
[22]

Approximate iterations in B regman-function-based proximal algorithms

Eckstein, J. Approximate iterations in B regman-function-based proximal algorithms. Mathematical programming, 83: 0 113--123, 1998

work page 1998
[23]

Unbalanced minibatch optimal transport; applications to domain adaptation

Fatras, K., S \'e journ \'e , T., Flamary, R., and Courty, N. Unbalanced minibatch optimal transport; applications to domain adaptation. In International Conference on Machine Learning, pp.\ 3186--3197. PMLR, 2021

work page 2021
[24]

and Reeves, C

Fletcher, R. and Reeves, C. M. Function minimization by conjugate gradients. The computer journal, 7 0 (2): 0 149--154, 1964

work page 1964
[25]

Optimal transport methods in economics

Galichon, A. Optimal transport methods in economics. Princeton University Press, 2018

work page 2018
[26]

Deep generative learning via variational gradient flow

Gao, Y., Jiao, Y., Wang, Y., Wang, Y., Yang, C., and Zhang, S. Deep generative learning via variational gradient flow. In International Conference on Machine Learning, pp.\ 2093--2101. PMLR, 2019

work page 2093
[27]

Learning generative models with S inkhorn divergences

Genevay, A., Peyr \'e , G., and Cuturi, M. Learning generative models with S inkhorn divergences. In International Conference on Artificial Intelligence and Statistics, pp.\ 1608--1617. PMLR, 2018

work page 2018
[28]

and Nutz, M

Ghosal, P. and Nutz, M. On the convergence rate of S inkhorn's algorithm. arXiv preprint arXiv:2212.06000, 2022

work page arXiv 2022
[29]

A sparse smoothing N ewton method for solving discrete optimal transport problems

Hou, D., Liang, L., and Toh, K.-C. A sparse smoothing N ewton method for solving discrete optimal transport problems. ACM Transactions on Mathematical Software, 50 0 (3): 0 1--26, 2024

work page 2024
[30]

R., Tape, C

Huguet, G., Tong, A., Zapatero, M. R., Tape, C. J., Wolf, G., and Krishnaswamy, S. Geodesic S inkhorn for fast and accurate optimal transport on manifolds. In 2023 IEEE 33rd International Workshop on Machine Learning for Signal Processing (MLSP), pp.\ 1--6. IEEE, 2023

work page 2023
[31]

P., and Gretton, A

Jitkrittum, W., Szab \'o , Z., Chwialkowski, K. P., and Gretton, A. Interpretable distribution features with maximum testing power. Advances in Neural Information Processing Systems, 29, 2016

work page 2016
[32]

and Inouye, D

Kulinski, S. and Inouye, D. I. Towards explaining distribution shifts. In International Conference on Machine Learning, pp.\ 17931--17952. PMLR, 2023

work page 2023
[33]

A survey of the S chr \"o dinger problem and some of its connections with optimal transport

L \'e onard, C. A survey of the S chr \"o dinger problem and some of its connections with optimal transport. Discrete and Continuous Dynamical Systems, 34 0 (4): 0 1533--1574, 2013

work page 2013
[34]

Accelerating multi-block constrained optimization through learning to optimize

Liang, L., Austin, C., and Yang, H. Accelerating multi-block constrained optimization through learning to optimize. arXiv preprint arXiv:2409.17320, 2024

work page arXiv 2024
[35]

Lin, T., Ho, N., and Jordan, M. I. On the efficiency of entropic regularized algorithms for optimal transport. Journal of Machine Learning Research, 23 0 (137): 0 1--42, 2022

work page 2022
[36]

Luo, Y., Jiang, Z., Cohen, S., Grefenstette, E., and Deisenroth, M. P. Optimal transport for offline imitation learning. arXiv preprint arXiv:2303.13971, 2023

work page arXiv 2023
[37]

and Parisi, G

M \'e zard, M. and Parisi, G. On the solution of the random link matching problems. Journal de Physique, 48 0 (9): 0 1451--1459, 1987

work page 1987
[38]

F., Mboula, F

Montesuma, E. F., Mboula, F. M. N., and Souloumiac, A. Recent advances in optimal transport for machine learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

work page 2024
[39]

and Nemirovskii, A

Nesterov, Y. and Nemirovskii, A. Interior-point polynomial algorithms in convex programming. SIAM, 1994

work page 1994
[40]

and Wright, S

Nocedal, J. and Wright, S. J. Numerical optimization. Springer, 1999

work page 1999
[41]

Computational optimal transport: With applications to data science

Peyr \'e , G., Cuturi, M., et al. Computational optimal transport: With applications to data science. Foundations and Trends in Machine Learning , 11 0 (5-6): 0 355--607, 2019

work page 2019
[42]

Polyak, B. T. Introduction to optimization. Optimization Software, Publications Division, 1987

work page 1987
[43]

Improving GAN s using optimal transport

Salimans, T., Zhang, H., Radford, A., and Metaxas, D. Improving GAN s using optimal transport. In International Conference on Learning Representations, 2018

work page 2018
[44]

and Lindenbaum, M

Sandler, R. and Lindenbaum, M. Nonnegative matrix factorization with earth mover's distance metric for image analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33 0 (8): 0 1590--1602, 2011

work page 2011
[45]

and Cuturi, M

Scetbon, M. and Cuturi, M. Linear time S inkhorn divergences using positive features. Advances in neural information processing systems, 33: 0 13468--13480, 2020

work page 2020
[46]

Low-rank sinkhorn factorization

Scetbon, M., Cuturi, M., and Peyr \'e , G. Low-rank sinkhorn factorization. In International Conference on Machine Learning, pp.\ 9344--9354. PMLR, 2021

work page 2021
[47]

Stabilized sparse scaling algorithms for entropy regularized transport problems

Schmitzer, B. Stabilized sparse scaling algorithms for entropy regularized transport problems. SIAM Journal on Scientific Computing, 41 0 (3): 0 A1443--A1481, 2019

work page 2019
[48]

Diffusion S chr \"o dinger bridge matching

Shi, Y., De Bortoli, V., Campbell, A., and Doucet, A. Diffusion S chr \"o dinger bridge matching. Advances in Neural Information Processing Systems, 36, 2024

work page 2024
[49]

Convolutional W asserstein distances: Efficient optimal transportation on geometric domains

Solomon, J., De Goes, F., Peyr \'e , G., Cuturi, M., Butscher, A., Nguyen, A., Du, T., and Guibas, L. Convolutional W asserstein distances: Efficient optimal transportation on geometric domains. ACM Transactions on Graphics (ToG), 34 0 (4): 0 1--11, 2015

work page 2015
[50]

Stanley, R. P. Enumerative combinatorics volume 1 second edition. Cambridge studies in advanced mathematics, 2011

work page 2011
[51]

Steele, J. M. Probability theory and combinatorial optimization. SIAM, 1997

work page 1997
[52]

K., Xiao, T., and Ying, L

Tang, X., Rahmanian, H., Shavlovsky, M., Thekumparampil, K. K., Xiao, T., and Ying, L. A S inkhorn-type algorithm for constrained optimal transport. arXiv preprint arXiv:2403.05054, 2024 a

work page arXiv 2024
[53]

K., Xiao, T., and Ying, L

Tang, X., Shavlovsky, M., Rahmanian, H., Tardini, E., Thekumparampil, K. K., Xiao, T., and Ying, L. Accelerating S inkhorn algorithm with sparse newton iterations. In The Twelfth International Conference on Learning Representations, 2024 b . URL https://openreview.net/forum?id=Kuj5gVp5GQ

work page 2024
[54]

and Qiu, Y

Tang, Z. and Qiu, Y. Safe and sparse N ewton method for entropic-regularized optimal transport. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=Nmmiyjw7Xg

work page 2024
[55]

C., Pereira, L

Torres, L. C., Pereira, L. M., and Amini, M. H. A survey on optimal transport for machine learning: Theory and applications. arXiv preprint arXiv:2106.01963, 2021

work page arXiv 2021
[56]

Villani, C. et al. Optimal transport: old and new, volume 338. Springer, 2009

work page 2009
[57]

Deep generative learning via S chr \"o dinger bridge

Wang, G., Jiao, Y., Xu, Q., Wang, Y., and Yang, C. Deep generative learning via S chr \"o dinger bridge. In International conference on machine learning, pp.\ 10794--10804. PMLR, 2021

work page 2021
[58]

Recent advances on machine learning for computational fluid dynamics: A survey

Wang, H., Cao, Y., Huang, Z., Liu, Y., Hu, P., Luo, X., Song, Z., Zhao, W., Liu, J., Sun, J., et al. Recent advances on machine learning for computational fluid dynamics: A survey. arXiv preprint arXiv:2408.12171, 2024

work page arXiv 2024
[59]

Large-scale multi-modal pre-trained models: A comprehensive survey

Wang, X., Chen, G., Qian, G., Gao, P., Wei, X.-Y., Wang, Y., Tian, Y., and Gao, W. Large-scale multi-modal pre-trained models: A comprehensive survey. Machine Intelligence Research, 20 0 (4): 0 447--482, 2023

work page 2023
[60]

An explicit analysis of the entropic penalty in linear programming

Weed, J. An explicit analysis of the entropic penalty in linear programming. In Conference On Learning Theory, pp.\ 1841--1855. PMLR, 2018

work page 2018
[61]

Neural network approximation for pessimistic offline reinforcement learning

Wu, D., Jiao, Y., Shen, L., Yang, H., and Lu, X. Neural network approximation for pessimistic offline reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pp.\ 15868--15877, 2024

work page 2024
[62]

A fast proximal point method for computing exact W asserstein distance

Xie, Y., Wang, X., Wang, R., and Zha, H. A fast proximal point method for computing exact W asserstein distance. In Uncertainty in artificial intelligence, pp.\ 433--453. PMLR, 2020

work page 2020
[63]

T., and Toh, K.-C

Yang, L., Liang, L., Chu, H. T., and Toh, K.-C. A corrected inexact proximal augmented L agrangian method with a relative error criterion for a class of group-quadratic regularized optimal transport problems. Journal of Scientific Computing, 99 0 (3): 0 79, 2024

work page 2024
[64]

ripALM: A Relative-Type Inexact Proximal Augmented Lagrangian Method for Linearly Constrained Convex Optimization

Zhu, J., Liang, L., Yang, L., and Toh, K.-C. rip ALM : A relative-type inexact proximal augmented L agrangian method with applications to quadratically regularized optimal transport. arXiv preprint arXiv:2411.13267, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[1] [1]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...

work page

[2] [2]

Aldous, D. J. The (2) limit in the random assignment problem. Random Structures & Algorithms, 18 0 (4): 0 381--418, 2001

work page 2001

[3] [3]

Near-linear time approximation algorithms for optimal transport via S inkhorn iteration

Altschuler, J., Niles-Weed, J., and Rigollet, P. Near-linear time approximation algorithms for optimal transport via S inkhorn iteration. Advances in neural information processing systems, 30, 2017

work page 2017

[4] [4]

Massively scalable S inkhorn distances via the N ystr \"o m method

Altschuler, J., Bach, F., Rudi, A., and Niles-Weed, J. Massively scalable S inkhorn distances via the N ystr \"o m method. Advances in neural information processing systems, 32, 2019

work page 2019

[5] [5]

Wasserstein generative adversarial networks

Arjovsky, M., Chintala, S., and Bottou, L. Wasserstein generative adversarial networks. In International conference on machine learning, pp.\ 214--223. PMLR, 2017

work page 2017

[6] [6]

G., Dabney, W., and Munos, R

Bellemare, M. G., Dabney, W., and Munos, R. A distributional perspective on reinforcement learning. In International conference on machine learning, pp.\ 449--458. PMLR, 2017

work page 2017

[7] [7]

Iterative B regman projections for regularized transportation problems

Benamou, J.-D., Carlier, G., Cuturi, M., Nenna, L., and Peyr \'e , G. Iterative B regman projections for regularized transportation problems. SIAM Journal on Scientific Computing, 37 0 (2): 0 A1111--A1138, 2015

work page 2015

[8] [8]

and Tsitsiklis, J

Bertsimas, D. and Tsitsiklis, J. N. Introduction to linear optimization, volume 6. Athena scientific Belmont, MA, 1997

work page 1997

[9] [9]

Assignment problems: revised reprint

Burkard, R., Dell'Amico, M., and Martello, S. Assignment problems: revised reprint. SIAM, 2012

work page 2012

[10] [10]

and Zenios, S

Censor, Y. and Zenios, S. A. Proximal minimization algorithm with D -functions. Journal of Optimization Theory and Applications, 73 0 (3): 0 451--464, 1992

work page 1992

[11] [11]

Unified optimal transport framework for universal domain adaptation

Chang, W., Shi, Y., Tuan, H., and Wang, J. Unified optimal transport framework for universal domain adaptation. Advances in Neural Information Processing Systems, 35: 0 29512--29524, 2022

work page 2022

[12] [12]

and Teboulle, M

Chen, G. and Teboulle, M. Convergence analysis of a proximal-like minimization algorithm using B regman functions. SIAM Journal on Optimization, 3 0 (3): 0 538--543, 1993

work page 1993

[13] [13]

Graph optimal transport for cross-domain alignment

Chen, L., Gan, Z., Cheng, Y., Li, L., Carin, L., and Liu, J. Graph optimal transport for cross-domain alignment. In International Conference on Machine Learning, pp.\ 1542--1553. PMLR, 2020

work page 2020

[14] [14]

T., Liang, L., Toh, K.-C., and Yang, L

Chu, H. T., Liang, L., Toh, K.-C., and Yang, L. An efficient implementable inexact entropic proximal point algorithm for a class of linear programming problems. Computational Optimization and Applications, 85 0 (1): 0 107--146, 2023

work page 2023

[15] [15]

A regularized interior point method for sparse optimal transport on graphs

Cipolla, S., Gondzio, J., and Zanetti, F. A regularized interior point method for sparse optimal transport on graphs. European Journal of Operational Research, 319 0 (2): 0 413--426, 2024

work page 2024

[16] [16]

Sinkhorn distances: Lightspeed computation of optimal transport

Cuturi, M. Sinkhorn distances: Lightspeed computation of optimal transport. Advances in neural information processing systems, 26, 2013

work page 2013

[17] [17]

Distributional reinforcement learning with quantile regression

Dabney, W., Rowland, M., Bellemare, M., and Munos, R. Distributional reinforcement learning with quantile regression. In Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018

work page 2018

[18] [18]

Diffusion S chr \"o dinger bridge with applications to score-based generative modeling

De Bortoli, V., Thornton, J., Heng, J., and Doucet, A. Diffusion S chr \"o dinger bridge with applications to score-based generative modeling. Advances in Neural Information Processing Systems, 34: 0 17695--17709, 2021

work page 2021

[19] [19]

Power particles: an incompressible fluid solver based on power diagrams

De Goes, F., Wallez, C., Huang, J., Pavlov, D., and Desbrun, M. Power particles: an incompressible fluid solver based on power diagrams. ACM Trans. Graph., 34: 0 50--1, 2015

work page 2015

[20] [20]

and Walsh III, J

Dieci, L. and Walsh III, J. D. The boundary method for semi-discrete optimal transport partitions and wasserstein distance computation. Journal of Computational and Applied Mathematics, 353: 0 318--344, 2019

work page 2019

[21] [21]

Nonlinear proximal point algorithms using B regman functions, with applications to convex programming

Eckstein, J. Nonlinear proximal point algorithms using B regman functions, with applications to convex programming. Mathematics of Operations Research, 18 0 (1): 0 202--226, 1993

work page 1993

[22] [22]

Approximate iterations in B regman-function-based proximal algorithms

Eckstein, J. Approximate iterations in B regman-function-based proximal algorithms. Mathematical programming, 83: 0 113--123, 1998

work page 1998

[23] [23]

Unbalanced minibatch optimal transport; applications to domain adaptation

Fatras, K., S \'e journ \'e , T., Flamary, R., and Courty, N. Unbalanced minibatch optimal transport; applications to domain adaptation. In International Conference on Machine Learning, pp.\ 3186--3197. PMLR, 2021

work page 2021

[24] [24]

and Reeves, C

Fletcher, R. and Reeves, C. M. Function minimization by conjugate gradients. The computer journal, 7 0 (2): 0 149--154, 1964

work page 1964

[25] [25]

Optimal transport methods in economics

Galichon, A. Optimal transport methods in economics. Princeton University Press, 2018

work page 2018

[26] [26]

Deep generative learning via variational gradient flow

Gao, Y., Jiao, Y., Wang, Y., Wang, Y., Yang, C., and Zhang, S. Deep generative learning via variational gradient flow. In International Conference on Machine Learning, pp.\ 2093--2101. PMLR, 2019

work page 2093

[27] [27]

Learning generative models with S inkhorn divergences

Genevay, A., Peyr \'e , G., and Cuturi, M. Learning generative models with S inkhorn divergences. In International Conference on Artificial Intelligence and Statistics, pp.\ 1608--1617. PMLR, 2018

work page 2018

[28] [28]

and Nutz, M

Ghosal, P. and Nutz, M. On the convergence rate of S inkhorn's algorithm. arXiv preprint arXiv:2212.06000, 2022

work page arXiv 2022

[29] [29]

A sparse smoothing N ewton method for solving discrete optimal transport problems

Hou, D., Liang, L., and Toh, K.-C. A sparse smoothing N ewton method for solving discrete optimal transport problems. ACM Transactions on Mathematical Software, 50 0 (3): 0 1--26, 2024

work page 2024

[30] [30]

R., Tape, C

Huguet, G., Tong, A., Zapatero, M. R., Tape, C. J., Wolf, G., and Krishnaswamy, S. Geodesic S inkhorn for fast and accurate optimal transport on manifolds. In 2023 IEEE 33rd International Workshop on Machine Learning for Signal Processing (MLSP), pp.\ 1--6. IEEE, 2023

work page 2023

[31] [31]

P., and Gretton, A

Jitkrittum, W., Szab \'o , Z., Chwialkowski, K. P., and Gretton, A. Interpretable distribution features with maximum testing power. Advances in Neural Information Processing Systems, 29, 2016

work page 2016

[32] [32]

and Inouye, D

Kulinski, S. and Inouye, D. I. Towards explaining distribution shifts. In International Conference on Machine Learning, pp.\ 17931--17952. PMLR, 2023

work page 2023

[33] [33]

A survey of the S chr \"o dinger problem and some of its connections with optimal transport

L \'e onard, C. A survey of the S chr \"o dinger problem and some of its connections with optimal transport. Discrete and Continuous Dynamical Systems, 34 0 (4): 0 1533--1574, 2013

work page 2013

[34] [34]

Accelerating multi-block constrained optimization through learning to optimize

Liang, L., Austin, C., and Yang, H. Accelerating multi-block constrained optimization through learning to optimize. arXiv preprint arXiv:2409.17320, 2024

work page arXiv 2024

[35] [35]

Lin, T., Ho, N., and Jordan, M. I. On the efficiency of entropic regularized algorithms for optimal transport. Journal of Machine Learning Research, 23 0 (137): 0 1--42, 2022

work page 2022

[36] [36]

Luo, Y., Jiang, Z., Cohen, S., Grefenstette, E., and Deisenroth, M. P. Optimal transport for offline imitation learning. arXiv preprint arXiv:2303.13971, 2023

work page arXiv 2023

[37] [37]

and Parisi, G

M \'e zard, M. and Parisi, G. On the solution of the random link matching problems. Journal de Physique, 48 0 (9): 0 1451--1459, 1987

work page 1987

[38] [38]

F., Mboula, F

Montesuma, E. F., Mboula, F. M. N., and Souloumiac, A. Recent advances in optimal transport for machine learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

work page 2024

[39] [39]

and Nemirovskii, A

Nesterov, Y. and Nemirovskii, A. Interior-point polynomial algorithms in convex programming. SIAM, 1994

work page 1994

[40] [40]

and Wright, S

Nocedal, J. and Wright, S. J. Numerical optimization. Springer, 1999

work page 1999

[41] [41]

Computational optimal transport: With applications to data science

Peyr \'e , G., Cuturi, M., et al. Computational optimal transport: With applications to data science. Foundations and Trends in Machine Learning , 11 0 (5-6): 0 355--607, 2019

work page 2019

[42] [42]

Polyak, B. T. Introduction to optimization. Optimization Software, Publications Division, 1987

work page 1987

[43] [43]

Improving GAN s using optimal transport

Salimans, T., Zhang, H., Radford, A., and Metaxas, D. Improving GAN s using optimal transport. In International Conference on Learning Representations, 2018

work page 2018

[44] [44]

and Lindenbaum, M

Sandler, R. and Lindenbaum, M. Nonnegative matrix factorization with earth mover's distance metric for image analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33 0 (8): 0 1590--1602, 2011

work page 2011

[45] [45]

and Cuturi, M

Scetbon, M. and Cuturi, M. Linear time S inkhorn divergences using positive features. Advances in neural information processing systems, 33: 0 13468--13480, 2020

work page 2020

[46] [46]

Low-rank sinkhorn factorization

Scetbon, M., Cuturi, M., and Peyr \'e , G. Low-rank sinkhorn factorization. In International Conference on Machine Learning, pp.\ 9344--9354. PMLR, 2021

work page 2021

[47] [47]

Stabilized sparse scaling algorithms for entropy regularized transport problems

Schmitzer, B. Stabilized sparse scaling algorithms for entropy regularized transport problems. SIAM Journal on Scientific Computing, 41 0 (3): 0 A1443--A1481, 2019

work page 2019

[48] [48]

Diffusion S chr \"o dinger bridge matching

Shi, Y., De Bortoli, V., Campbell, A., and Doucet, A. Diffusion S chr \"o dinger bridge matching. Advances in Neural Information Processing Systems, 36, 2024

work page 2024

[49] [49]

Convolutional W asserstein distances: Efficient optimal transportation on geometric domains

Solomon, J., De Goes, F., Peyr \'e , G., Cuturi, M., Butscher, A., Nguyen, A., Du, T., and Guibas, L. Convolutional W asserstein distances: Efficient optimal transportation on geometric domains. ACM Transactions on Graphics (ToG), 34 0 (4): 0 1--11, 2015

work page 2015

[50] [50]

Stanley, R. P. Enumerative combinatorics volume 1 second edition. Cambridge studies in advanced mathematics, 2011

work page 2011

[51] [51]

Steele, J. M. Probability theory and combinatorial optimization. SIAM, 1997

work page 1997

[52] [52]

K., Xiao, T., and Ying, L

Tang, X., Rahmanian, H., Shavlovsky, M., Thekumparampil, K. K., Xiao, T., and Ying, L. A S inkhorn-type algorithm for constrained optimal transport. arXiv preprint arXiv:2403.05054, 2024 a

work page arXiv 2024

[53] [53]

K., Xiao, T., and Ying, L

Tang, X., Shavlovsky, M., Rahmanian, H., Tardini, E., Thekumparampil, K. K., Xiao, T., and Ying, L. Accelerating S inkhorn algorithm with sparse newton iterations. In The Twelfth International Conference on Learning Representations, 2024 b . URL https://openreview.net/forum?id=Kuj5gVp5GQ

work page 2024

[54] [54]

and Qiu, Y

Tang, Z. and Qiu, Y. Safe and sparse N ewton method for entropic-regularized optimal transport. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=Nmmiyjw7Xg

work page 2024

[55] [55]

C., Pereira, L

Torres, L. C., Pereira, L. M., and Amini, M. H. A survey on optimal transport for machine learning: Theory and applications. arXiv preprint arXiv:2106.01963, 2021

work page arXiv 2021

[56] [56]

Villani, C. et al. Optimal transport: old and new, volume 338. Springer, 2009

work page 2009

[57] [57]

Deep generative learning via S chr \"o dinger bridge

Wang, G., Jiao, Y., Xu, Q., Wang, Y., and Yang, C. Deep generative learning via S chr \"o dinger bridge. In International conference on machine learning, pp.\ 10794--10804. PMLR, 2021

work page 2021

[58] [58]

Recent advances on machine learning for computational fluid dynamics: A survey

Wang, H., Cao, Y., Huang, Z., Liu, Y., Hu, P., Luo, X., Song, Z., Zhao, W., Liu, J., Sun, J., et al. Recent advances on machine learning for computational fluid dynamics: A survey. arXiv preprint arXiv:2408.12171, 2024

work page arXiv 2024

[59] [59]

Large-scale multi-modal pre-trained models: A comprehensive survey

Wang, X., Chen, G., Qian, G., Gao, P., Wei, X.-Y., Wang, Y., Tian, Y., and Gao, W. Large-scale multi-modal pre-trained models: A comprehensive survey. Machine Intelligence Research, 20 0 (4): 0 447--482, 2023

work page 2023

[60] [60]

An explicit analysis of the entropic penalty in linear programming

Weed, J. An explicit analysis of the entropic penalty in linear programming. In Conference On Learning Theory, pp.\ 1841--1855. PMLR, 2018

work page 2018

[61] [61]

Neural network approximation for pessimistic offline reinforcement learning

Wu, D., Jiao, Y., Shen, L., Yang, H., and Lu, X. Neural network approximation for pessimistic offline reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pp.\ 15868--15877, 2024

work page 2024

[62] [62]

A fast proximal point method for computing exact W asserstein distance

Xie, Y., Wang, X., Wang, R., and Zha, H. A fast proximal point method for computing exact W asserstein distance. In Uncertainty in artificial intelligence, pp.\ 433--453. PMLR, 2020

work page 2020

[63] [63]

T., and Toh, K.-C

Yang, L., Liang, L., Chu, H. T., and Toh, K.-C. A corrected inexact proximal augmented L agrangian method with a relative error criterion for a class of group-quadratic regularized optimal transport problems. Journal of Scientific Computing, 99 0 (3): 0 79, 2024

work page 2024

[64] [64]

ripALM: A Relative-Type Inexact Proximal Augmented Lagrangian Method for Linearly Constrained Convex Optimization

Zhu, J., Liang, L., Yang, L., and Toh, K.-C. rip ALM : A relative-type inexact proximal augmented L agrangian method with applications to quadratically regularized optimal transport. arXiv preprint arXiv:2411.13267, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024