pith. sign in

arxiv: 2502.03749 · v2 · submitted 2025-02-06 · 💻 cs.LG · math.OC

PINS: Proximal Iterations with Sparse Newton and Sinkhorn for Optimal Transport

Pith reviewed 2026-05-23 03:23 UTC · model grok-4.3

classification 💻 cs.LG math.OC
keywords optimal transportproximal point methodSinkhorn algorithmNewton methodsparsificationentropic regularizationmachine learning
0
0 comments X

The pith

PINS solves the unregularized optimal transport problem by wrapping Sinkhorn warm-ups inside proximal iterations and refining each subproblem with sparse Newton steps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PINS as a two-loop method that overcomes the accuracy plateau of entropic regularization in optimal transport. An outer proximal-point iteration generates a sequence of shifted entropic problems whose solutions approach the exact unregularized optimum; each of those inner problems is solved first by Sinkhorn and then by a sparse-Newton correction. The authors prove global convergence of the outer sequence and show that the Hessian inside each Newton step admits a sparsification whose nonzero pattern does not depend on the cost matrix. Experiments on synthetic, MNIST, and DOTmark data report substantially lower cost errors than pure Sinkhorn together with speed-ups of 5-73 times at matched accuracy and reduced memory in a streaming implementation.

Core claim

PINS converges globally to an optimal solution of the unregularized OT problem. At every outer iteration the inner Hessian admits a sparsification whose structure is independent of the cost matrix, allowing the inner solver to combine a Sinkhorn warm-up with a sparse-Newton refinement that produces the required accuracy for the outer sequence.

What carries the argument

The two-loop solver: an entropic proximal-point outer iteration that produces a sequence of shifted cost matrices, each solved by an inner loop of Sinkhorn followed by sparse-Newton refinement on a cost-independent sparse Hessian.

If this is right

  • The outer sequence reaches the true OT plan rather than stopping at an entropic approximation.
  • Hessian sparsification remains valid at every outer step regardless of the particular cost values.
  • Memory usage drops 24-54 percent relative to network-simplex LP on large DOTmark instances while staying feasible under tighter per-process budgets.
  • The same outer-loop structure yields 5-73 times faster run times than Sinkhorn at equal accuracy on the tested instances.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The cost-independent sparsification pattern could be reused across multiple related transport problems without recomputing the sparsity mask.
  • The proximal-shift mechanism might be adapted to other bias-removal tasks such as unbalanced or semi-discrete transport.
  • Because the inner Newton step is local to each outer iteration, the method could be combined with warm-starting strategies across successive proximal steps.

Load-bearing premise

The sparse-Newton refinement after each Sinkhorn warm-up must remain accurate enough that approximation errors do not accumulate across outer iterations and block convergence to the exact unregularized optimum.

What would settle it

Solve a small OT instance whose exact optimum is known from a linear-programming solver; if PINS terminates with a relative cost error larger than the LP tolerance, the convergence claim fails.

Figures

Figures reproduced from arXiv: 2502.03749 by Di Wu, Haizhao Yang, Ling Liang.

Figure 1
Figure 1. Figure 1: Comparison of PINS and Newton accelerated methods without EPPA loops on synthetic datasets. The vertical axis rep￾resents the logarithmic error between the computed cost and the exact cost. The gray dashed line indicates the converging solution of the Newton accelerated methods for clarity. 0.0 0.5 1.0 1.5 Time Cost (s) 5 0 5 Log Error MNIST 1 0 100 200 300 Time Cost (s) 5 0 5 Log Error MNIST 4 0 100 200 3… view at source ↗
Figure 2
Figure 2. Figure 2: Comparison of PINS and Newton accelerated methods without EPPA loops on (augmented) MNIST datasets. seconds vs. 425 seconds). These findings align well with our theoretical conclusion in Theorem (4.4). The approximate Hessian matrix’s spar￾sity, bounded by τ (Hk ) ≤ 3n−1 2n2 , is independent of the cost matrix and holds at every iteration. This property enables efficient solutions to the sparse Hessian sys… view at source ↗
Figure 3
Figure 3. Figure 3: The comparison between PINS and Sinkhorn with EPPA on synthetic dataset. 0 25 50 75 Time Cost (s) 7.5 5.0 2.5 Log Error MNIST 1 0 500 1000 Time Cost (s) 7.5 5.0 2.5 Log Error MNIST 4 0 5000 10000 15000 Time Cost (s) 5 0 Log Error MNIST 9 0 20000 40000 60000 Time Cost (s) 5 0 Log Error MNIST 16 Sinkhorn with EPPA PINS [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: The comparison between PINS and Sinkhorn with EPPA on MNIST dataset. 5.4. Regularization Parameter We conducted several experiments to show the performance of PINS with different regularization parameters. The re￾sults are shown in [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
read the original abstract

Optimal transport (OT) is a widely used tool in machine learning, but computing high-accuracy solutions for large instances remains costly. Entropic regularization and the Sinkhorn algorithm improve scalability; however, when the regularization parameter is small, Sinkhorn convergence slows, and the iterates approach an entropic solution that remains separated from the true OT plan by an entropic-bias plateau. We introduce PINS (Proximal Iterations with sparse Newton and Sinkhorn), a two-loop solver designed to move beyond this plateau. The outer loop applies an entropic proximal-point method, solving the original OT problem through a sequence of entropic subproblems with shifted cost matrices. Each inner subproblem is then solved by a Sinkhorn warm-up followed by sparse-Newton refinement. We prove that PINS converges globally to an optimal solution of the unregularized OT problem and that the inner Hessian admits a sparsification at every outer iteration with a structure independent of the cost matrix. On synthetic and augmented-MNIST instances, PINS achieves much lower relative cost errors than Sinkhorn-type baselines, which stall at the entropic-bias plateau, and is $5$--$73\times$ faster than Sinkhorn with the same outer loop at matched accuracy. On large-scale DOTmark instances, a streaming implementation reduces peak memory by $24$--$54\%$ compared with the network-simplex linear programming (LP) solver and remains feasible under per-process memory budgets for which the LP solver fails.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces PINS, a two-loop solver for unregularized optimal transport that uses an outer entropic proximal-point iteration on shifted cost matrices, with each inner subproblem solved via Sinkhorn warm-up followed by sparse-Newton refinement. It claims to prove global convergence of the outer sequence to an exact OT optimum and that the inner Hessian admits a sparsification whose structure is independent of the cost matrix at every outer iteration. Empirical results on synthetic, augmented-MNIST, and DOTmark instances report substantially lower relative cost errors than Sinkhorn baselines (which plateau due to entropic bias) together with 5–73× speedups at matched accuracy and 24–54% memory reduction in a streaming implementation.

Significance. If the global convergence and cost-independent sparsification hold with controlled inner error, the work would supply a practical route to high-accuracy unregularized OT plans at large scale, directly addressing the bias plateau that limits Sinkhorn-type methods when the regularization parameter is driven small. The combination of a proximal outer loop with a structurally sparse inner Newton step, together with the stated global convergence result, constitutes a substantive algorithmic and theoretical contribution.

major comments (2)
  1. [§3 and convergence theorem (§4)] §3 (inner solver) and the convergence theorem (likely §4): the claim that the sparse-Newton refinement produces iterates accurate enough for the outer proximal sequence to reach the exact unregularized limit requires an explicit bound on the sparsification error that is either driven to zero with the outer regularization parameter or shown to be uniformly controlled across iterations; the abstract states the sparsification exists but supplies no such quantitative guarantee, which is load-bearing for the global convergence assertion.
  2. [convergence theorem] Theorem on global convergence: the proof sketch must verify that the perturbation introduced by the fixed-tolerance sparsification does not accumulate and prevent contraction of the outer proximal error to zero; without this, the reduction from the regularized subproblems to the unregularized OT optimum remains formally open.
minor comments (2)
  1. [§2] Notation for the shifted cost matrices in the outer loop should be introduced once and used consistently; the current presentation occasionally redefines the shift without cross-reference.
  2. [large-scale experiments] The streaming implementation section would benefit from an explicit statement of the per-iteration memory complexity in terms of the support size after sparsification.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for identifying the need for explicit quantitative controls on sparsification error to close the global convergence argument. We address both major comments below and will strengthen the analysis accordingly in revision.

read point-by-point responses
  1. Referee: [§3 and convergence theorem (§4)] §3 (inner solver) and the convergence theorem (likely §4): the claim that the sparse-Newton refinement produces iterates accurate enough for the outer proximal sequence to reach the exact unregularized limit requires an explicit bound on the sparsification error that is either driven to zero with the outer regularization parameter or shown to be uniformly controlled across iterations; the abstract states the sparsification exists but supplies no such quantitative guarantee, which is load-bearing for the global convergence assertion.

    Authors: We agree that an explicit bound on the sparsification error is required for rigor. In the revised manuscript we will insert a new lemma (in §3) that quantifies the sparsification error and shows it is O(ε_outer) where ε_outer is the current outer regularization parameter; the bound is independent of the cost matrix by the structural property already proved. This drives the inner error to zero with the proximal sequence and supplies the missing quantitative guarantee. revision: yes

  2. Referee: [convergence theorem] Theorem on global convergence: the proof sketch must verify that the perturbation introduced by the fixed-tolerance sparsification does not accumulate and prevent contraction of the outer proximal error to zero; without this, the reduction from the regularized subproblems to the unregularized OT optimum remains formally open.

    Authors: We concur that the current sketch does not explicitly rule out accumulation. The revised proof (expanded in the appendix) will show that the per-iteration perturbation is bounded by a summable series whose sum is controlled by the contraction rate of the proximal-point map; the cost-independent sparsification structure ensures the perturbation remains uniformly bounded rather than growing, so the outer error still contracts to zero. revision: yes

Circularity Check

0 steps flagged

No significant circularity; claims rest on algorithmic construction and external proofs

full rationale

The paper introduces PINS as a two-loop proximal-point algorithm with Sinkhorn warm-up and sparse-Newton inner solves, claiming global convergence to the unregularized OT optimum and cost-matrix-independent Hessian sparsification. These are presented as results of the algorithm's design and stated mathematical proofs rather than quantities defined in terms of themselves, fitted parameters renamed as predictions, or load-bearing self-citations. No equations or steps reduce by construction to the inputs; the derivation chain is self-contained against the stated assumptions and does not invoke prior author work to force uniqueness or smuggle ansatzes. Minor self-citation risk is absent from the provided text.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The method relies on standard convex-optimization facts about proximal-point convergence for entropic OT and on the existence of a sparsifiable Hessian structure; no new free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption The entropic proximal-point iteration converges to the solution of the unregularized OT problem when each subproblem is solved to sufficient accuracy.
    Invoked to justify the outer loop; this is a known result in optimization theory but is load-bearing for the global convergence claim.

pith-pipeline@v0.9.0 · 5802 in / 1441 out tokens · 30635 ms · 2026-05-23T03:23:10.737245+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Beyond Expected Information Gain: Stable Bayesian Optimal Experimental Design with Integral Probability Metrics and Plug-and-Play Extensions

    stat.ML 2026-04 unverdicted novelty 7.0

    An IPM-based framework for Bayesian optimal experimental design is proposed that replaces KL-based expected information gain with Wasserstein, MMD, and energy distances, delivering stronger stability guarantees and pl...

  2. A Provably Convergent and Practical Algorithm for Gromov--Wasserstein Optimal Transport

    cs.LG 2026-05 unverdicted novelty 6.0

    An inexact projected-gradient framework for GWOT with a verifiable feasibility-residual condition that proves subsequential and full-sequence convergence to stationary points under a mild tolerance decay.

Reference graph

Works this paper leans on

64 extracted references · 64 canonical work pages · cited by 2 Pith papers · 1 internal anchor

  1. [1]

    write newline

    " write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...

  2. [2]

    Aldous, D. J. The (2) limit in the random assignment problem. Random Structures & Algorithms, 18 0 (4): 0 381--418, 2001

  3. [3]

    Near-linear time approximation algorithms for optimal transport via S inkhorn iteration

    Altschuler, J., Niles-Weed, J., and Rigollet, P. Near-linear time approximation algorithms for optimal transport via S inkhorn iteration. Advances in neural information processing systems, 30, 2017

  4. [4]

    Massively scalable S inkhorn distances via the N ystr \"o m method

    Altschuler, J., Bach, F., Rudi, A., and Niles-Weed, J. Massively scalable S inkhorn distances via the N ystr \"o m method. Advances in neural information processing systems, 32, 2019

  5. [5]

    Wasserstein generative adversarial networks

    Arjovsky, M., Chintala, S., and Bottou, L. Wasserstein generative adversarial networks. In International conference on machine learning, pp.\ 214--223. PMLR, 2017

  6. [6]

    G., Dabney, W., and Munos, R

    Bellemare, M. G., Dabney, W., and Munos, R. A distributional perspective on reinforcement learning. In International conference on machine learning, pp.\ 449--458. PMLR, 2017

  7. [7]

    Iterative B regman projections for regularized transportation problems

    Benamou, J.-D., Carlier, G., Cuturi, M., Nenna, L., and Peyr \'e , G. Iterative B regman projections for regularized transportation problems. SIAM Journal on Scientific Computing, 37 0 (2): 0 A1111--A1138, 2015

  8. [8]

    and Tsitsiklis, J

    Bertsimas, D. and Tsitsiklis, J. N. Introduction to linear optimization, volume 6. Athena scientific Belmont, MA, 1997

  9. [9]

    Assignment problems: revised reprint

    Burkard, R., Dell'Amico, M., and Martello, S. Assignment problems: revised reprint. SIAM, 2012

  10. [10]

    and Zenios, S

    Censor, Y. and Zenios, S. A. Proximal minimization algorithm with D -functions. Journal of Optimization Theory and Applications, 73 0 (3): 0 451--464, 1992

  11. [11]

    Unified optimal transport framework for universal domain adaptation

    Chang, W., Shi, Y., Tuan, H., and Wang, J. Unified optimal transport framework for universal domain adaptation. Advances in Neural Information Processing Systems, 35: 0 29512--29524, 2022

  12. [12]

    and Teboulle, M

    Chen, G. and Teboulle, M. Convergence analysis of a proximal-like minimization algorithm using B regman functions. SIAM Journal on Optimization, 3 0 (3): 0 538--543, 1993

  13. [13]

    Graph optimal transport for cross-domain alignment

    Chen, L., Gan, Z., Cheng, Y., Li, L., Carin, L., and Liu, J. Graph optimal transport for cross-domain alignment. In International Conference on Machine Learning, pp.\ 1542--1553. PMLR, 2020

  14. [14]

    T., Liang, L., Toh, K.-C., and Yang, L

    Chu, H. T., Liang, L., Toh, K.-C., and Yang, L. An efficient implementable inexact entropic proximal point algorithm for a class of linear programming problems. Computational Optimization and Applications, 85 0 (1): 0 107--146, 2023

  15. [15]

    A regularized interior point method for sparse optimal transport on graphs

    Cipolla, S., Gondzio, J., and Zanetti, F. A regularized interior point method for sparse optimal transport on graphs. European Journal of Operational Research, 319 0 (2): 0 413--426, 2024

  16. [16]

    Sinkhorn distances: Lightspeed computation of optimal transport

    Cuturi, M. Sinkhorn distances: Lightspeed computation of optimal transport. Advances in neural information processing systems, 26, 2013

  17. [17]

    Distributional reinforcement learning with quantile regression

    Dabney, W., Rowland, M., Bellemare, M., and Munos, R. Distributional reinforcement learning with quantile regression. In Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018

  18. [18]

    Diffusion S chr \"o dinger bridge with applications to score-based generative modeling

    De Bortoli, V., Thornton, J., Heng, J., and Doucet, A. Diffusion S chr \"o dinger bridge with applications to score-based generative modeling. Advances in Neural Information Processing Systems, 34: 0 17695--17709, 2021

  19. [19]

    Power particles: an incompressible fluid solver based on power diagrams

    De Goes, F., Wallez, C., Huang, J., Pavlov, D., and Desbrun, M. Power particles: an incompressible fluid solver based on power diagrams. ACM Trans. Graph., 34: 0 50--1, 2015

  20. [20]

    and Walsh III, J

    Dieci, L. and Walsh III, J. D. The boundary method for semi-discrete optimal transport partitions and wasserstein distance computation. Journal of Computational and Applied Mathematics, 353: 0 318--344, 2019

  21. [21]

    Nonlinear proximal point algorithms using B regman functions, with applications to convex programming

    Eckstein, J. Nonlinear proximal point algorithms using B regman functions, with applications to convex programming. Mathematics of Operations Research, 18 0 (1): 0 202--226, 1993

  22. [22]

    Approximate iterations in B regman-function-based proximal algorithms

    Eckstein, J. Approximate iterations in B regman-function-based proximal algorithms. Mathematical programming, 83: 0 113--123, 1998

  23. [23]

    Unbalanced minibatch optimal transport; applications to domain adaptation

    Fatras, K., S \'e journ \'e , T., Flamary, R., and Courty, N. Unbalanced minibatch optimal transport; applications to domain adaptation. In International Conference on Machine Learning, pp.\ 3186--3197. PMLR, 2021

  24. [24]

    and Reeves, C

    Fletcher, R. and Reeves, C. M. Function minimization by conjugate gradients. The computer journal, 7 0 (2): 0 149--154, 1964

  25. [25]

    Optimal transport methods in economics

    Galichon, A. Optimal transport methods in economics. Princeton University Press, 2018

  26. [26]

    Deep generative learning via variational gradient flow

    Gao, Y., Jiao, Y., Wang, Y., Wang, Y., Yang, C., and Zhang, S. Deep generative learning via variational gradient flow. In International Conference on Machine Learning, pp.\ 2093--2101. PMLR, 2019

  27. [27]

    Learning generative models with S inkhorn divergences

    Genevay, A., Peyr \'e , G., and Cuturi, M. Learning generative models with S inkhorn divergences. In International Conference on Artificial Intelligence and Statistics, pp.\ 1608--1617. PMLR, 2018

  28. [28]

    and Nutz, M

    Ghosal, P. and Nutz, M. On the convergence rate of S inkhorn's algorithm. arXiv preprint arXiv:2212.06000, 2022

  29. [29]

    A sparse smoothing N ewton method for solving discrete optimal transport problems

    Hou, D., Liang, L., and Toh, K.-C. A sparse smoothing N ewton method for solving discrete optimal transport problems. ACM Transactions on Mathematical Software, 50 0 (3): 0 1--26, 2024

  30. [30]

    R., Tape, C

    Huguet, G., Tong, A., Zapatero, M. R., Tape, C. J., Wolf, G., and Krishnaswamy, S. Geodesic S inkhorn for fast and accurate optimal transport on manifolds. In 2023 IEEE 33rd International Workshop on Machine Learning for Signal Processing (MLSP), pp.\ 1--6. IEEE, 2023

  31. [31]

    P., and Gretton, A

    Jitkrittum, W., Szab \'o , Z., Chwialkowski, K. P., and Gretton, A. Interpretable distribution features with maximum testing power. Advances in Neural Information Processing Systems, 29, 2016

  32. [32]

    and Inouye, D

    Kulinski, S. and Inouye, D. I. Towards explaining distribution shifts. In International Conference on Machine Learning, pp.\ 17931--17952. PMLR, 2023

  33. [33]

    A survey of the S chr \"o dinger problem and some of its connections with optimal transport

    L \'e onard, C. A survey of the S chr \"o dinger problem and some of its connections with optimal transport. Discrete and Continuous Dynamical Systems, 34 0 (4): 0 1533--1574, 2013

  34. [34]

    Accelerating multi-block constrained optimization through learning to optimize

    Liang, L., Austin, C., and Yang, H. Accelerating multi-block constrained optimization through learning to optimize. arXiv preprint arXiv:2409.17320, 2024

  35. [35]

    Lin, T., Ho, N., and Jordan, M. I. On the efficiency of entropic regularized algorithms for optimal transport. Journal of Machine Learning Research, 23 0 (137): 0 1--42, 2022

  36. [36]

    Luo, Y., Jiang, Z., Cohen, S., Grefenstette, E., and Deisenroth, M. P. Optimal transport for offline imitation learning. arXiv preprint arXiv:2303.13971, 2023

  37. [37]

    and Parisi, G

    M \'e zard, M. and Parisi, G. On the solution of the random link matching problems. Journal de Physique, 48 0 (9): 0 1451--1459, 1987

  38. [38]

    F., Mboula, F

    Montesuma, E. F., Mboula, F. M. N., and Souloumiac, A. Recent advances in optimal transport for machine learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

  39. [39]

    and Nemirovskii, A

    Nesterov, Y. and Nemirovskii, A. Interior-point polynomial algorithms in convex programming. SIAM, 1994

  40. [40]

    and Wright, S

    Nocedal, J. and Wright, S. J. Numerical optimization. Springer, 1999

  41. [41]

    Computational optimal transport: With applications to data science

    Peyr \'e , G., Cuturi, M., et al. Computational optimal transport: With applications to data science. Foundations and Trends in Machine Learning , 11 0 (5-6): 0 355--607, 2019

  42. [42]

    Polyak, B. T. Introduction to optimization. Optimization Software, Publications Division, 1987

  43. [43]

    Improving GAN s using optimal transport

    Salimans, T., Zhang, H., Radford, A., and Metaxas, D. Improving GAN s using optimal transport. In International Conference on Learning Representations, 2018

  44. [44]

    and Lindenbaum, M

    Sandler, R. and Lindenbaum, M. Nonnegative matrix factorization with earth mover's distance metric for image analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33 0 (8): 0 1590--1602, 2011

  45. [45]

    and Cuturi, M

    Scetbon, M. and Cuturi, M. Linear time S inkhorn divergences using positive features. Advances in neural information processing systems, 33: 0 13468--13480, 2020

  46. [46]

    Low-rank sinkhorn factorization

    Scetbon, M., Cuturi, M., and Peyr \'e , G. Low-rank sinkhorn factorization. In International Conference on Machine Learning, pp.\ 9344--9354. PMLR, 2021

  47. [47]

    Stabilized sparse scaling algorithms for entropy regularized transport problems

    Schmitzer, B. Stabilized sparse scaling algorithms for entropy regularized transport problems. SIAM Journal on Scientific Computing, 41 0 (3): 0 A1443--A1481, 2019

  48. [48]

    Diffusion S chr \"o dinger bridge matching

    Shi, Y., De Bortoli, V., Campbell, A., and Doucet, A. Diffusion S chr \"o dinger bridge matching. Advances in Neural Information Processing Systems, 36, 2024

  49. [49]

    Convolutional W asserstein distances: Efficient optimal transportation on geometric domains

    Solomon, J., De Goes, F., Peyr \'e , G., Cuturi, M., Butscher, A., Nguyen, A., Du, T., and Guibas, L. Convolutional W asserstein distances: Efficient optimal transportation on geometric domains. ACM Transactions on Graphics (ToG), 34 0 (4): 0 1--11, 2015

  50. [50]

    Stanley, R. P. Enumerative combinatorics volume 1 second edition. Cambridge studies in advanced mathematics, 2011

  51. [51]

    Steele, J. M. Probability theory and combinatorial optimization. SIAM, 1997

  52. [52]

    K., Xiao, T., and Ying, L

    Tang, X., Rahmanian, H., Shavlovsky, M., Thekumparampil, K. K., Xiao, T., and Ying, L. A S inkhorn-type algorithm for constrained optimal transport. arXiv preprint arXiv:2403.05054, 2024 a

  53. [53]

    K., Xiao, T., and Ying, L

    Tang, X., Shavlovsky, M., Rahmanian, H., Tardini, E., Thekumparampil, K. K., Xiao, T., and Ying, L. Accelerating S inkhorn algorithm with sparse newton iterations. In The Twelfth International Conference on Learning Representations, 2024 b . URL https://openreview.net/forum?id=Kuj5gVp5GQ

  54. [54]

    and Qiu, Y

    Tang, Z. and Qiu, Y. Safe and sparse N ewton method for entropic-regularized optimal transport. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=Nmmiyjw7Xg

  55. [55]

    C., Pereira, L

    Torres, L. C., Pereira, L. M., and Amini, M. H. A survey on optimal transport for machine learning: Theory and applications. arXiv preprint arXiv:2106.01963, 2021

  56. [56]

    Villani, C. et al. Optimal transport: old and new, volume 338. Springer, 2009

  57. [57]

    Deep generative learning via S chr \"o dinger bridge

    Wang, G., Jiao, Y., Xu, Q., Wang, Y., and Yang, C. Deep generative learning via S chr \"o dinger bridge. In International conference on machine learning, pp.\ 10794--10804. PMLR, 2021

  58. [58]

    Recent advances on machine learning for computational fluid dynamics: A survey

    Wang, H., Cao, Y., Huang, Z., Liu, Y., Hu, P., Luo, X., Song, Z., Zhao, W., Liu, J., Sun, J., et al. Recent advances on machine learning for computational fluid dynamics: A survey. arXiv preprint arXiv:2408.12171, 2024

  59. [59]

    Large-scale multi-modal pre-trained models: A comprehensive survey

    Wang, X., Chen, G., Qian, G., Gao, P., Wei, X.-Y., Wang, Y., Tian, Y., and Gao, W. Large-scale multi-modal pre-trained models: A comprehensive survey. Machine Intelligence Research, 20 0 (4): 0 447--482, 2023

  60. [60]

    An explicit analysis of the entropic penalty in linear programming

    Weed, J. An explicit analysis of the entropic penalty in linear programming. In Conference On Learning Theory, pp.\ 1841--1855. PMLR, 2018

  61. [61]

    Neural network approximation for pessimistic offline reinforcement learning

    Wu, D., Jiao, Y., Shen, L., Yang, H., and Lu, X. Neural network approximation for pessimistic offline reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pp.\ 15868--15877, 2024

  62. [62]

    A fast proximal point method for computing exact W asserstein distance

    Xie, Y., Wang, X., Wang, R., and Zha, H. A fast proximal point method for computing exact W asserstein distance. In Uncertainty in artificial intelligence, pp.\ 433--453. PMLR, 2020

  63. [63]

    T., and Toh, K.-C

    Yang, L., Liang, L., Chu, H. T., and Toh, K.-C. A corrected inexact proximal augmented L agrangian method with a relative error criterion for a class of group-quadratic regularized optimal transport problems. Journal of Scientific Computing, 99 0 (3): 0 79, 2024

  64. [64]

    ripALM: A Relative-Type Inexact Proximal Augmented Lagrangian Method for Linearly Constrained Convex Optimization

    Zhu, J., Liang, L., Yang, L., and Toh, K.-C. rip ALM : A relative-type inexact proximal augmented L agrangian method with applications to quadratically regularized optimal transport. arXiv preprint arXiv:2411.13267, 2024