pith. sign in

arxiv: 2605.28134 · v1 · pith:BPHKZV6Nnew · submitted 2026-05-27 · 🧮 math.OC · stat.ML

Convergence of empirical subgradients for optimal transport-based objectives

Pith reviewed 2026-06-29 11:16 UTC · model grok-4.3

classification 🧮 math.OC stat.ML
keywords optimal transportsubdifferentialsgraphical convergenceempirical convergencesubgradient methodsparameterized objectivessliced Wasserstein
0
0 comments X

The pith

Sampled optimal transport objectives have subdifferentials that converge graphically to the population subdifferential.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proves that objectives built from finite samples of optimal transport costs have subdifferentials that converge graphically to those of the corresponding population objectives. This convergence implies that subgradient methods run on the sampled problem will approach stationary points of the full population problem. The result relies on smooth parameterizations to maintain stability between statistical consistency and optimization. Illustrations cover risk-averse optimization, fairness-constrained learning, and sliced Wasserstein problems. Nonsmooth costs or models can produce derivatives that destabilize as the sample size grows.

Core claim

We study parameterized objectives defined by sampled transport costs and prove graphical convergence of their subdifferentials to the subdifferential of the population objective. In particular, this ensures that standard subgradient methods consistently approach stationary points of the population-level problem. The analysis is illustrated in risk-averse optimization, fairness-constrained learning, and sliced Wasserstein problems, with smooth parameterizations providing a stable interface between sampling and optimization.

What carries the argument

Graphical convergence of subdifferentials between empirical and population optimal transport-based objectives

If this is right

  • Subgradient methods applied to the sampled problem approach stationary points of the population objective.
  • Smooth parameterizations ensure stable derivatives in the large-sample limit.
  • The convergence result applies directly to risk-averse optimization, fairness-constrained learning, and sliced Wasserstein problems.
  • Nonsmooth costs and models can produce unstable derivatives as sample size increases.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Empirical optimal transport losses can be treated as reliable proxies for population-level optimization when parameters remain smooth.
  • The same graphical-convergence approach might apply to other sampling-based losses if analogous technical conditions hold.
  • Training pipelines using transport costs may benefit from enforcing smoothness on the model class to avoid limit instability.

Load-bearing premise

Smooth parameterizations are needed to translate statistical consistency into stable optimization behavior without unstable derivatives in the large-sample limit.

What would settle it

An explicit example of a smooth parameterization and transport cost where the empirical subdifferential fails to converge graphically to the population subdifferential, or where subgradient iterates on growing samples diverge from the population stationary points.

Figures

Figures reproduced from arXiv: 2605.28134 by Tam Le (LPSM, UPCit\'e).

Figure 1
Figure 1. Figure 1: Sample-induced local minimum One-dimensional transport costs illustrate this phenomenon well as they admit a quantile representation [61, Chapter 2], which reduces to an explicit sorting-based formula for discrete measures. This links transport costs with ranks and quantiles and makes them particularly convenient in learning pipelines. Such tractability has been exploited for instance in his￾togram matchin… view at source ↗
Figure 2
Figure 2. Figure 2: The population objective is increasing, with subgradients in [ 1 4 , 1] while the range of empirical derivatives contains zero. Experiment. For the numerical illustrations, we take w = 3/4 and M = 6. In [PITH_FULL_IMAGE:figures/full_fig_p032_2.png] view at source ↗
read the original abstract

Optimal transport is widely used to learn distributions, enforce distributional constraints, and model uncertainty. In applications, transport losses are often computed from samples through tractable representations, such as one-dimensional sorting formulas or sliced Wasserstein costs, making them practical components in training pipelines. We study parameterized objectives defined by sampled transport costs and prove graphical convergence of their subdifferentials to the subdifferential of the population objective. In particular, this ensures that standard subgradient methods consistently approach stationary points of the population-level problem. We illustrate the results in several settings, including risk-averse optimization, fairness-constrained learning, and sliced Wasserstein problems. Our analysis highlights that smooth parameterizations provide a favorable interface between statistical consistency and optimization. By contrast, transport objectives with nonsmooth costs and models may exhibit unstable derivatives in the large-sample limit.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proves graphical convergence of the subdifferentials of empirical optimal transport (OT) objectives—defined via sampled transport costs such as one-dimensional sorting or sliced Wasserstein—to the subdifferential of the corresponding population objective. This convergence is shown to ensure that standard subgradient methods applied to the empirical problems consistently approach stationary points of the population problem. The results are illustrated in risk-averse optimization, fairness-constrained learning, and sliced Wasserstein settings, with emphasis on the favorable role of smooth parameterizations versus potential instability in nonsmooth cases.

Significance. If the graphical convergence result holds under the stated conditions, the work supplies a useful theoretical bridge between statistical consistency of empirical OT losses and the reliability of first-order optimization methods. This is relevant for machine learning pipelines that incorporate transport-based objectives, and the explicit contrast between smooth and nonsmooth regimes offers practical guidance on when subgradient consistency can be expected.

major comments (2)
  1. [Main theorem / assumptions paragraph] The central graphical convergence claim (abstract and main theorem) relies on technical conditions on the transport cost and parameterization class that are invoked but whose precise statement and necessity are not fully detailed in the provided abstract; the main result section should explicitly list all assumptions (e.g., on smoothness, compactness, or measurability) and verify they are minimal for the conclusion.
  2. [Section on illustrations] The illustrations (risk-averse optimization, fairness, sliced Wasserstein) are presented as supporting examples, but without quantitative verification that the empirical subdifferentials indeed converge in the reported regimes, it is unclear whether the examples confirm the rate or only the qualitative behavior; a numerical check or explicit error bound would strengthen the claim.
minor comments (2)
  1. Notation for the empirical versus population subdifferentials should be introduced once and used consistently; occasional shifts between ∂ and ∂_emp notation reduce readability.
  2. [Abstract / conclusion] The abstract states that nonsmooth costs 'may exhibit unstable derivatives in the large-sample limit,' but this is not accompanied by a counter-example or reference; adding a brief remark or citation would clarify the contrast.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We address each major comment below and will incorporate the suggested clarifications in a revised version of the manuscript.

read point-by-point responses
  1. Referee: [Main theorem / assumptions paragraph] The central graphical convergence claim (abstract and main theorem) relies on technical conditions on the transport cost and parameterization class that are invoked but whose precise statement and necessity are not fully detailed in the provided abstract; the main result section should explicitly list all assumptions (e.g., on smoothness, compactness, or measurability) and verify they are minimal for the conclusion.

    Authors: We agree that the assumptions should be stated more explicitly for clarity. In the revised manuscript we will insert a dedicated 'Assumptions' paragraph immediately preceding the statement of the main graphical convergence theorem. This paragraph will enumerate all conditions on the transport cost (continuity, growth, and measurability requirements) and on the parameterization class (compactness of the parameter domain and appropriate measurability of the maps). We will also add a short remark discussing the role of each assumption in the proof and note which ones are standard versus those that are tailored to the OT setting. revision: yes

  2. Referee: [Section on illustrations] The illustrations (risk-averse optimization, fairness, sliced Wasserstein) are presented as supporting examples, but without quantitative verification that the empirical subdifferentials indeed converge in the reported regimes, it is unclear whether the examples confirm the rate or only the qualitative behavior; a numerical check or explicit error bound would strengthen the claim.

    Authors: The illustrations are designed to highlight qualitative distinctions between smooth and nonsmooth regimes that follow from the theory, rather than to provide rate information. We acknowledge that a quantitative check would make the examples more convincing. In the revision we will add, in the sliced Wasserstein subsection, a small numerical study that tracks the distance between empirical and population subdifferentials (or a proxy such as the norm of the difference in subgradient evaluations) across increasing sample sizes, thereby supplying concrete evidence of the convergence behavior in at least one setting. revision: partial

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper presents a mathematical proof of graphical convergence of subdifferentials for empirical OT-based objectives to the population subdifferential. The derivation relies on standard variational analysis tools and assumptions on smooth parameterizations, without reducing any central claim to a fitted parameter, self-referential definition, or load-bearing self-citation chain. The result is framed as an independent convergence theorem that applies to the stated regimes (risk-averse optimization, fairness, sliced Wasserstein) and explicitly contrasts with nonsmooth cases, remaining self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, invented entities, or ad-hoc axioms; the result rests on standard background from optimal transport and variational analysis.

axioms (1)
  • standard math Standard properties of subdifferentials and graphical convergence from variational analysis
    Invoked to establish the main convergence statement.

pith-pipeline@v0.9.1-grok · 5664 in / 1254 out tokens · 31139 ms · 2026-06-29T11:16:17.714422+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

70 extracted references · 10 canonical work pages · 1 internal anchor

  1. [1]

    Aliprantis and K

    C. Aliprantis and K. Border , Infinite Dimensional Analysis , Springer Berlin, Heidelberg, 2006. 32

  2. [2]

    Ambrosio, N

    L. Ambrosio, N. Gigli, and G. Savar ´e, Gradient flows: in metric spaces and in the space of probability measures , Springer, 2005

  3. [3]

    Arjovsky, S

    M. Arjovsky, S. Chintala, and L. Bottou, Wasserstein generative adversarial networks, in International conference on machine learning, Pmlr, 2017, pp. 214–223

  4. [4]

    Artstein and R

    Z. Artstein and R. A. Vitale, A strong law of large numbers for random compact sets, The Annals of Probability, (1975), pp. 879–882

  5. [5]

    Attouch, Convergence de fonctionnelles convexes , in Journ´ ees d’Analyse Non Lin´ eaire: Proceedings, Besan¸ con, France, June 1977, Springer, 2006, pp

    H. Attouch, Convergence de fonctionnelles convexes , in Journ´ ees d’Analyse Non Lin´ eaire: Proceedings, Besan¸ con, France, June 1977, Springer, 2006, pp. 1–40

  6. [6]

    Aubin, Graphical convergence of set-valued maps, (1987)

    J.-P. Aubin, Graphical convergence of set-valued maps, (1987)

  7. [7]

    Bena¨ım, J

    M. Bena¨ım, J. Hofbauer, and S. Sorin , Perturbations of set-valued dynami- cal systems, with applications to game theory , Dynamic Games and Applications, 2 (2012), pp. 195–205

  8. [8]

    Beyler and F

    E. Beyler and F. Bach , Convergence of deterministic and stochastic diffusion- model samplers: A simple analysis in wasserstein distance , arXiv preprint arXiv:2508.03210, (2025)

  9. [9]

    Billingsley, Convergence of probability measures, John Wiley & Sons, 2013

    P. Billingsley, Convergence of probability measures, John Wiley & Sons, 2013

  10. [10]

    Bolte and E

    J. Bolte and E. Pauwels , Conservative set valued fields, automatic differenti- ation, stochastic gradient methods and deep learning , Mathematical Programming, 188 (2021), pp. 19–51

  11. [11]

    Bonalli, B

    R. Bonalli, B. Bonnet-Weill, and L. Pfeiffer , A characterization of law- invariant and coherent risk measures through optimal transport , arXiv preprint arXiv:2512.19157, (2025)

  12. [12]

    Bruno, Y

    S. Bruno, Y. Zhang, D.-Y. Lim, ¨O. D. Akyildiz, and S. Sabanis , On diffusion-based generative models and their error bounds: The log-concave case with full convergence estimates, arXiv preprint arXiv:2311.13584, (2023)

  13. [13]

    Carlier, V

    G. Carlier, V. Duval, G. Peyr´e, and B. Schmitzer, Convergence of entropic schemes for optimal transport and gradient flows , SIAM Journal on Mathematical Analysis, 49 (2017), pp. 1385–1418

  14. [14]

    Chapel, R

    L. Chapel, R. Tavenard, and S. Vaiter , Differentiable generalized sliced wasserstein plans , Advances in Neural Information Processing Systems, 38 (2026), pp. 162905–162929

  15. [15]

    Clarke, Optimization and Nonsmooth Analysis , Classics in Applied Mathemat- ics, Society for Industrial and Applied Mathematics, 1990

    F. Clarke, Optimization and Nonsmooth Analysis , Classics in Applied Mathemat- ics, Society for Industrial and Applied Mathematics, 1990

  16. [16]

    F. H. Clarke, Generalized gradients and applications, Transactions of the American Mathematical Society, 205 (1975), pp. 247–262. 33

  17. [17]

    Cuturi and A

    M. Cuturi and A. Doucet , Fast computation of wasserstein barycenters , in In- ternational conference on machine learning, PMLR, 2014, pp. 685–693

  18. [18]

    Cuturi, L

    M. Cuturi, L. Meng-Papaxanthos, Y. Tian, C. Bunne, G. Davis, and O. Teboul, Optimal transport tools (ott): A jax toolbox for all things wasserstein , arXiv preprint arXiv:2201.12324, (2022)

  19. [19]

    Cuturi, O

    M. Cuturi, O. Teboul, and J.-P. Vert, Differentiable ranking and sorting using optimal transport, in Advances in Neural Information Processing Systems, H. Wal- lach, H. Larochelle, A. Beygelzimer, F. d 'Alch´ e-Buc, E. Fox, and R. Garnett, eds., vol. 32, Curran Associates, Inc., 2019

  20. [20]

    C ´edric, Optimal transport : old and new / C´ edric Villani , Grundlehren der mathematischen Wissenschaften, Springer, Berlin, 2009

    V. C ´edric, Optimal transport : old and new / C´ edric Villani , Grundlehren der mathematischen Wissenschaften, Springer, Berlin, 2009

  21. [21]

    J. M. Danskin , The theory of max-min and its application to weapons allocation problems, Springer Science & Business Media, 2012

  22. [22]

    Davis, D

    D. Davis, D. Drusvyatskiy, S. Kakade, and J. D. Lee, Stochastic subgradient method converges on tame functions , Foundations of Computational Mathematics, 20 (2020), pp. 119–154

  23. [23]

    Dellacherie and P.-A

    C. Dellacherie and P.-A. Meyer, Probabilities and potential, c: potential theory for discrete and continuous semigroups , vol. 151, Elsevier, 2011

  24. [24]

    Delon, Midway image equalization, Journal of Mathematical Imaging and Vision, 21 (2004), pp

    J. Delon, Midway image equalization, Journal of Mathematical Imaging and Vision, 21 (2004), pp. 119–134

  25. [25]

    Deshpande, Z

    I. Deshpande, Z. Zhang, and A. G. Schwing , Generative modeling using the sliced wasserstein distance, in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 3483–3491

  26. [26]

    Dumont, T

    T. Dumont, T. Lacombe, and F.-X. Vialard, On the existence of monge maps for the gromov–wasserstein problem, Foundations of Computational Mathematics, 25 (2025), pp. 463–510

  27. [27]

    Durrett , Probability: Theory and Examples , Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, 2010

    R. Durrett , Probability: Theory and Examples , Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, 2010

  28. [28]

    Dwork, M

    C. Dwork, M. Hardt, T. Pitassi, O. Reingold, and R. Zemel , Fairness through awareness, in Proceedings of the 3rd innovations in theoretical computer science conference, 2012, pp. 214–226

  29. [29]

    Minibatch optimal transport distances; analysis and applications.arXiv preprint arXiv:2101.01792,

    K. Fatras, Y. Zine, S. Majewski, R. Flamary, R. Gribonval, and N. Courty, Minibatch optimal transport distances; analysis and applications, arXiv preprint arXiv:2101.01792, (2021)

  30. [30]

    Feldman, S

    M. Feldman, S. A. Friedler, J. Moeller, C. Scheidegger, and S. Venkatasubramanian, Certifying and removing disparate impact , in proceed- ings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, 2015, pp. 259–268. 34

  31. [31]

    Flamary, N

    R. Flamary, N. Courty, A. Gramfort, M. Z. Alaya, A. Boisbunon, S. Chambon, L. Chapel, A. Corenflos, K. Fatras, N. Fournier, L. Gau- theron, N. T. Gayraud, H. Janati, A. Rakotomamonjy, I. Redko, A. Ro- let, A. Schutz, V. Seguy, D. J. Sutherland, R. Tavenard, A. Tong, and T. Vayer, Pot: Python optimal transport , Journal of Machine Learning Research, 22 (20...

  32. [32]

    F¨ollmer and A

    H. F¨ollmer and A. Schied , Stochastic finance: an introduction in discrete time , Walter de Gruyter, 2011

  33. [33]

    Fournier and A

    N. Fournier and A. Guillin , On the rate of convergence in wasserstein distance of the empirical measure , Probability theory and related fields, 162 (2015), pp. 707– 738

  34. [34]

    Gao and A

    R. Gao and A. Kleywegt , Distributionally robust stochastic optimization with wasserstein distance, Math. Oper. Res., 48 (2023), pp. 603–655

  35. [35]

    Ghossoub and D

    M. Ghossoub and D. Saunders, On the continuity of the feasible set mapping in optimal transport, Economic Theory Bulletin, 9 (2021), pp. 113–117

  36. [36]

    Gulrajani, F

    I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville, Improved training of wasserstein gans , Advances in neural information processing systems, 30 (2017)

  37. [37]

    Houdard, A

    A. Houdard, A. Leclaire, N. Papadakis, and J. Rabin, On the gradient for- mula for learning generative models with regularized optimal transport costs , Trans- actions on Machine Learning Research, (2023)

  38. [38]

    D. Kuhn, P. M. Esfahani, V. A. Nguyen, and S. Shafieezadeh-Abadeh , Wasserstein distributionally robust optimization: Theory and applications in ma- chine learning, in Operations research & management science in the age of analytics, Informs, 2019, pp. 130–166

  39. [39]

    Laguel, J

    Y. Laguel, J. Malick, and Z. Harchaoui, Superquantile-based learning: a direct approach using gradient-based optimization, Journal of Signal Processing Systems, 94 (2022), pp. 161–177

  40. [40]

    and Mérigot, Q.Gluing methods for quantitative stability of optimal trans- port maps

    C. Letrouit and Q. M´erigot, Gluing methods for quantitative stability of optimal transport maps, arXiv preprint arXiv:2411.04908, (2024)

  41. [41]

    A. B. Levy, R. Poliquin, and L. Thibault , Partial extensions of attouch’s theorem with applications to proto-derivatives of subgradient mappings , Transactions of the American Mathematical Society, 347 (1995), pp. 1269–1294

  42. [42]

    L´evy, Sur certains processus stochastiques homog` enes, Compositio mathematica, 7 (1940), pp

    P. L´evy, Sur certains processus stochastiques homog` enes, Compositio mathematica, 7 (1940), pp. 283–339

  43. [43]

    Lobashev, M

    A. Lobashev, M. Larchenko, and D. Guskov , Color conditional generation with sliced wasserstein guidance, Advances in Neural Information Processing Systems, 38 (2026), pp. 164572–164601. 35

  44. [44]

    Mehta, V

    R. Mehta, V. Roulet, K. Pillutla, L. Liu, and Z. Harchaoui , Stochas- tic optimization for spectral risk measures , in International Conference on Artificial Intelligence and Statistics, PMLR, 2023, pp. 10112–10159

  45. [45]

    M´erigot, A

    Q. M´erigot, A. Delalande, and F. Chazal , Quantitative stability of optimal transport maps and linearization of the 2-wasserstein space , in International Confer- ence on Artificial Intelligence and Statistics, PMLR, 2020, pp. 3186–3196

  46. [46]

    Nadjahi, Sliced-Wasserstein distance for large-scale machine learning: theory, methodology and extensions, PhD thesis, Institut polytechnique de Paris, 2021

    K. Nadjahi, Sliced-Wasserstein distance for large-scale machine learning: theory, methodology and extensions, PhD thesis, Institut polytechnique de Paris, 2021

  47. [47]

    Nadjahi, A

    K. Nadjahi, A. Durmus, L. Chizat, S. Kolouri, S. Shahrampour, and U. Simsekli, Statistical and topological properties of sliced probability divergences , Advances in Neural Information Processing Systems, 33 (2020), pp. 20802–20812

  48. [48]

    Nguyen, S

    K. Nguyen, S. Zhang, T. Le, and N. Ho , Sliced wasserstein with random-path projecting directions, in Proceedings of the 41st International Conference on Machine Learning, ICML’24, JMLR.org, 2024

  49. [49]

    Norkin, Generalized-differentiable functions, Cybernetics and Systems Analysis, 16 (1980), pp

    V. Norkin, Generalized-differentiable functions, Cybernetics and Systems Analysis, 16 (1980), pp. 10–12

  50. [50]

    V. I. Norkin et al., On a strong graphical law of large numbers for random semi- continuous mappings, Vestnik of Saint Petersburg University. Applied Mathematics. Computer Science. Control Processes, (2013), pp. 102–111

  51. [51]

    V. I. Norkin and R. J.-B. Wets , On a strong graphical law of large numbers for random semicontinuous mappings , Vestnik S.-Petersburg University. Series 10. Applied Mathematics, Computer Science, Control Processes, (2013), pp. 102–111

  52. [52]

    Pauwels and S

    E. Pauwels and S. Vaiter , The derivatives of sinkhorn–knopp converge , SIAM Journal on Optimization, 33 (2023), pp. 1494–1517

  53. [53]

    Peyr´e and M

    G. Peyr´e and M. Cuturi , Computational optimal transport: With applications to data science , Found. Trends Mach. Learn., 11 (2019), p. 355–607

  54. [54]

    Pillutla, Y

    K. Pillutla, Y. Laguel, J. Malick, and Z. Harchaoui , Federated learning with superquantile aggregation for heterogeneous data, Machine Learning, 113 (2024), pp. 2955–3022

  55. [55]

    Rabin, G

    J. Rabin, G. Peyr ´e, J. Delon, and M. Bernot , Wasserstein barycenter and its application to texture mixing , in International conference on scale space and vari- ational methods in computer vision, Springer, 2011, pp. 435–446

  56. [56]

    Risser, A

    L. Risser, A. G. Sanz, Q. Vincenot, and J.-M. Loubes , Tackling algorith- mic bias in neural-network classifiers using wasserstein-2 regularization , Journal of Mathematical Imaging and Vision, 64 (2022), pp. 672–689

  57. [57]

    R. T. Rockafellar and R. J. B. Wets , Variational Analysis, Springer Berlin Heidelberg, 1998. 36

  58. [58]

    Rodr´ıguez-V´ıtores, C

    D. Rodr´ıguez-V´ıtores, C. Lalanne, and J.-M. Loubes , Learning with dif- ferentially private (sliced) wasserstein gradients , arXiv preprint arXiv:2502.01701, (2025)

  59. [59]

    Rychener, B

    Y. Rychener, B. Taskesen, and D. Kuhn , Metrizing fairness , arXiv preprint arXiv:2205.15049, (2022)

  60. [60]

    Salim, A strong law of large numbers for random monotone operators, Set-Valued and Variational Analysis, 31 (2023), p

    A. Salim, A strong law of large numbers for random monotone operators, Set-Valued and Variational Analysis, 31 (2023), p. 38

  61. [61]

    Santambrogio , Optimal Transport for Applied Mathematicians , Progress in Nonlinear Differential Equations and Their Applications, Birkh¨ auser Cham, 1 ed., 2015

    F. Santambrogio , Optimal Transport for Applied Mathematicians , Progress in Nonlinear Differential Equations and Their Applications, Birkh¨ auser Cham, 1 ed., 2015

  62. [62]

    Schechtman , The gradient’s limit of a definable family of functions admits a variational stratification, SIAM Journal on Optimization, (2026)

    S. Schechtman , The gradient’s limit of a definable family of functions admits a variational stratification, SIAM Journal on Optimization, (2026)

  63. [63]

    Sebbouh, M

    O. Sebbouh, M. Cuturi, and G. Peyr´e, Randomized stochastic gradient descent ascent, in International Conference on Artificial Intelligence and Statistics, PMLR, 2022, pp. 2941–2969

  64. [64]

    Shapiro and H

    A. Shapiro and H. Xu , Uniform laws of large numbers for set-valued mappings and subdifferentials of random functions , Journal of Mathematical Analysis and Ap- plications, 325 (2007), pp. 1390–1399

  65. [65]

    Sliced Transport Plans

    E. Tanguy, L. Chapel, and J. Delon , Sliced optimal transport plans , arXiv preprint arXiv:2508.01243, (2025)

  66. [66]

    Tanguy, R

    E. Tanguy, R. Flamary, and J. Delon, Properties of discrete sliced wasserstein losses, Mathematics of Computation, 94 (2025), pp. 1411–1465

  67. [67]

    Vauthier, A

    C. Vauthier, A. Korba, and Q. M ´erigot, Towards understanding gradient dynamics of the sliced-wasserstein distance via critical point analysis , arXiv preprint arXiv:2502.06525, (2025)

  68. [68]

    J. Wang, R. Gao, and Y. Xie , Sinkhorn distributionally robust optimization , 2023

  69. [69]

    R. Xiao, Y. Ge, R. Jiang, and Y. Yan , A unified framework for rank-based loss minimization , Advances in Neural Information Processing Systems, 36 (2023), pp. 51302–51326

  70. [70]

    Zolezzi , Convergence of generalized gradients , Set-Valued Analysis, 2 (1994), pp

    T. Zolezzi , Convergence of generalized gradients , Set-Valued Analysis, 2 (1994), pp. 381–393. 37