
arxiv: 2605.20545 · v1 · pith:H3HKQOFXnew · submitted 2026-05-19 · 📊 stat.ML · cs.LG

Sample Complexity of Transfer Learning: An Optimal Transport Approach

Pith reviewed 2026-05-21 06:16 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords: transfer learning, sample complexity, optimal transport, high-dimensional statistics, smoothness, domain adaptation, statistical learning theory

The pith

Transfer learning achieves sample complexity O(m^{-(α+1)/d}) for d>3 by transporting smoothness from source to target via optimal transport.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that transfer learning, viewed as moving a source distribution to a target one, yields a faster error decay than training from scratch once the dimension exceeds three. It derives the improved rate by showing how an optimal transport map can carry over the data distribution's smoothness level to the target task. A sympathetic reader would care because this rate depends on the data smoothness rather than the model's own smoothness, which helps explain why transfer helps most with complex, non-smooth target models such as deep networks. The work therefore supplies a concrete statistical reason for using pre-trained models when target samples are scarce.

Core claim

When the data dimension d is higher than 3, the sample complexity for transfer learning is O(m^{-(α+1)/d}), with α the smoothness of the data distribution. This follows from an optimal transport coupling that preserves α in the transferred measure. Direct learning without transfer, by contrast, is limited to the slower rate O(m^{-p/d}) set by the smoothness p of the target model itself.
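
To make the gap between the two exponents concrete, the sketch below evaluates both rates and inverts them to get the sample size needed for a given error level. It is only an illustration: the constants hidden by the O(·) notation are assumed to be 1, and the values chosen for α, p, d and the error level are hypothetical, not taken from the paper.

```python
import math

def rate(m: float, smoothness: float, d: int) -> float:
    """Error level m^(-smoothness/d), with the O(.) constant assumed to be 1."""
    return m ** (-smoothness / d)

def samples_needed(eps: float, smoothness: float, d: int) -> float:
    """Invert m^(-smoothness/d) = eps to get m = eps^(-d/smoothness)."""
    return eps ** (-d / smoothness)

# Hypothetical values, chosen for illustration only.
alpha, p, d, eps = 1.0, 1.0, 4, 0.1

m_transfer = samples_needed(eps, alpha + 1, d)  # transfer exponent uses alpha + 1
m_direct = samples_needed(eps, p, d)            # direct exponent uses p

print(f"transfer: m ~ {m_transfer:.0f}, direct: m ~ {m_direct:.0f}")
```

With these made-up numbers the transfer rate calls for roughly 100 samples against roughly 10,000 for direct learning. The advantage disappears whenever p ≥ α + 1, since the direct exponent is then at least as large.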

What carries the argument

The optimal transport coupling between source and target distributions that preserves the smoothness parameter α when pushing the source measure forward.
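
For intuition about what an optimal transport coupling does, the one-dimensional case has a closed form: with squared-distance cost and equal-size samples, the optimal coupling pairs the sorted source points with the sorted target points. The sketch below covers only that toy case. It is not the paper's construction, which concerns d > 3 where no such shortcut exists, and the function names are hypothetical.

```python
def monotone_coupling(source, target):
    """Pair sorted source points with sorted target points.

    In one dimension this pairing is optimal for squared-distance cost
    when both samples have the same size (an assumption of this sketch).
    """
    if len(source) != len(target):
        raise ValueError("this sketch assumes equal sample sizes")
    return list(zip(sorted(source), sorted(target)))

def transport_cost(pairs):
    """Mean squared displacement of the coupling."""
    return sum((y - x) ** 2 for x, y in pairs) / len(pairs)

# Toy data: the target is the source shifted by 10, in shuffled order.
source = [2.0, 0.0, 1.0]
target = [10.0, 12.0, 11.0]

pairs = monotone_coupling(source, target)
cost = transport_cost(pairs)
print(pairs, cost)
```

Because the target here is a pure shift of the source, the coupling recovers the shift and every point moves the same distance. Any smoothness the source sample has is trivially preserved in this case, which is the property the paper needs to hold in a much harder setting.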

If this is right

  • Transfer learning delivers its largest gain precisely when the target model family has low smoothness p.
  • The advantage materializes only once dimension d exceeds 3.
  • Numerical tests on image classification confirm that transfer learning raises accuracy in the low-sample regime.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same transport argument could be applied to other forms of domain shift that preserve a smoothness index.
  • Checking whether the predicted exponent holds in dimensions much larger than 3 would test how far the improvement extends.
  • Replacing the Euclidean setting with a manifold would show whether the rate improvement survives on non-flat data.

Load-bearing premise

Source and target distributions must admit an optimal transport coupling that preserves the smoothness parameter α of the data distribution in the transferred measure.

What would settle it

An experiment in dimension d>3 that measures empirical excess risk decaying at rate m^{-p/d} rather than the faster m^{-(α+1)/d} would falsify the claimed improvement.
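
One way to run such a check is to fit a line to log excess risk against log sample size and compare the fitted slope with the two candidate exponents. The sketch below does this with an ordinary least-squares fit on synthetic, noise-free risks generated from the transfer rate. The function name, the values of α, p and d, and the data are all hypothetical; a real experiment would use measured risks, with noise, over several seeds.

```python
import math

def fitted_decay_exponent(sample_sizes, excess_risks):
    """Return s such that risk ~ C * m^(-s), via least squares in log-log space."""
    xs = [math.log(m) for m in sample_sizes]
    ys = [math.log(r) for r in excess_risks]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return -slope

# Hypothetical setting: synthetic risks that follow the transfer rate exactly.
alpha, p, d = 1.0, 1.0, 5
sizes = [100, 200, 400, 800, 1600]
risks = [3.0 * m ** (-(alpha + 1) / d) for m in sizes]

s_hat = fitted_decay_exponent(sizes, risks)
transfer_exp, direct_exp = (alpha + 1) / d, p / d
closer_to_transfer = abs(s_hat - transfer_exp) < abs(s_hat - direct_exp)
print(s_hat, closer_to_transfer)
```

A fitted exponent that sits near p/d instead of (α+1)/d on real data would be the kind of evidence described above. The two exponents coincide when p = α + 1, so the test is only informative when they are well separated.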

Figures

Figures reproduced from arXiv: 2605.20545 by Guan Wang, Haoyang Cao, Wenpin Tang, Xin Guo.

Figure 1: Illustration of the Office-31 transfer learning task.
Figure 2: Transfer learning

TABLE IV: Relative Improvement of Transfer Learning over Direct Learning for ROP Detection

ROP Data %   ∆AUROC   ∆Accuracy   ∆Precision   ∆Sensitivity
100%         10.05%   18.75%      58.01%       19.71%
50%          14.09%   15.37%      45.83%       27.19%
10%          16.58%   14.65%      43.68%       26.03%
1%           16.08%   13.04%      46.67%       49.46%

Tables II–IV are (again) consistent with the theoretical results in Sections III. As shown in Table II, transfer learnin…

Original abstract

Transfer learning is an essential technique for many machine learning/AI models of complex structures such as large language models and generative AI. The essence of transfer learning is to leverage knowledge from resolved source tasks for a new target task, especially when the sample size $m$ of the training data for the latter is low. In this work, we rigorously analyze the potential benefit of transfer learning in terms of sample efficiency. Specifically, taking an optimal transport viewpoint of transfer learning, we find that when the data dimension $d$ is higher than $3$, the sample complexity for transfer learning is $O(m^{-(\alpha+1)/d})$, with $\alpha$ indicating the smoothness of the data distribution, as opposed to the $O(m^{-p/d})$ sample complexity for direct learning with $p$ indicating the smoothness of the optimal target model. Our finding theoretically supports a better sample efficiency for transfer learning, when the target task is optimizing over a family of not-so-smooth models (i.e., highly complex networks with the possible use of non-smooth activation functions). Using image classification as an example, we numerically demonstrate the sample efficiency for transfer learning, that is, in the data hungry regime, the model performance can be significantly improved by transfer learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript claims to provide a rigorous optimal-transport analysis of transfer learning sample complexity. It asserts that, for data dimension d > 3, transfer learning attains the rate O(m^{-(α+1)/d}) where α is the smoothness of the data distribution, improving on the direct-learning rate O(m^{-p/d}) with p the smoothness of the target model. The claim is illustrated numerically on image classification.

Significance. If the central derivation is completed with explicit regularity conditions, the result would supply a concrete theoretical justification for the observed sample-efficiency gains of transfer learning in high-dimensional regimes with complex, non-smooth models. This would be a useful contribution to the statistical understanding of transfer.

major comments (2)
  1. [Main theorem / derivation of sample-complexity bound] The improved rate O(m^{-(α+1)/d}) is obtained by replacing the model smoothness p with the data smoothness α+1 via an optimal-transport coupling. Standard OT regularity theory (Brenier, Caffarelli) shows that even for smooth compactly supported densities the transport map need not be Lipschitz or Hölder when d ≥ 2. The manuscript must therefore state, in the hypotheses of the main theorem, the precise density assumptions that guarantee the push-forward measure inherits smoothness α; without this the rate does not follow from the stated OT setup.
  2. [Introduction and statement of main result] The abstract and introduction present the result as holding for d > 3, yet the derivation appears to rely on an unverified transfer of Hölder regularity through the coupling. A concrete counter-example or reference to the precise condition (e.g., uniform ellipticity of the densities) under which the OT map remains C^{1,β} with β tied to α should be supplied; otherwise the central claim rests on an assumption that is not automatic.
minor comments (2)
  1. [Numerical experiments] The numerical section would benefit from explicit reporting of the source and target model architectures, the precise transfer procedure (e.g., which layers are frozen), and error bars over multiple random seeds.
  2. [Notation and preliminaries] Notation for the smoothness parameters α and p should be introduced once and used consistently; currently the distinction between data smoothness and model smoothness is clear in the abstract but could be reinforced with a short table of symbols.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for emphasizing the need to make the regularity assumptions on the optimal transport map explicit. We address the two major comments below and will revise the manuscript to strengthen the presentation of the hypotheses.

  1. Referee: [Main theorem / derivation of sample-complexity bound] The improved rate O(m^{-(α+1)/d}) is obtained by replacing the model smoothness p with the data smoothness α+1 via an optimal-transport coupling. Standard OT regularity theory (Brenier, Caffarelli) shows that even for smooth compactly supported densities the transport map need not be Lipschitz or Hölder when d ≥ 2. The manuscript must therefore state, in the hypotheses of the main theorem, the precise density assumptions that guarantee the push-forward measure inherits smoothness α; without this the rate does not follow from the stated OT setup.

    Authors: We agree that the regularity of the transport map must be guaranteed by explicit assumptions. The derivation in the manuscript relies on the source and target densities being positive, bounded away from zero and infinity, and C^α on compact convex supports. Under these conditions, Caffarelli's regularity theory ensures the Brenier map is C^{1,β} with β tied to α, allowing the push-forward to inherit the required smoothness for the rate O(m^{-(α+1)/d}). In the revision we will add these density assumptions explicitly as Assumption 2.1 in the hypotheses of the main theorem (Theorem 3.1) and include a brief discussion in Section 2.2 with references to Brenier (1991) and Caffarelli (1992). revision: yes

  2. Referee: [Introduction and statement of main result] The abstract and introduction present the result as holding for d > 3, yet the derivation appears to rely on an unverified transfer of Hölder regularity through the coupling. A concrete counter-example or reference to the precise condition (e.g., uniform ellipticity of the densities) under which the OT map remains C^{1,β} with β tied to α should be supplied; otherwise the central claim rests on an assumption that is not automatic.

    Authors: The restriction d > 3 is motivated by the regime where the improved exponent (α+1)/d meaningfully exceeds the direct-learning rate p/d for typical p < α+1 in high dimensions; the proof uses Sobolev-type embeddings that are favorable in this range. We will add a reference to the precise conditions (uniform ellipticity and C^α regularity of the densities) together with the statement that the OT map is then C^{1,β} by Caffarelli's theorem. This will appear as a remark following the main theorem and in the introduction. We do not supply a counter-example because the claim is stated under these standard OT assumptions; if the referee can point to a specific density pair violating the rate, we would be grateful for the example. revision: partial

Circularity Check

0 steps flagged

No significant circularity: derivation from OT properties remains independent of target result.


The paper derives the improved rate O(m^{-(α+1)/d}) by assuming an optimal transport coupling that transfers smoothness α from source to target measure. This assumption is stated explicitly as a hypothesis on the distributions rather than being defined in terms of the final rate or fitted from the target data. No equation reduces the claimed sample complexity to a self-referential fit, and no load-bearing step collapses to a prior self-citation that itself assumes the result. The bound is obtained from standard OT regularity arguments under the stated coupling condition, keeping the derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard statistical-learning assumptions about Hölder or Sobolev smoothness of distributions together with the existence of a sufficiently regular optimal transport map between source and target measures.

axioms (1)
  • Domain assumption: source and target probability measures admit an optimal transport coupling whose regularity is controlled by the smoothness parameter α of the data distribution.
    Invoked to obtain the improved rate when d > 3.




Reference graph

Works this paper leans on

74 extracted references · 74 canonical work pages · 3 internal anchors

  1. [1]

    Automatic ICD-9 coding via deep transfer learning,

    M. Zeng, M. Li, Z. Fei, Y. Yu, Y. Pan, and J. Wang, “Automatic ICD-9 coding via deep transfer learning,” Neurocomputing, vol. 324, pp. 43–50, 2019

  2. [2]

    Transfer learning for retinal vascular disease detection: A pilot study with diabetic retinopathy and retinopathy of prematurity,

    G. Wang, Y. Kikuchi, J. Yi, Q. Zou, R. Zhou, and X. Guo, “Transfer learning for retinal vascular disease detection: A pilot study with diabetic retinopathy and retinopathy of prematurity,” arXiv preprint arXiv:2201.01250, 2022

  3. [3]

    Transfer learning for medical image classification: A literature review,

    H. E. Kim, A. Cosa-Linan, N. Santhanam, M. Jannesari, M. E. Maros, and T. Ganslandt, “Transfer learning for medical image classification: A literature review,” BMC Medical Imaging, vol. 22, no. 1, p. 69, 2022

  4. [4]

    Transfer learning in natural language processing,

    S. Ruder, M. E. Peters, S. Swayamdipta, and T. Wolf, “Transfer learning in natural language processing,” in NAACL, 2019, pp. 15–18

  5. [5]

    BERT: Pre-training of deep bidirectional transformers for language understanding,

    J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in NAACL, vol. 1, 2019, pp. 4171–4186

  6. [6]

    Vl-adapter: Parameter-efficient transfer learning for vision-and-language tasks,

    Y.-L. Sung, J. Cho, and M. Bansal, “Vl-adapter: Parameter-efficient transfer learning for vision-and-language tasks,” in CVPR, 2022, pp. 5227–5237

  7. [7]

    GPT-4 Technical Report

    OpenAI, “GPT-4 Technical Report,” arXiv preprint arXiv:2303.08774, 2023

  8. [8]

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    G. Comanici, E. Bieber, M. Schaekermann, I. Pasupat, N. Sachdeva, I. Dhillon, M. Blistein, O. Ram, D. Zhang, E. Rosen et al., “Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities,” arXiv preprint arXiv:2507.06261, 2025

  9. [9]

    The Llama 3 Herd of Models

    A. Grattafiori, A. Dubey, A. Jauhri, A. Pandey, A. Kadian, A. Al-Dahle, A. Letman, A. Mathur, A. Schelten, A. Vaughan et al., “The llama 3 herd of models,” arXiv preprint arXiv:2407.21783, 2024

  10. [10]

    Ibex: Information-bottleneck-explored coarse-to-fine molecular generation under limited data,

    D. Xu, J. X. Yao, S. Song, Z. Zhu, and J. Ji, “Ibex: Information-bottleneck-explored coarse-to-fine molecular generation under limited data,” arXiv preprint arXiv:2508.10775, 2025

  11. [11]

    Advancing cancer research with synthetic data generation in low-data scenarios,

    P. A. Apellaniz, B. A. Galende, and A. Jim, “Advancing cancer research with synthetic data generation in low-data scenarios,” IEEE Journal of Biomedical Informatics, 2025

  12. [12]

    On the sample complexity of pac-learning using random and chosen examples

    B. Eisenberg and R. L. Rivest, “On the sample complexity of pac-learning using random and chosen examples,” in COLT, vol. 90, 1990, pp. 154–162

  13. [13]

    The optimal sample complexity of pac learning,

    S. Hanneke, “The optimal sample complexity of pac learning,” J. Mach. Learn. Res., vol. 17, no. 38, pp. 1–15, 2016

  14. [14]

    Bounds on the sample complexity of bayesian learning using information theory and the vc dimension,

    D. Haussler, M. Kearns, and R. E. Schapire, “Bounds on the sample complexity of bayesian learning using information theory and the vc dimension,” Mach. Learn., vol. 14, no. 1, pp. 83–113, 1994

  15. [15]

    On the hardness of domain adaptation and the utility of unlabeled target samples,

    S. Ben-David and R. Urner, “On the hardness of domain adaptation and the utility of unlabeled target samples,” in ALT, 2012, pp. 139–153

  16. [16]

    On the value of target data in transfer learning,

    S. Hanneke and S. Kpotufe, “On the value of target data in transfer learning,” in Neurips, vol. 32, 2019, pp. 9867–9877

  17. [17]

    Transfer learning for nonparametric classification: Minimax rate and adaptive classifier,

    T. T. Cai and H. Wei, “Transfer learning for nonparametric classification: Minimax rate and adaptive classifier,” Ann. Stat., vol. 49, no. 1, pp. 100–128, 2021

  18. [18]

    Adaptive transfer learning,

    H. W. J. Reeve, T. I. Cannings, and R. J. Samworth, “Adaptive transfer learning,” Ann. Stat., vol. 49, no. 6, pp. 3618–3649, 2021

  19. [19]

    A no-free-lunch theorem for multitask learning,

    S. Hanneke and S. Kpotufe, “A no-free-lunch theorem for multitask learning,” Ann. Stat., vol. 50, no. 6, pp. 3119–3143, 2022

  20. [20]

    Limits of model selection under transfer learning,

    S. Hanneke, S. Kpotufe, and Y. Mahdaviyeh, “Limits of model selection under transfer learning,” in COLT, vol. 36, 2023, pp. 5781–5812

  21. [21]

    Robust transfer learning with unreliable source data,

    J. Fan, C. Gao, and J. M. Klusowski, “Robust transfer learning with unreliable source data,” Ann. Stat., vol. 53, no. 4, pp. 1728–1752, 2025

  22. [22]

    On the theory of transfer learning: The importance of task diversity,

    N. Tripuraneni, M. Jordan, and C. Jin, “On the theory of transfer learning: The importance of task diversity,” in Neurips, vol. 33, 2020, pp. 7852–7862

  23. [23]

    Few-shot learning via learning the representation, provably,

    S. S. Du, W. Hu, S. M. Kakade, J. D. Lee, and Q. Lei, “Few-shot learning via learning the representation, provably,” in ICLR, 2021

  24. [24]

    Provable meta-learning of linear representations,

    N. Tripuraneni, C. Jin, and M. I. Jordan, “Provable meta-learning of linear representations,” in ICML, vol. 38, 2021, pp. 10 434–10 443

  25. [25]

    Adaptive and robust multi-task learning,

    Y. Duan and K. Wang, “Adaptive and robust multi-task learning,” Ann. Stat., vol. 51, no. 5, pp. 2015–2039, 2023

  26. [26]

    Learning from similar linear representations: Adaptivity, minimaxity, and robustness,

    Y. Tian, Y. Gu, and Y. Feng, “Learning from similar linear representations: Adaptivity, minimaxity, and robustness,” J. Mach. Learn. Res., vol. 26, no. 187, pp. 1–125, 2025

  27. [27]

    On the sample complexity of representation learning in multi-task bandits with global and local structure,

    A. Russo and A. Proutiere, “On the sample complexity of representation learning in multi-task bandits with global and local structure,” in AAAI, vol. 37, 2023, pp. 9658–9667

  28. [28]

    Sample complexity of multi-task reinforcement learning,

    E. Brunskill and L. Li, “Sample complexity of multi-task reinforcement learning,” in UAI, vol. 29, 2013, pp. 122–131

  29. [29]

    Reducing sample complexity in reinforcement learning by transferring transition and reward probabilities,

    K. Oguni, K. Narisawa, and A. Shinohara, “Reducing sample complexity in reinforcement learning by transferring transition and reward probabilities,” in ICAART, vol. 2, 2014, pp. 632–638

  30. [30]

    Provable benefits of representational transfer in reinforcement learning,

    A. Agarwal, Y. Song, W. Sun, K. Wang, M. Wang, and X. Zhang, “Provable benefits of representational transfer in reinforcement learning,” in COLT, vol. 36, 2023, pp. 2114–2187

  31. [31]

    Transfer learning for contextual multi-armed bandits,

    C. Cai, T. T. Cai, and H. Li, “Transfer learning for contextual multi-armed bandits,” Ann. Stat., vol. 52, no. 1, pp. 207–232, 2024

  32. [32]

    Learning to undo: Transfer reinforcement learning under linear state space transformations,

    M. Mahajan, A. Pacchiano, and X. Zhang, “Learning to undo: Transfer reinforcement learning under linear state space transformations,” in OpenReview, 2025. [Online]. Available: https://openreview.net/forum?id=jI8a3s5xUz

  33. [33]

    Transfer learning for high-dimensional linear regression: Prediction, estimation and minimax optimality,

    S. Li, T. T. Cai, and H. Li, “Transfer learning for high-dimensional linear regression: Prediction, estimation and minimax optimality,” J. R. Stat. Soc. Ser. B Stat. Methodol., vol. 84, no. 1, pp. 149–173, 2022

  34. [34]

    Transfer learning under high-dimensional generalized linear models,

    Y. Tian and Y. Feng, “Transfer learning under high-dimensional generalized linear models,” J. Am. Stat. Assoc., vol. 118, no. 544, pp. 2684–2697, 2023

  35. [35]

    Unified transfer learning in high-dimensional linear regression,

    S. S. Liu, “Unified transfer learning in high-dimensional linear regression,” in AISTATS, vol. 27, 2024, pp. 1036–1044

  36. [36]

    TransFusion: Covariate-shift robust transfer learning for high-dimensional regression,

    Z. He, Y. Sun, and R. Li, “TransFusion: Covariate-shift robust transfer learning for high-dimensional regression,” in AISTATS, vol. 27, 2024, pp. 703–711

  37. [37]

    On the provable advantage of unsupervised pretraining,

    J. Ge, S. Tang, J. Fan, and C. Jin, “On the provable advantage of unsupervised pretraining,” in ICLR, 2024

  38. [38]

    Provable benefits of unsupervised pre-training and transfer learning via single-index models,

    T. Jones-McCormick, A. Jagannath, and S. Sen, “Provable benefits of unsupervised pre-training and transfer learning via single-index models,” in ICML, vol. 42, 2025, pp. 28 350–28 376

  39. [39]

    Features are fate: A theory of transfer learning in high-dimensional regression,

    J. Tahir, S. Ganguli, and G. M. Rotskoff, “Features are fate: A theory of transfer learning in high-dimensional regression,” in ICML, vol. 42, 2025, pp. 58 142–58 168

  40. [40]

    Provable sample-efficient transfer learning conditional diffusion models via representation learning,

    Z. Cheng, T. Xie, S. Zhang, and C. Zhang, “Provable sample-efficient transfer learning conditional diffusion models via representation learning,” in Neurips, vol. 38, 2025

  41. [41]

    On the sample complexity of entropic optimal transport,

    P. Rigollet and A. J. Stromme, “On the sample complexity of entropic optimal transport,” Ann. Stat., vol. 53, no. 1, pp. 61–90, 2025

  42. [42]

    A unified analysis of generalization and sample complexity for semi-supervised domain adaptation,

    E. Vural and H. Karaca, “A unified analysis of generalization and sample complexity for semi-supervised domain adaptation,” arXiv preprint arXiv:2507.22632, 2025

  43. [43]

    Partial domain adaptation via importance sampling-based shift correction,

    C. X. Ren, Y. W. Luo, C. J. Guo, and X. L. Xu, “Partial domain adaptation via importance sampling-based shift correction,” IEEE Trans. Neural Netw., 2025

  44. [44]

    Optimal transport

    C. Villani, Optimal transport, ser. Grundlehren der mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 2009, vol. 338, old and new

  45. [45]

    Sinkhorn distances: Lightspeed computation of optimal transport,

    M. Cuturi, “Sinkhorn distances: Lightspeed computation of optimal transport,” in NIPS, vol. 26, 2013, pp. 2292–2300

  46. [46]

    Sliced and Radon Wasserstein barycenters of measures,

    N. Bonneel, J. Rabin, G. Peyré, and H. Pfister, “Sliced and Radon Wasserstein barycenters of measures,” J. Math. Imaging Vision, vol. 51, pp. 22–45, 2015

  47. [47]

    Diffusion Schrödinger bridge with applications to score-based generative modeling,

    V. De Bortoli, J. Thornton, J. Heng, and A. Doucet, “Diffusion Schrödinger bridge with applications to score-based generative modeling,” in Neurips, vol. 34, 2021, pp. 17 695–17 709

  48. [48]

    Optimal transport for domain adaptation,

    N. Courty, R. Flamary, D. Tuia, and A. Rakotomamonjy, “Optimal transport for domain adaptation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 9, pp. 1853–1865, 2016

  49. [49]

    An optimal transport approach to estimating causal effects via nonlinear difference-in-differences,

    W. Torous, F. Gunsilius, and P. Rigollet, “An optimal transport approach to estimating causal effects via nonlinear difference-in-differences,” arXiv preprint arXiv:2108.05858, 2021

  50. [50]

    Plugin estimation of smooth optimal transport maps,

    T. Manole, S. Balakrishnan, J. Niles-Weed, and L. Wasserman, “Plugin estimation of smooth optimal transport maps,” Ann. Statist., vol. 52, no. 3, pp. 966–998, 2024

  51. [51]

    Statistical optimal transport,

    S. Chewi, J. Niles-Weed, and P. Rigollet, “Statistical optimal transport,” arXiv preprint arXiv:2407.18163, 2024

  52. [52]

    Risk of transfer learning and its applications in finance,

    H. Cao, H. Gu, X. Guo, and M. Rosenbaum, “Risk of transfer learning and its applications in finance,” arXiv preprint arXiv:2311.03283, 2023

  53. [53]

    On the optimality of conditional expectation as a bregman predictor,

    A. Banerjee, X. Guo, and H. Wang, “On the optimality of conditional expectation as a bregman predictor,” IEEE Trans. Inf. Theory, vol. 51, no. 7, pp. 2664–2669, 2005

  54. [54]

    Topics in optimal transportation

    C. Villani, Topics in optimal transportation, ser. Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2003, vol. 58

  55. [55]

    Optimal rates of convergence for nonparametric estimators,

    C. J. Stone, “Optimal rates of convergence for nonparametric estimators,” Ann. Statist., vol. 8, no. 6, pp. 1348–1360, 1980

  56. [56]

    Optimal global rates of convergence for nonparametric regression,

    ——, “Optimal global rates of convergence for nonparametric regression,” Ann. Statist., vol. 10, no. 4, pp. 1040–1053, 1982

  57. [57]

    Optimal transportation between unequal dimensions,

    R. J. McCann and B. Pass, “Optimal transportation between unequal dimensions,” Arch. Ration. Mech. Anal., vol. 238, no. 3, pp. 1475–1520, 2020

  58. [58]

    Regularity of optimal transportation between spaces with different dimensions,

    B. Pass, “Regularity of optimal transportation between spaces with different dimensions,” Math. Res. Lett., vol. 19, no. 2, pp. 291–307, 2012

  59. [59]

    The geometry of optimal transportation,

    W. Gangbo and R. J. McCann, “The geometry of optimal transportation,” Acta Math., vol. 177, no. 2, pp. 113–161, 1996

  60. [60]

    Décomposition polaire et réarrangement monotone des champs de vecteurs,

    Y. Brenier, “Décomposition polaire et réarrangement monotone des champs de vecteurs,” C. R. Acad. Sci. Paris Sér. I Math., vol. 305, no. 19, pp. 805–808, 1987

  61. [61]

    Polar factorization and monotone rearrangement of vector-valued functions,

    ——, “Polar factorization and monotone rearrangement of vector-valued functions,” Comm. Pure Appl. Math., vol. 44, no. 4, pp. 375–417, 1991

  62. [62]

    Boundary regularity of maps with convex potentials,

    L. A. Caffarelli, “Boundary regularity of maps with convex potentials,” Comm. Pure Appl. Math., vol. 45, no. 9, pp. 1141–1151, 1992

  63. [63]

    The regularity of mappings with a convex potential,

    ——, “The regularity of mappings with a convex potential,” J. Amer. Math. Soc., vol. 5, no. 1, pp. 99–104, 1992

  64. [64]

    Boundary regularity of maps with convex potentials. II,

    ——, “Boundary regularity of maps with convex potentials. II,” Ann. of Math. (2), vol. 144, no. 3, pp. 453–496, 1996

  65. [65]

    Monotonicity properties of optimal transportation and the FKG and related inequalities,

    ——, “Monotonicity properties of optimal transportation and the FKG and related inequalities,” Comm. Math. Phys., vol. 214, no. 3, pp. 547–563, 2000

  66. [66]

    On optimal transport maps between 1/d-concave densities,

    G. Carlier, A. Figalli, and F. Santambrogio, “On optimal transport maps between 1/d-concave densities,” arXiv preprint arXiv:2404.05456, 2024

  67. [67]

    Lipschitz changes of variables between perturbations of log-concave measures,

    M. Colombo, A. Figalli, and Y. Jhaveri, “Lipschitz changes of variables between perturbations of log-concave measures,” Ann. Sc. Norm. Super. Pisa Cl. Sci. (5), vol. 17, no. 4, pp. 1491–1519, 2017

  68. [68]

    Adapting visual category models to new domains,

    K. Saenko, B. Kulis, M. Fritz, and T. Darrell, “Adapting visual category models to new domains,” in Proceedings of the 11th European Conference on Computer Vision. Springer, 2010, pp. 213–226

  69. [69]

    Keras,

    F. Chollet et al., “Keras,” https://keras.io, 2015

  70. [70]

    A study of gaussian mixture models of color and texture features for image classification and segmentation,

    H. Permuter, J. Francos, and I. Jermyn, “A study of gaussian mixture models of color and texture features for image classification and segmentation,” Pattern recognition, vol. 39, no. 4, pp. 695–706, 2006

  71. [71]

    Screening examination of premature infants for retinopathy of prematurity,

    W. M. Fierson, “Screening examination of premature infants for retinopathy of prematurity,” Pediatrics, vol. 142, no. 6, 2018

  72. [72]

    Preterm-associated visual impairment and estimates of retinopathy of prematurity at regional and global levels for 2010,

    H. Blencowe, J. Lawn, T. Vazquez, A. Fielder, and C. Gilbert, “Preterm-associated visual impairment and estimates of retinopathy of prematurity at regional and global levels for 2010,” Pediatr. Res., vol. 74 Suppl 1, pp. 35–49, 12 2013

  73. [73]

    Retinopathy of prematurity in middle-income countries,

    C. Gilbert, J. Rahi, M. Eckstein, J. O’sullivan, and A. Foster, “Retinopathy of prematurity in middle-income countries,” The Lancet, vol. 350, no. 9070, pp. 12–14, 1997

  74. [74]

    Revised Indications for the Treatment of Retinopathy of Prematurity: Results of the Early Treatment for Retinopathy of Prematurity Randomized Trial,

    Early Treatment For Retinopathy Of Prematurity Cooperative Group, “Revised Indications for the Treatment of Retinopathy of Prematurity: Results of the Early Treatment for Retinopathy of Prematurity Randomized Trial,” Archives of Ophthalmology, vol. 121, no. 12, pp. 1684–1694, 2003