Sample Complexity of Transfer Learning: An Optimal Transport Approach

Guan Wang; Haoyang Cao; Wenpin Tang; Xin Guo

arxiv: 2605.20545 · v1 · pith:H3HKQOFXnew · submitted 2026-05-19 · 📊 stat.ML · cs.LG

Pith reviewed 2026-05-21 06:16 UTC · model grok-4.3

classification 📊 stat.ML cs.LG

keywords: transfer learning · sample complexity · optimal transport · high-dimensional statistics · smoothness · domain adaptation · statistical learning theory

The pith

Transfer learning achieves sample complexity O(m^{-(α+1)/d}) for d>3 by transporting smoothness from source to target via optimal transport.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that transfer learning, viewed as moving a source distribution to a target one, yields a faster error decay than training from scratch once the dimension exceeds three. It derives the improved rate by showing how an optimal transport map can carry over the data distribution's smoothness level to the target task. A sympathetic reader would care because this rate depends on the data smoothness rather than the model's own smoothness, which helps explain why transfer helps most with complex, non-smooth target models such as deep networks. The work therefore supplies a concrete statistical reason for using pre-trained models when target samples are scarce.

Core claim

When the data dimension d is higher than 3, the sample complexity for transfer learning is O(m^{-(α+1)/d}), with α the smoothness of the data distribution. This follows from an optimal transport coupling that preserves α in the transferred measure. Direct learning without transfer, by contrast, is limited to the slower rate O(m^{-p/d}) set by the smoothness p of the target model itself.
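To make the gap between the two exponents concrete, the sketch below solves ε = m^{-r/d} for m under each rate. It is only an illustration: the constant hidden in the O(·) notation is set to 1 because the summary does not give it, the values of ε, α, p and d are hypothetical, and the function name `samples_needed` is not taken from the paper.

```python
def samples_needed(eps: float, smoothness: float, d: int) -> float:
    """Solve eps = m ** (-smoothness / d) for m, with the O(.) constant set to 1."""
    if not (0.0 < eps < 1.0) or smoothness <= 0 or d <= 0:
        raise ValueError("need 0 < eps < 1, smoothness > 0, d > 0")
    return eps ** (-d / smoothness)


# Hypothetical values, chosen only for illustration (not from the paper).
eps, alpha, p, d = 0.1, 1.0, 1.0, 4

m_transfer = samples_needed(eps, alpha + 1.0, d)  # rate m^{-(alpha+1)/d}
m_direct = samples_needed(eps, p, d)              # rate m^{-p/d}

print(f"transfer: m ~ {m_transfer:.0f}, direct: m ~ {m_direct:.0f}")
print(f"ratio direct/transfer ~ {m_direct / m_transfer:.0f}")
```

With these inputs the transfer rate calls for on the order of 10^2 samples and the direct rate for on the order of 10^4. The gap shrinks as p approaches α+1 and disappears when p = α+1.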

What carries the argument

The optimal transport coupling between source and target distributions that preserves the smoothness parameter α when pushing the source measure forward.
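For readers unfamiliar with optimal transport maps, the following is a minimal one-dimensional sketch, not the paper's construction. It relies on the standard fact that, in one dimension with squared-distance cost, the optimal coupling between two equal-size samples pairs them in sorted order. The paper's setting is d > 3, where no such closed form exists; the function names and the sample values here are assumptions made for illustration.

```python
def monotone_coupling_1d(source, target):
    """Pair the i-th smallest source point with the i-th smallest target point."""
    if len(source) != len(target):
        raise ValueError("this sketch assumes equal sample sizes")
    return list(zip(sorted(source), sorted(target)))


def mean_squared_cost(pairs):
    """Average squared displacement of a coupling given as (x, y) pairs."""
    return sum((x - y) ** 2 for x, y in pairs) / len(pairs)


# Hypothetical samples, for illustration only.
source = [0.0, 1.0, 2.0]
target = [5.0, 3.0, 4.0]

optimal_pairs = monotone_coupling_1d(source, target)
unsorted_pairs = list(zip(source, target))  # pairing in the given order

print(optimal_pairs, mean_squared_cost(optimal_pairs))
print(unsorted_pairs, mean_squared_cost(unsorted_pairs))
```

The sorted pairing moves every point by the same amount and costs less than the unsorted one. In higher dimensions the analogous map is the gradient of a convex function (Brenier's theorem), and whether it preserves smoothness is exactly the question raised in the referee report.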

If this is right

Transfer learning delivers its largest gain precisely when the target model family has low smoothness p.
The advantage materializes only once dimension d exceeds 3.
Numerical tests on image classification confirm that transfer learning raises accuracy in the low-sample regime.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same transport argument could be applied to other forms of domain shift that preserve a smoothness index.
Checking whether the predicted exponent holds in dimensions much larger than 3 would test how far the improvement extends.
Replacing the Euclidean setting with a manifold would show whether the rate improvement survives on non-flat data.

Load-bearing premise

Source and target distributions must admit an optimal transport coupling that preserves the smoothness parameter α of the data distribution in the transferred measure.

What would settle it

An experiment in dimension d>3 that measures empirical excess risk decaying at rate m^{-p/d} rather than the faster m^{-(α+1)/d} would falsify the claimed improvement.
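One way such a test could be read off from data is to fit the slope of log excess risk against log m. The sketch below does this with an ordinary least-squares fit on synthetic power-law data. The sample sizes, the constant 2.0, and the values of α, p and d are hypothetical, and a real experiment would also need noise handling and confidence intervals, which are omitted here.

```python
import math


def fitted_exponent(sample_sizes, excess_risks):
    """Least-squares slope of log(risk) vs log(m); risk ~ C * m**(-r) returns r."""
    xs = [math.log(m) for m in sample_sizes]
    ys = [math.log(r) for r in excess_risks]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return -sxy / sxx


# Synthetic, noise-free risks following each candidate rate (illustration only).
d, alpha, p = 4, 1.0, 1.0
sizes = [100, 200, 400, 800, 1600]
risk_transfer_rate = [2.0 * m ** (-(alpha + 1.0) / d) for m in sizes]
risk_direct_rate = [2.0 * m ** (-p / d) for m in sizes]

r_fast = fitted_exponent(sizes, risk_transfer_rate)
r_slow = fitted_exponent(sizes, risk_direct_rate)
print(f"fitted exponents: {r_fast:.3f} (target (alpha+1)/d), {r_slow:.3f} (target p/d)")
```

On real measurements, a fitted exponent close to p/d rather than (α+1)/d in dimension d > 3 would be the kind of evidence described above.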

Figures

Figures reproduced from arXiv: 2605.20545 by Guan Wang, Haoyang Cao, Wenpin Tang, Xin Guo.

**Figure 2.** Transfer learning

TABLE IV: Relative Improvement of Transfer Learning over Direct Learning for ROP Detection

| ROP Data % | ∆AUROC | ∆Accuracy | ∆Precision | ∆Sensitivity |
|---|---|---|---|---|
| 100% | 10.05% | 18.75% | 58.01% | 19.71% |
| 50% | 14.09% | 15.37% | 45.83% | 27.19% |
| 10% | 16.58% | 14.65% | 43.68% | 26.03% |
| 1% | 16.08% | 13.04% | 46.67% | 49.46% |

Tables II–IV are (again) consistent with the theoretical results in Sections III. As shown in Table II, transfer learnin…

Original abstract

Transfer learning is an essential technique for many machine learning/AI models of complex structures such as large language models and generative AI. The essence of transfer learning is to leverage knowledge from resolved source tasks for a new target task, especially when the sample size $m$ of the training data for the latter is low. In this work, we rigorously analyze the potential benefit of transfer learning in terms of sample efficiency. Specifically, taking an optimal transport viewpoint of transfer learning, we find that when the data dimension $d$ is higher than $3$, the sample complexity for transfer learning is $O(m^{-(\alpha+1)/d})$, with $\alpha$ indicating the smoothness of the data distribution, as opposed to the $O(m^{-p/d})$ sample complexity for direct learning with $p$ indicating the smoothness of the optimal target model. Our finding theoretically supports a better sample efficiency for transfer learning, when the target task is optimizing over a family of not-so-smooth models (i.e., highly complex networks with the possible use of non-smooth activation functions). Using image classification as an example, we numerically demonstrate the sample efficiency for transfer learning, that is, in the data hungry regime, the model performance can be significantly improved by transfer learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

OT transfer gives a concrete rate O(m^{-(α+1)/d}) for d>3 by swapping model smoothness for data smoothness, but the argument needs the transport map to keep that smoothness and the abstract gives no proof sketch or assumption list.

The main point is that the authors model transfer learning as an optimal transport problem between source and target distributions and obtain the sample complexity O(m^{-(α+1)/d}) when dimension d exceeds 3, with α the smoothness of the data distribution. This is meant to improve on the usual direct-learning rate O(m^{-p/d}) that depends on the smoothness p of the target function, especially for complex non-smooth models such as deep networks with non-smooth activations. They also run a simple image-classification experiment showing gains in the low-sample regime. That is the core contribution and the numerical check is straightforward to follow.

The OT framing is a reasonable way to formalize how information moves from source to target, and it produces a dimension-dependent justification that could matter for high-d regimes. The paper is clear about the practical motivation for large language models and generative AI.

The soft spot is the step that assumes the optimal transport coupling carries the smoothness parameter α forward without loss. Standard OT results show that even for smooth compactly supported densities the map can lose Lipschitz continuity or develop singularities once d is at least 2, so the rate does not follow automatically. The abstract states that a rigorous analysis is given but supplies no proof outline, no explicit assumption list, and no discussion of when the regularity transfer holds. Without those details it is hard to judge how broad the claim is.

This work is aimed at people who study sample complexity and generalization bounds in transfer settings. A reader who wants to see OT tools applied to these questions could get something useful from the derivation and the experiment. It has enough of a new angle and some empirical backing that it should receive a serious referee rather than a desk rejection. I would send it out for review and expect the referees to press on the conditions needed for the transport map to preserve smoothness.

Referee Report

2 major / 2 minor

Summary. The manuscript claims to provide a rigorous optimal-transport analysis of transfer learning sample complexity. It asserts that, for data dimension d > 3, transfer learning attains the rate O(m^{-(α+1)/d}) where α is the smoothness of the data distribution, improving on the direct-learning rate O(m^{-p/d}) with p the smoothness of the target model. The claim is illustrated numerically on image classification.

Significance. If the central derivation is completed with explicit regularity conditions, the result would supply a concrete theoretical justification for the observed sample-efficiency gains of transfer learning in high-dimensional regimes with complex, non-smooth models. This would be a useful contribution to the statistical understanding of transfer.

major comments (2)

[Main theorem / derivation of sample-complexity bound] The improved rate O(m^{-(α+1)/d}) is obtained by replacing the model smoothness p with the data smoothness α+1 via an optimal-transport coupling. Standard OT regularity theory (Brenier, Caffarelli) shows that even for smooth compactly supported densities the transport map need not be Lipschitz or Hölder when d ≥ 2. The manuscript must therefore state, in the hypotheses of the main theorem, the precise density assumptions that guarantee the push-forward measure inherits smoothness α; without this the rate does not follow from the stated OT setup.
[Introduction and statement of main result] The abstract and introduction present the result as holding for d > 3, yet the derivation appears to rely on an unverified transfer of Hölder regularity through the coupling. A concrete counter-example or reference to the precise condition (e.g., uniform ellipticity of the densities) under which the OT map remains C^{1,β} with β tied to α should be supplied; otherwise the central claim rests on an assumption that is not automatic.

minor comments (2)

[Numerical experiments] The numerical section would benefit from explicit reporting of the source and target model architectures, the precise transfer procedure (e.g., which layers are frozen), and error bars over multiple random seeds.
[Notation and preliminaries] Notation for the smoothness parameters α and p should be introduced once and used consistently; currently the distinction between data smoothness and model smoothness is clear in the abstract but could be reinforced with a short table of symbols.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for emphasizing the need to make the regularity assumptions on the optimal transport map explicit. We address the two major comments below and will revise the manuscript to strengthen the presentation of the hypotheses.

Referee: [Main theorem / derivation of sample-complexity bound] The improved rate O(m^{-(α+1)/d}) is obtained by replacing the model smoothness p with the data smoothness α+1 via an optimal-transport coupling. Standard OT regularity theory (Brenier, Caffarelli) shows that even for smooth compactly supported densities the transport map need not be Lipschitz or Hölder when d ≥ 2. The manuscript must therefore state, in the hypotheses of the main theorem, the precise density assumptions that guarantee the push-forward measure inherits smoothness α; without this the rate does not follow from the stated OT setup.

Authors: We agree that the regularity of the transport map must be guaranteed by explicit assumptions. The derivation in the manuscript relies on the source and target densities being positive, bounded away from zero and infinity, and C^α on compact convex supports. Under these conditions, Caffarelli's regularity theory ensures the Brenier map is C^{1,β} with β tied to α, allowing the push-forward to inherit the required smoothness for the rate O(m^{-(α+1)/d}). In the revision we will add these density assumptions explicitly as Assumption 2.1 in the hypotheses of the main theorem (Theorem 3.1) and include a brief discussion in Section 2.2 with references to Brenier (1991) and Caffarelli (1992).

Revision: yes

Referee: [Introduction and statement of main result] The abstract and introduction present the result as holding for d > 3, yet the derivation appears to rely on an unverified transfer of Hölder regularity through the coupling. A concrete counter-example or reference to the precise condition (e.g., uniform ellipticity of the densities) under which the OT map remains C^{1,β} with β tied to α should be supplied; otherwise the central claim rests on an assumption that is not automatic.

Authors: The restriction d > 3 is motivated by the regime where the improved exponent (α+1)/d meaningfully exceeds the direct-learning rate p/d for typical p < α+1 in high dimensions; the proof uses Sobolev-type embeddings that are favorable in this range. We will add a reference to the precise conditions (uniform ellipticity and C^α regularity of the densities) together with the statement that the OT map is then C^{1,β} by Caffarelli's theorem. This will appear as a remark following the main theorem and in the introduction. We do not supply a counter-example because the claim is stated under these standard OT assumptions; if the referee can point to a specific density pair violating the rate, we would be grateful for the example.

Revision: partial
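The regularity question in this exchange can be stated through the Monge–Ampère equation satisfied by the Brenier potential. The block below is a sketch of the standard formulation, not a quotation from the manuscript. The symbols f and g (source and target densities), φ (convex potential) and T (transport map) are notation assumed here, and the exact hypotheses used in the paper are not visible in this summary.

```latex
% Brenier: for quadratic cost the optimal map is the gradient of a convex potential.
T = \nabla \varphi, \qquad T_{\#}\,(f\,dx) = g\,dy .

% Change of variables gives the Monge--Amp\`ere equation for \varphi:
\det D^{2}\varphi(x) \;=\; \frac{f(x)}{g\bigl(\nabla\varphi(x)\bigr)} .

% Caffarelli-type regularity (sketch, under assumptions of the kind the authors list:
% f, g bounded away from 0 and \infty, H\"older continuous, suitably convex supports):
f,\, g \in C^{0,\beta}
\;\Longrightarrow\;
\varphi \in C^{2,\beta}
\;\Longrightarrow\;
T = \nabla\varphi \in C^{1,\beta} .
```

Read this way, a map of class C^{1,β} has one more order of smoothness than the densities, which would be consistent with the α+1 in the claimed exponent if β can be taken equal to α. That identification is an interpretation of the rebuttal, not something the summary confirms.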

Circularity Check

0 steps flagged

No significant circularity: derivation from OT properties remains independent of target result.

The paper derives the improved rate O(m^{-(α+1)/d}) by assuming an optimal transport coupling that transfers smoothness α from source to target measure. This assumption is stated explicitly as a hypothesis on the distributions rather than being defined in terms of the final rate or fitted from the target data. No equation reduces the claimed sample complexity to a self-referential fit, and no load-bearing step collapses to a prior self-citation that itself assumes the result. The bound is obtained from standard OT regularity arguments under the stated coupling condition, keeping the derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard statistical-learning assumptions about Hölder or Sobolev smoothness of distributions together with the existence of a sufficiently regular optimal transport map between source and target measures.

axioms (1)

Domain assumption: Source and target probability measures admit an optimal transport coupling whose regularity is controlled by the smoothness parameter α of the data distribution.
Invoked to obtain the improved rate when d > 3.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking contradicts


when the data dimension d is higher than 3, the sample complexity for transfer learning is O(m^{-(α+1)/d}) ... densities of P_XT and P_XS are of class C^α

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

74 extracted references · 74 canonical work pages · 3 internal anchors

[1] M. Zeng, M. Li, Z. Fei, Y. Yu, Y. Pan, and J. Wang, “Automatic ICD-9 coding via deep transfer learning,” Neurocomputing, vol. 324, pp. 43–50, 2019.
[2] G. Wang, Y. Kikuchi, J. Yi, Q. Zou, R. Zhou, and X. Guo, “Transfer learning for retinal vascular disease detection: A pilot study with diabetic retinopathy and retinopathy of prematurity,” arXiv preprint arXiv:2201.01250, 2022.
[3] H. E. Kim, A. Cosa-Linan, N. Santhanam, M. Jannesari, M. E. Maros, and T. Ganslandt, “Transfer learning for medical image classification: A literature review,” BMC Medical Imaging, vol. 22, no. 1, p. 69, 2022.
[4] S. Ruder, M. E. Peters, S. Swayamdipta, and T. Wolf, “Transfer learning in natural language processing,” in NAACL, 2019, pp. 15–18.
[5] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in NAACL, vol. 1, 2019, pp. 4171–4186.
[6] Y.-L. Sung, J. Cho, and M. Bansal, “Vl-adapter: Parameter-efficient transfer learning for vision-and-language tasks,” in CVPR, 2022, pp. 5227–5237.
[7] OpenAI, “GPT-4 Technical Report,” arXiv preprint arXiv:2303.08774, 2023.
[8] G. Comanici, E. Bieber, M. Schaekermann, I. Pasupat, N. Sachdeva, I. Dhillon, M. Blistein, O. Ram, D. Zhang, E. Rosen et al., “Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities,” arXiv preprint arXiv:2507.06261, 2025.
[9] A. Grattafiori, A. Dubey, A. Jauhri, A. Pandey, A. Kadian, A. Al-Dahle, A. Letman, A. Mathur, A. Schelten, A. Vaughan et al., “The llama 3 herd of models,” arXiv preprint arXiv:2407.21783, 2024.
[10] D. Xu, J. X. Yao, S. Song, Z. Zhu, and J. Ji, “Ibex: Information-bottleneck-explored coarse-to-fine molecular generation under limited data,” arXiv preprint arXiv:2508.10775, 2025.
[11] P. A. Apellaniz, B. A. Galende, and A. Jim, “Advancing cancer research with synthetic data generation in low-data scenarios,” IEEE Journal of Biomedical Informatics, 2025.
[12] B. Eisenberg and R. L. Rivest, “On the sample complexity of pac-learning using random and chosen examples,” in COLT, vol. 90, 1990, pp. 154–162.
[13] S. Hanneke, “The optimal sample complexity of pac learning,” J. Mach. Learn. Res., vol. 17, no. 38, pp. 1–15, 2016.
[14] D. Haussler, M. Kearns, and R. E. Schapire, “Bounds on the sample complexity of bayesian learning using information theory and the vc dimension,” Mach. Learn., vol. 14, no. 1, pp. 83–113, 1994.
[15] S. Ben-David and R. Urner, “On the hardness of domain adaptation and the utility of unlabeled target samples,” in ALT, 2012, pp. 139–153.
[16] S. Hanneke and S. Kpotufe, “On the value of target data in transfer learning,” in NeurIPS, vol. 32, 2019, pp. 9867–9877.
[17] T. T. Cai and H. Wei, “Transfer learning for nonparametric classification: Minimax rate and adaptive classifier,” Ann. Stat., vol. 49, no. 1, pp. 100–128, 2021.
[18] H. W. J. Reeve, T. I. Cannings, and R. J. Samworth, “Adaptive transfer learning,” Ann. Stat., vol. 49, no. 6, pp. 3618–3649, 2021.
[19] S. Hanneke and S. Kpotufe, “A no-free-lunch theorem for multitask learning,” Ann. Stat., vol. 50, no. 6, pp. 3119–3143, 2022.
[20] S. Hanneke, S. Kpotufe, and Y. Mahdaviyeh, “Limits of model selection under transfer learning,” in COLT, vol. 36, 2023, pp. 5781–5812.
[21] J. Fan, C. Gao, and J. M. Klusowski, “Robust transfer learning with unreliable source data,” Ann. Stat., vol. 53, no. 4, pp. 1728–1752, 2025.
[22] N. Tripuraneni, M. Jordan, and C. Jin, “On the theory of transfer learning: The importance of task diversity,” in NeurIPS, vol. 33, 2020, pp. 7852–7862.
[23] S. S. Du, W. Hu, S. M. Kakade, J. D. Lee, and Q. Lei, “Few-shot learning via learning the representation, provably,” in ICLR, 2021.
[24] N. Tripuraneni, C. Jin, and M. I. Jordan, “Provable meta-learning of linear representations,” in ICML, vol. 38, 2021, pp. 10 434–10 443.
[25] Y. Duan and K. Wang, “Adaptive and robust multi-task learning,” Ann. Stat., vol. 51, no. 5, pp. 2015–2039, 2023.
[26] Y. Tian, Y. Gu, and Y. Feng, “Learning from similar linear representations: Adaptivity, minimaxity, and robustness,” J. Mach. Learn. Res., vol. 26, no. 187, pp. 1–125, 2025.
[27] A. Russo and A. Proutiere, “On the sample complexity of representation learning in multi-task bandits with global and local structure,” in AAAI, vol. 37, 2023, pp. 9658–9667.
[28] E. Brunskill and L. Li, “Sample complexity of multi-task reinforcement learning,” in UAI, vol. 29, 2013, pp. 122–131.
[29] K. Oguni, K. Narisawa, and A. Shinohara, “Reducing sample complexity in reinforcement learning by transferring transition and reward probabilities,” in ICAART, vol. 2, 2014, pp. 632–638.
[30] A. Agarwal, Y. Song, W. Sun, K. Wang, M. Wang, and X. Zhang, “Provable benefits of representational transfer in reinforcement learning,” in COLT, vol. 36, 2023, pp. 2114–2187.
[31] C. Cai, T. T. Cai, and H. Li, “Transfer learning for contextual multi-armed bandits,” Ann. Stat., vol. 52, no. 1, pp. 207–232, 2024.
[32] M. Mahajan, A. Pacchiano, and X. Zhang, “Learning to undo: Transfer reinforcement learning under linear state space transformations,” in OpenReview, 2025. [Online]. Available: https://openreview.net/forum?id=jI8a3s5xUz
[33] S. Li, T. T. Cai, and H. Li, “Transfer learning for high-dimensional linear regression: Prediction, estimation and minimax optimality,” J. R. Stat. Soc. Ser. B Stat. Methodol., vol. 84, no. 1, pp. 149–173, 2022.
[34] Y. Tian and Y. Feng, “Transfer learning under high-dimensional generalized linear models,” J. Am. Stat. Assoc., vol. 118, no. 544, pp. 2684–2697, 2023.
[35] S. S. Liu, “Unified transfer learning in high-dimensional linear regression,” in AISTATS, vol. 27, 2024, pp. 1036–1044.
[36] Z. He, Y. Sun, and R. Li, “TransFusion: Covariate-shift robust transfer learning for high-dimensional regression,” in AISTATS, vol. 27, 2024, pp. 703–711.
[37] J. Ge, S. Tang, J. Fan, and C. Jin, “On the provable advantage of unsupervised pretraining,” in ICLR, 2024.
[38] T. Jones-McCormick, A. Jagannath, and S. Sen, “Provable benefits of unsupervised pre-training and transfer learning via single-index models,” in ICML, vol. 42, 2025, pp. 28 350–28 376.
[39] J. Tahir, S. Ganguli, and G. M. Rotskoff, “Features are fate: A theory of transfer learning in high-dimensional regression,” in ICML, vol. 42, 2025, pp. 58 142–58 168.
[40] Z. Cheng, T. Xie, S. Zhang, and C. Zhang, “Provable sample-efficient transfer learning conditional diffusion models via representation learning,” in NeurIPS, vol. 38, 2025.
[41] P. Rigollet and A. J. Stromme, “On the sample complexity of entropic optimal transport,” Ann. Stat., vol. 53, no. 1, pp. 61–90, 2025.
[42] E. Vural and H. Karaca, “A unified analysis of generalization and sample complexity for semi-supervised domain adaptation,” arXiv preprint arXiv:2507.22632, 2025.
[43] C. X. Ren, Y. W. Luo, C. J. Guo, and X. L. Xu, “Partial domain adaptation via importance sampling-based shift correction,” IEEE Trans. Neural Netw., 2025.
[44] C. Villani, Optimal transport, ser. Grundlehren der mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 2009, vol. 338, old and new.
[45] M. Cuturi, “Sinkhorn distances: Lightspeed computation of optimal transport,” in NIPS, vol. 26, 2013, pp. 2292–2300.
[46] N. Bonneel, J. Rabin, G. Peyré, and H. Pfister, “Sliced and Radon Wasserstein barycenters of measures,” J. Math. Imaging Vision, vol. 51, pp. 22–45, 2015.
[47] V. De Bortoli, J. Thornton, J. Heng, and A. Doucet, “Diffusion Schrödinger bridge with applications to score-based generative modeling,” in NeurIPS, vol. 34, 2021, pp. 17 695–17 709.
[48] N. Courty, R. Flamary, D. Tuia, and A. Rakotomamonjy, “Optimal transport for domain adaptation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 9, pp. 1853–1865, 2016.
[49] W. Torous, F. Gunsilius, and P. Rigollet, “An optimal transport approach to estimating causal effects via nonlinear difference-in-differences,” arXiv preprint arXiv:2108.05858, 2021.
[50] T. Manole, S. Balakrishnan, J. Niles-Weed, and L. Wasserman, “Plugin estimation of smooth optimal transport maps,” Ann. Statist., vol. 52, no. 3, pp. 966–998, 2024.
[51] S. Chewi, J. Niles-Weed, and P. Rigollet, “Statistical optimal transport,” arXiv preprint arXiv:2407.18163, 2024.
[52] H. Cao, H. Gu, X. Guo, and M. Rosenbaum, “Risk of transfer learning and its applications in finance,” arXiv preprint arXiv:2311.03283, 2023.
[53] A. Banerjee, X. Guo, and H. Wang, “On the optimality of conditional expectation as a bregman predictor,” IEEE Trans. Inf. Theory, vol. 51, no. 7, pp. 2664–2669, 2005.
[54] C. Villani, Topics in optimal transportation, ser. Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2003, vol. 58.
[55] C. J. Stone, “Optimal rates of convergence for nonparametric estimators,” Ann. Statist., vol. 8, no. 6, pp. 1348–1360, 1980.
[56] C. J. Stone, “Optimal global rates of convergence for nonparametric regression,” Ann. Statist., vol. 10, no. 4, pp. 1040–1053, 1982.
[57] R. J. McCann and B. Pass, “Optimal transportation between unequal dimensions,” Arch. Ration. Mech. Anal., vol. 238, no. 3, pp. 1475–1520, 2020.
[58] B. Pass, “Regularity of optimal transportation between spaces with different dimensions,” Math. Res. Lett., vol. 19, no. 2, pp. 291–307, 2012.
[59] W. Gangbo and R. J. McCann, “The geometry of optimal transportation,” Acta Math., vol. 177, no. 2, pp. 113–161, 1996.
[60] Y. Brenier, “Décomposition polaire et réarrangement monotone des champs de vecteurs,” C. R. Acad. Sci. Paris Sér. I Math., vol. 305, no. 19, pp. 805–808, 1987.
[61]

Polar factorization and monotone rearrangement of vector-valued functions,

——, “Polar factorization and monotone rearrangement of vector-valued functions,”Comm. Pure Appl. Math., vol. 44, no. 4, pp. 375–417, 1991

work page 1991
[62]

Boundary regularity of maps with convex potentials,

L. A. Caffarelli, “Boundary regularity of maps with convex potentials,” Comm. Pure Appl. Math., vol. 45, no. 9, pp. 1141–1151, 1992

work page 1992
[63]

The regularity of mappings with a convex potential,

——, “The regularity of mappings with a convex potential,”J. Amer. Math. Soc., vol. 5, no. 1, pp. 99–104, 1992

work page 1992
[64]

Boundary regularity of maps with convex potentials. II,

——, “Boundary regularity of maps with convex potentials. II,”Ann. of Math. (2), vol. 144, no. 3, pp. 453–496, 1996

work page 1996
[65]

Monotonicity properties of optimal transportation and the FKG and related inequalities,

——, “Monotonicity properties of optimal transportation and the FKG and related inequalities,”Comm. Math. Phys., vol. 214, no. 3, pp. 547– 563, 2000

work page 2000
[66]

G. Carlier, A. Figalli, and F. Santambrogio, “On optimal transport maps between 1/d-concave densities,” arXiv preprint arXiv:2404.05456, 2024.

[67]

M. Colombo, A. Figalli, and Y. Jhaveri, “Lipschitz changes of variables between perturbations of log-concave measures,” Ann. Sc. Norm. Super. Pisa Cl. Sci. (5), vol. 17, no. 4, pp. 1491–1519, 2017.

[68]

K. Saenko, B. Kulis, M. Fritz, and T. Darrell, “Adapting visual category models to new domains,” in Proceedings of the 11th European Conference on Computer Vision. Springer, 2010, pp. 213–226.

[69]

F. Chollet et al., “Keras,” https://keras.io, 2015.

[70]

H. Permuter, J. Francos, and I. Jermyn, “A study of Gaussian mixture models of color and texture features for image classification and segmentation,” Pattern Recognition, vol. 39, no. 4, pp. 695–706, 2006.

[71]

W. M. Fierson, “Screening examination of premature infants for retinopathy of prematurity,” Pediatrics, vol. 142, no. 6, 2018.

[72]

H. Blencowe, J. Lawn, T. Vazquez, A. Fielder, and C. Gilbert, “Preterm-associated visual impairment and estimates of retinopathy of prematurity at regional and global levels for 2010,” Pediatr. Res., vol. 74, Suppl. 1, pp. 35–49, Dec. 2013.

[73]

C. Gilbert, J. Rahi, M. Eckstein, J. O’Sullivan, and A. Foster, “Retinopathy of prematurity in middle-income countries,” The Lancet, vol. 350, no. 9070, pp. 12–14, 1997.

[74]

Early Treatment For Retinopathy Of Prematurity Cooperative Group, “Revised Indications for the Treatment of Retinopathy of Prematurity: Results of the Early Treatment for Retinopathy of Prematurity Randomized Trial,” Archives of Ophthalmology, vol. 121, no. 12, pp. 1684–1694, 2003.

