
arxiv: 2605.16027 · v1 · pith:AJ3XOPGRnew · submitted 2026-05-15 · 🧮 math.ST · stat.TH

Nearest-Neighbour Matching on Unbounded Supports and Covariate Shift Transfer

Pith reviewed 2026-05-19 18:50 UTC · model grok-4.3

classification 🧮 math.ST stat.TH
keywords nearest-neighbour matching, covariate shift, transfer learning, unbounded supports, convergence rates, treatment effects, transferability measure, nonparametric estimation

The pith

Nearest-neighbour matching achieves usual convergence rates on unbounded supports via a transferability measure between source and target distributions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper shows that nearest-neighbour estimators for expectations with missing labels can attain standard convergence rates without assuming that covariates live in a compact set whose density is bounded away from zero. The usual support and density assumptions are replaced by conditions on the source and target probability measures, the key one being a finite transferability measure that controls the bias and variance of the estimator. The framework accommodates distributions supported on manifolds and situations in which the target distribution has heavier tails than the source. The same transferability condition is shown to be necessary for any estimator to achieve good rates. When applied to average treatment effect estimation, the results relax the common requirement that assignment probabilities lie away from zero and one.
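
To make the object of study concrete, here is a minimal sketch of a k-nearest-neighbour matching estimator for a target expectation when labels are observed only on the source sample. It is an illustration under stated assumptions, not the paper's exact estimator: the function name `knn_matching_mean`, the Euclidean metric, and the brute-force neighbour search are choices made here for readability.

```python
import math
import random


def knn_matching_mean(source_x, source_y, target_x, k=1):
    """Estimate the target-population mean of Y by k-NN matching.

    Each unlabelled target point borrows the average label of its k nearest
    labelled source points (Euclidean distance). The estimate is the average
    of these imputed labels over the target sample.
    Points are tuples of floats; this helper is hypothetical, not from the paper.
    """
    if not 1 <= k <= len(source_x):
        raise ValueError("k must be between 1 and the source sample size")
    total = 0.0
    for x in target_x:
        # Rank source indices by distance to x and keep the k closest.
        nearest = sorted(range(len(source_x)),
                         key=lambda i: math.dist(x, source_x[i]))[:k]
        total += sum(source_y[i] for i in nearest) / k
    return total / len(target_x)


if __name__ == "__main__":
    rng = random.Random(0)
    # Unbounded supports: Gaussian source, shifted Gaussian target (toy choice).
    src = [(rng.gauss(0.0, 1.0),) for _ in range(300)]
    lab = [x[0] ** 2 + rng.gauss(0.0, 0.1) for x in src]
    tgt = [(rng.gauss(0.5, 1.0),) for _ in range(300)]
    print(knn_matching_mean(src, lab, tgt, k=3))
```

The brute-force search costs on the order of the product of the two sample sizes; a spatial index would be the usual replacement for larger samples.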

Core claim

The central claim is that the usual rates of convergence for nearest-neighbour matching estimators hold under minimal assumptions on covariate supports when these are replaced by conditions on the source and target distributions, including a measure of transferability between the two probability measures. These conditions are general enough to cover distributions supported on manifolds and to allow the target to have heavier tails than the source. The transferability measure is also necessary for good rates, and its use relaxes the positivity assumption on assignment probabilities in treatment-effect settings.

What carries the argument

The transferability measure between source and target probability measures, which bounds the mismatch and thereby controls bias and variance in the nearest-neighbour estimator.

If this is right

  • The nearest-neighbour estimator attains its usual convergence rates even when covariate supports are unbounded.
  • The results apply directly to distributions supported on manifolds.
  • The target distribution is permitted to have heavier tails than the source.
  • The transferability condition is necessary for any estimator to achieve good convergence rates.
  • The positivity assumption on assignment probabilities can be relaxed in average treatment effect estimation.
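
The treatment-effect application can be read as two covariate-shift problems: the missing control outcome of a treated unit is imputed from control units, and vice versa. The sketch below shows a basic matching estimator of the average treatment effect along those lines. It is a simplified illustration, assuming binary treatment coded 0/1, Euclidean distance, and matching with replacement; the names `knn_impute` and `matching_ate` are hypothetical and the paper's estimator may differ in detail.

```python
import math


def knn_impute(x, pool_x, pool_y, k):
    """Average outcome of the k points in the pool closest to x."""
    nearest = sorted(range(len(pool_x)),
                     key=lambda i: math.dist(x, pool_x[i]))[:k]
    return sum(pool_y[i] for i in nearest) / k


def matching_ate(x, w, y, k=1):
    """Nearest-neighbour matching estimate of the average treatment effect.

    x: list of covariate tuples, w: list of 0/1 treatment indicators,
    y: list of observed outcomes. For every unit, the unobserved potential
    outcome is imputed from the k nearest units in the opposite group.
    """
    treated = [i for i, wi in enumerate(w) if wi == 1]
    control = [i for i, wi in enumerate(w) if wi == 0]
    if len(treated) < k or len(control) < k:
        raise ValueError("each treatment group needs at least k units")
    tx, ty = [x[i] for i in treated], [y[i] for i in treated]
    cx, cy = [x[i] for i in control], [y[i] for i in control]
    total = 0.0
    for xi, wi, yi in zip(x, w, y):
        if wi == 1:
            y1, y0 = yi, knn_impute(xi, cx, cy, k)
        else:
            y1, y0 = knn_impute(xi, tx, ty, k), yi
        total += y1 - y0
    return total / len(x)
```

When assignment probabilities approach zero or one in some region, one group becomes sparse there and the matched neighbours lie far away, which is the situation the relaxed positivity condition is meant to cover.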

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar transferability conditions could be used to justify nearest-neighbour methods in other covariate-shift problems beyond treatment effects.
  • Data analysts might estimate the transferability measure from samples to decide whether nearest-neighbour matching is reliable for a given source-target pair.
  • The framework suggests examining how the transferability measure scales with dimension or tail index in concrete families of distributions.
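
As a rough companion to the second bullet, the sketch below computes a simple sample-based diagnostic: how many target points are matched to each source point, and the largest share of the target sample carried by a single source point. This is not the paper's transferability measure, whose definition is not reproduced on this page; it is only a heuristic proxy, and the names `match_counts` and `max_match_share` are hypothetical.

```python
import math
from collections import Counter


def match_counts(source_x, target_x):
    """Number of target points whose nearest source point is each source index."""
    counts = Counter()
    for x in target_x:
        nearest = min(range(len(source_x)),
                      key=lambda i: math.dist(x, source_x[i]))
        counts[nearest] += 1
    return [counts[i] for i in range(len(source_x))]


def max_match_share(source_x, target_x):
    """Largest fraction of the target sample matched to one source point."""
    return max(match_counts(source_x, target_x)) / len(target_x)
```

A share close to one indicates that a few source points stand in for most of the target sample, which is a warning sign for matching even though it does not by itself quantify transferability in the paper's sense.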

Load-bearing premise

A finite transferability measure exists between the source and target probability measures.

What would settle it

A concrete distribution pair where the transferability measure is infinite yet a nearest-neighbour estimator still attains the stated convergence rate, or where the measure is finite yet the rate fails.
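
A simulation cannot settle a rate statement, but it can probe candidate distribution pairs. The sketch below measures the error of a 1-NN matching estimate of the target mean of sin(X) for exponential source and target distributions. The exponential pair and the sine regression function are assumptions loosely suggested by the "Exponential-Sin" label in Figure 1; the paper's actual simulation design is not specified on this page, and labels are taken noise-free for simplicity.

```python
import math
import random


def true_target_mean(rate_q):
    """Closed form of E[sin(X)] for X ~ Exponential(rate_q)."""
    return rate_q / (1.0 + rate_q ** 2)


def one_nn_error(n, rate_p, rate_q, rng):
    """Signed error of the 1-NN matching estimate from n source and n target draws."""
    source = [rng.expovariate(rate_p) for _ in range(n)]
    target = [rng.expovariate(rate_q) for _ in range(n)]
    total = 0.0
    for x in target:
        nearest = min(source, key=lambda s: abs(s - x))
        total += math.sin(nearest)  # noise-free label of the matched source point
    return total / n - true_target_mean(rate_q)


def mean_abs_error(n, rate_p, rate_q, reps=10, seed=0):
    """Average absolute error over independent replications."""
    rng = random.Random(seed)
    return sum(abs(one_nn_error(n, rate_p, rate_q, rng))
               for _ in range(reps)) / reps


if __name__ == "__main__":
    # A larger source rate means lighter source tails relative to the target.
    for rate_p in (0.5, 1.5, 3.0):
        for n in (50, 200):
            print(rate_p, n, mean_abs_error(n, rate_p, rate_q=1.0))
```

Comparing how the error shrinks with n across source rates gives an informal picture of where matching starts to struggle as the source tail becomes lighter than the target tail.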

Figures

Figures reproduced from arXiv: 2605.16027 by Simon Viel.

Figure 1: Bias rates, Setup Exponential-Sin.
Figure 2: Bias rates, Setup Normal-Poly. (a) Setup Exponential-Sin (b) Setup Normal-Poly
Figure 3: Transferability and mean-squared errors.
Original abstract

Expectations of multivariate functions with missing labels occur in various fields such as transfer learning and average treatment effects. Although non-parametric estimators based on nearest-neighbour matching are frequently used in this context, the existing literature assumes that the covariates live in some well-shaped compact subset of $\mathbb{R}^d$, with densities that are bounded away from zero. In this paper, we show that the usual rates of convergence can be achieved with minimal assumptions on the covariate supports. These assumptions are replaced with conditions on the source and target distributions, among which a measure of the tranferability between the two probability measures. We show that these conditions are general, can be applied to distributions supported on manifolds, and allow the target distribution to have a heavier tail than the source distribution. We also show that this control of the transferability is needed for any estimator to achieve good rates of convergence. Finally, applying our results to the estimation of treatment effects, we could relax the assumption that the assignment probabilities had to be bounded away from zero and one.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript develops theory for nearest-neighbour matching estimators in the presence of covariate shift when the covariate support is unbounded. It replaces standard compactness and density lower bound assumptions with a finite 'transferability' measure between the source and target probability measures, along with other distributional conditions. The authors prove that standard convergence rates are attained under these conditions, show applicability to manifold supports and heavier target tails, establish a necessity result for the transferability control, and apply the framework to average treatment effect estimation by relaxing the usual overlap conditions on propensity scores.

Significance. If the transferability condition successfully controls both bias and variance terms locally, the results would be significant for extending non-parametric methods to realistic high-dimensional or heavy-tailed settings in transfer learning and causal inference. The necessity result and the relaxation for treatment effects are particularly valuable. The work provides a more general framework than existing literature on compact supports.

major comments (2)
  1. [§3.1] Definition of the transferability measure (presumably §2 or §3.1): The measure is presented as a global functional (e.g., an integral or Orlicz norm involving the Radon-Nikodym derivative between source and target). It is unclear whether this global bound automatically guarantees that, in regions where the target places positive mass but the source density decays rapidly due to heavier tails, the local source measure of k-NN balls remains large enough to preserve the usual stochastic error rate. This local-control step is load-bearing for the central rate claim.
  2. [Theorem 3.1] Main convergence theorem (likely Theorem 3.1 or 4.1): The argument that finite transferability plus the other distributional conditions yields the standard NN rate appears to rely on a uniform control of the NN radius. On unbounded supports with tail mismatch, a global transferability quantity does not obviously preclude arbitrarily small local source probabilities around some target points, which would inflate the variance term beyond the claimed rate. A more explicit high-probability bound or truncation argument for tail regions would strengthen the result.
minor comments (2)
  1. [Abstract] Abstract: 'tranferability' is a typo and should read 'transferability'.
  2. [§2] Notation: The precise mathematical definition of the transferability measure (e.g., the exact Orlicz norm or integral) should be displayed as a numbered equation for easy reference in later proofs.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading of the manuscript and the constructive comments. We address the two major comments point by point below. Where the comments identify opportunities for greater clarity or explicit bounds, we have revised the manuscript accordingly.

  1. Referee: [§3.1] Definition of the transferability measure (presumably §2 or §3.1): The measure is presented as a global functional (e.g., an integral or Orlicz norm involving the Radon-Nikodym derivative between source and target). It is unclear whether this global bound automatically guarantees that, in regions where the target places positive mass but the source density decays rapidly due to heavier tails, the local source measure of k-NN balls remains large enough to preserve the usual stochastic error rate. This local-control step is load-bearing for the central rate claim.

    Authors: We agree that local control of the source measure within k-NN balls is essential. The transferability functional is global, yet the combination of finite transferability with the paper’s other assumptions (source density bounded away from zero on its support and target moments of order greater than 2) implies the required local lower bound via a direct application of Markov’s inequality to the density ratio. Specifically, the probability that a target point lies in a region where the source measure of a ball of radius r is smaller than the typical value decays sufficiently fast. We have added a new lemma in §3.1 that makes this local implication explicit and shows that the stochastic error term remains of the claimed order. revision: yes

  2. Referee: [Theorem 3.1] Main convergence theorem (likely Theorem 3.1 or 4.1): The argument that finite transferability plus the other distributional conditions yields the standard NN rate appears to rely on a uniform control of the NN radius. On unbounded supports with tail mismatch, a global transferability quantity does not obviously preclude arbitrarily small local source probabilities around some target points, which would inflate the variance term beyond the claimed rate. A more explicit high-probability bound or truncation argument for tail regions would strengthen the result.

    Authors: We thank the referee for this observation. While the original argument already uses transferability to bound both bias and variance, an explicit truncation step clarifies the tail control. In the revised proof of Theorem 3.1 we now truncate the target sample at a slowly growing radius R_n → ∞ chosen so that the target mass outside the truncation is o(n^{-1/2}). Inside the truncated region the finite transferability plus the moment assumptions yield a uniform high-probability lower bound on the source measure of k-NN balls, preventing the variance inflation the referee correctly flags. The updated proof contains the full truncation argument and the resulting high-probability bound on the NN radius. revision: yes
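
The two steps invoked in the responses can be written out in a textbook special case. The block below is a sketch under assumptions made here for illustration only: Q denotes the target and P the source, Q is taken to be absolutely continuous with respect to P, and the truncation example uses a one-dimensional standard normal target. The paper's own conditions and its choice of R_n are not stated on this page.

```latex
% Markov's inequality for the density ratio (assumes Q \ll P), for any t > 0:
\[
  Q\!\left( \frac{dQ}{dP} > t \right)
  \le \frac{1}{t}\, \mathbb{E}_Q\!\left[ \frac{dQ}{dP} \right]
  = \frac{1}{t} \int \left( \frac{dQ}{dP} \right)^{2} dP .
\]
% Truncation radius for a standard normal target (illustrative assumption):
\[
  Q\bigl( |X| > R \bigr) \le 2 e^{-R^{2}/2} \quad (R \ge 0),
  \qquad
  R_n = \sqrt{2 \log n}
  \;\Longrightarrow\;
  Q\bigl( |X| > R_n \bigr) \le \frac{2}{n} = o\bigl( n^{-1/2} \bigr).
\]
```

In this Gaussian case a radius growing like the square root of log n already makes the discarded target mass negligible at the n^{-1/2} scale; heavier target tails would require a faster-growing radius.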

Circularity Check

0 steps flagged

No circularity: transferability measure introduced as external condition on distributions


The paper replaces compact-support assumptions with an external condition (a finite transferability measure between source and target measures) and proves that this suffices for standard NN convergence rates while also showing necessity for any estimator. No step fits a parameter to the target data and renames the fit as a prediction, no self-citation chain bears the central load, and the derivation does not reduce any claimed result to its own inputs by construction. The transferability quantity is defined on the pair of measures independently of the estimator's output, making the argument self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central addition is the transferability measure, which is introduced to control rates in place of support assumptions; no free parameters are mentioned in the abstract.

axioms (1)
  • [standard math] Source and target are probability measures on R^d or manifolds embedded in R^d
    Basic setup invoked to define the covariate distributions and the estimator.
invented entities (1)
  • Transferability measure between source and target distributions [no independent evidence]
    purpose: Quantifies the relationship that controls convergence rates of the nearest-neighbour estimator
    New condition introduced to replace compact support and bounded density assumptions; no independent evidence outside the paper is provided in the abstract.


