pith. sign in

arxiv: 2512.19929 · v2 · pith:UFYCO6JInew · submitted 2025-12-22 · 🧮 math.ST · stat.TH

Deconvolution in unlinked linear models

Pith reviewed 2026-05-21 16:19 UTC · model grok-4.3

classification 🧮 math.ST stat.TH
keywords deconvolutionunlinked regressionnonparametric estimationWasserstein distancelinear modelslatent variable
0
0 comments X

The pith

Assuming Z is a linear function of an observable covariate yields a nonparametric deconvolution estimator that converges at the parametric rate in Wasserstein-1 distance regardless of noise smoothness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper merges the unlinked regression setting, where covariates and responses lack known pairings, with classical deconvolution for recovering the law of a latent variable Z. Under the modeling choice that Z equals a linear function of a multidimensional observable covariate, the authors build a nonparametric estimator for the distribution of Z. This estimator reaches the parametric convergence rate in the Wasserstein distance of order 1, and the usual penalty from the smoothness of the additive noise disappears. The work also supplies estimators for the unconditional density of Z and the conditional density of Z given a response, then uses them to recover the latent linear predictor.

Core claim

Under the structural assumption that the latent variable Z is a linear function of an observable multidimensional covariate, a nonparametric estimator of the distribution of Z achieves the parametric rate of convergence in the Wasserstein distance of order 1, where the smoothness of the noise does not affect the rate.

What carries the argument

The linear structural assumption on the latent variable Z relative to the observable covariate, which supplies the unlinked regression structure needed to bypass the usual slow rates in deconvolution.

If this is right

  • The distribution of Z can be estimated at the parametric rate in Wasserstein-1 distance without dependence on noise smoothness.
  • Estimators for the unconditional density of Z and the conditional density of Z given an observed response become available.
  • The latent linear predictor can be estimated even though its direct link to the response is inaccessible.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar structural constraints on Z might be substituted for the linear link to obtain fast rates in other deconvolution problems.
  • The approach could be tested on data where covariates and responses are deliberately separated for privacy reasons.
  • Approximate linearity might still deliver usable rates in finite samples even if the exact assumption fails.

Load-bearing premise

The assumption that the latent variable Z is exactly a linear function of the observable multidimensional covariate.

What would settle it

A simulation study in which the estimator fails to maintain the parametric rate once the noise distribution is made smoother, or once the linear link between Z and the covariate is removed, would settle the claim.

Figures

Figures reproduced from arXiv: 2512.19929 by Antonio Di Noia, C\'ecile Durot, Fadoua Balabdaoui.

Figure 1
Figure 1. Figure 1: Log-linear fit for the first moment W (1) 1 of the Monte Carlo distribution of W1(µ ∗ βbn , µ0) across settings (a)-(d) and for increasing sample size. Slopes of the linear fit: -0.502 (a), -0.493 (b), -0.501 (c), -0.485 (d). The results show strong evidence for the parametric rate of convergence predicted by the theoretical results. fZ. However, as expected, the conditional density estimator bfZ|Y =y0 yie… view at source ↗
Figure 2
Figure 2. Figure 2: Graphical results of Experiment 2 under the settings (a)-(d). It is shown that conditional estimators introduced in Section 2 perform very well for predicting the true z0 linked to y0. The true (z0, y0) in the prediction step are (−3.155, −3.996) in (a), (7.074, 6.233) in (b), (4.809, 3.968) in (c) and (22.367, 21.526) in (d). The performance comparison is done by computing the performance ratios REb = MSE… view at source ↗
read the original abstract

Unlinked regression, in which covariates and responses are observed separately without known correspondence, has recently gained increasing attention. Deconvolution, on the other hand, is a fundamental and challenging problem in nonparametric statistics with the aim of estimating the distribution of a latent random variable $Z$ based on observations contaminated by some additive noise. The complexity of this task is heavily influenced by the smoothness of the noise distribution and often leads to slow estimation rates. In this paper, we combine the recent unlinked linear regression problem with the classical deconvolution framework. Specifically, we study nonparametric deconvolution under the assumption that $Z$ is a linear function of an observable multidimensional covariate. This structural constraint allows us to introduce a nonparametric estimator of the distribution of $Z$ which achieves the parametric rate of convergence in the Wasserstein distance of order 1, where the smoothness of the noise does not affect the rate. Furthermore, we introduce nonparametric estimators for the unconditional density of $Z$ and the conditional density of $Z$ given an observed response. This allows us to study the problem of estimating the value of the latent linear predictor, whose link to the observed response is not accessible. Through several simulations, we illustrate the fast convergence rate of our deconvolution estimator and the performance of the proposed conditional estimators of the latent predictor in different simulation scenarios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper studies deconvolution of a latent variable Z under the structural assumption that Z equals X^T β exactly, where X is a d-dimensional covariate (d ≥ 2) observed separately from the noisy response Y = Z + noise (unlinked setting). It constructs a nonparametric estimator of the law of Z that attains the parametric rate n^{-1/2} in Wasserstein-1 distance, independent of the smoothness of the noise distribution. The manuscript also develops estimators for the unconditional density of Z and the conditional density of Z given Y, and reports simulation results illustrating the claimed rate.

Significance. If the central result holds, the work is significant for nonparametric statistics: the exact linear structure in dimension d ≥ 2 supplies enough identification to achieve a smoothness-independent parametric rate in a classically ill-posed deconvolution problem. The simulations provide concrete empirical support for the fast rate across different noise distributions. The paper supplies reproducible simulation code and explicit construction of the estimator, which strengthens the contribution.

major comments (2)
  1. [§3.1, Theorem 3.1] §3.1, Theorem 3.1: the proof that the W_1 error is O_p(n^{-1/2}) proceeds by first obtaining a root-n estimator of β from the unlinked marginals of X and Y, then plugging the estimated linear predictor into an empirical-measure deconvolution step; the argument that this plug-in step preserves the parametric rate without any loss from the noise characteristic function is only sketched and requires an explicit bound on the remainder term that is currently missing.
  2. [§2.3] §2.3: the identification argument that the joint law of (X, Y) determines the law of Z at parametric speed relies on d ≥ 2 and on the covariate distribution having a non-degenerate second-moment matrix; the paper does not state whether the rate remains parametric when these conditions are relaxed to d = 1 or to singular designs, which is load-bearing for the claim that the linear structure alone removes the usual ill-posedness.
minor comments (3)
  1. [Introduction] The definition of the Wasserstein-1 distance and the precise form of the estimator (e.g., whether it is an empirical measure on estimated linear predictors or involves kernel smoothing) should be stated in the main text rather than deferred entirely to the appendix.
  2. [Section 5] Simulation section: the number of Monte Carlo replications, the exact parameter values for the noise distributions, and the bandwidth choices (if any) are not reported in the captions of Figures 1–3.
  3. [References] A few recent references on unlinked regression (e.g., works from 2022–2024 on moment-based estimation in unlinked models) are absent from the bibliography.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the opportunity to respond to the referee's report. We appreciate the positive assessment of the significance of our work and the constructive feedback on the technical details. Below we address the major comments point by point, indicating the changes we will make to strengthen the manuscript.

read point-by-point responses
  1. Referee: [§3.1, Theorem 3.1] §3.1, Theorem 3.1: the proof that the W_1 error is O_p(n^{-1/2}) proceeds by first obtaining a root-n estimator of β from the unlinked marginals of X and Y, then plugging the estimated linear predictor into an empirical-measure deconvolution step; the argument that this plug-in step preserves the parametric rate without any loss from the noise characteristic function is only sketched and requires an explicit bound on the remainder term that is currently missing.

    Authors: We agree that the current sketch of the plug-in argument in the proof of Theorem 3.1 leaves the remainder term implicit. In the revised manuscript, we will expand this part of the proof to include an explicit bound. Specifically, we will decompose the Wasserstein-1 error into the error from estimating β and the error from the empirical deconvolution step, and show using properties of the Wasserstein distance and the root-n consistency of the estimator for β that the cross term is o_p(n^{-1/2}). This will make the preservation of the parametric rate fully rigorous without relying on the smoothness of the noise. revision: yes

  2. Referee: [§2.3] §2.3: the identification argument that the joint law of (X, Y) determines the law of Z at parametric speed relies on d ≥ 2 and on the covariate distribution having a non-degenerate second-moment matrix; the paper does not state whether the rate remains parametric when these conditions are relaxed to d = 1 or to singular designs, which is load-bearing for the claim that the linear structure alone removes the usual ill-posedness.

    Authors: The referee correctly identifies that our identification and rate results rely on d ≥ 2 and the non-degeneracy of the second-moment matrix of X. For d = 1, the linear structure does not provide sufficient variation to achieve the parametric rate independently of the noise smoothness, as the problem reduces to a standard deconvolution setting with potential ill-posedness. Similarly, singular designs would limit the identifiability. We will add a remark in Section 2.3 clarifying these conditions and noting that the parametric rate is specific to d ≥ 2 with non-degenerate covariance. This strengthens the statement of the main result. revision: yes

Circularity Check

0 steps flagged

No significant circularity; estimator construction is independent under stated linearity assumption

full rationale

The paper introduces a new nonparametric estimator for the law of Z under the explicit modeling assumption that Z equals a linear function of an observable multidimensional covariate X. This structural constraint is invoked to obtain identification of the target distribution from the separate marginals of X and Y, which in turn yields the parametric W1 rate independent of noise smoothness. The estimator is presented as newly constructed rather than defined in terms of fitted values of the target itself or by renaming a known result. No self-definitional steps, fitted-input predictions, or load-bearing self-citations appear in the derivation chain described in the abstract and overview. The central claim therefore retains independent content once the modeling assumption is granted.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the modeling assumption that the latent variable is exactly linear in the observed covariates; no free parameters, invented entities, or additional axioms are visible in the abstract.

axioms (1)
  • domain assumption Z is a linear function of an observable multidimensional covariate
    This structural constraint is the key modeling choice stated in the abstract that enables the parametric rate.

pith-pipeline@v0.9.0 · 5769 in / 1332 out tokens · 48679 ms · 2026-05-21T16:19:52.275269+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages

  1. [1]

    and Balabdaoui, F

    Azadkia, M. and Balabdaoui, F. (2024). Linear regression with unmatched data: A deconvolution perspective. Journal of Machine Learning Research , 25(197):1--55

  2. [2]

    R., and Durot, C

    Balabdaoui, F., Doss, C. R., and Durot, C. (2021). Unlinked monotone regression. Journal of Machine Learning Research , 22:172

  3. [3]

    Balabdaoui, F., Slwaski, M., and Jonathan, S. (2025). Identifiability in unlinked linear regression: Some results and open problems. arXiv:2507.14986

  4. [4]

    and Ledoux, M

    Bobkov, S. and Ledoux, M. (2019). One-dimensional empirical measures, order statistics, and Kantorovich transport distances , volume 261. American Mathematical Society

  5. [5]

    and Matias, C

    Butucea, C. and Matias, C. (2005). Minimax estimation of the noise level and of the deconvolution density in a semiparametric convolution model. Bernoulli , 11(2):309--340

  6. [6]

    Caillerie, C., Chazal, F., Dedecker, J., and Michel, B. (2013). Deconvolution for the wasserstein metric and geometric inference. In International Conference on Geometric Science of Information , pages 561--568. Springer

  7. [7]

    and Schl \"u ter, T

    Carpentier, A. and Schl \"u ter, T. (2016). Learning relationships between data obtained independently. In Artificial Intelligence and Statistics , pages 658--666. PMLR

  8. [8]

    and Florens, J.-P

    Carrasco, M. and Florens, J.-P. (2011). A spectral method for deconvolving a density. Econometric Theory , 27(3):546--581

  9. [9]

    Carroll, R. J. and Hall, P. (1988). Optimal rates of convergence for deconvolving a density. Journal of the American Statistical Association , 83(404):1184--1186

  10. [10]

    Carroll, R. J. and Hall, P. (2004). Low order approximations in deconvolution and regression with errors in variables. Journal of the Royal Statistical Society Series B: Statistical Methodology , 66(1):31--46

  11. [11]

    Comte, F., Rozenholc, Y., and Taupin, M.-L. (2006). Penalized contrast estimator for adaptive density deconvolution. Canadian Journal of Statistics , 34(3):431--452

  12. [12]

    Comte, F., Rozenholc, Y., and Taupin, M.-L. (2007). Finite sample penalization in adaptive density deconvolution. Journal of Statistical Computation and Simulation , 77(11):977--1000

  13. [13]

    Dedecker, J., Fischer, A., and Michel, B. (2015). Improved rates for wasserstein deconvolution with ordinary smooth error in dimension one

  14. [14]

    and Michel, B

    Dedecker, J. and Michel, B. (2013). Minimax rates of convergence for wasserstein deconvolution with supersmooth errors in any dimension. Journal of Multivariate Analysis , 122:278--291

  15. [15]

    and Hall, P

    Delaigle, A. and Hall, P. (2014). Parametrically assisted nonparametric estimation of a density in the deconvolution problem. Journal of the American Statistical Association , 109(506):717--729

  16. [16]

    and Mukherjee, D

    Durot, C. and Mukherjee, D. (2024). Minimax optimal rates of convergence in the shuffled regression, unlinked regression, and deconvolution under vanishing noise. arXiv preprint arXiv:2404.09306

  17. [17]

    Es, B. V. and Uh, H.-w. (2005). Asymptotic normality of kernel-type deconvolution estimators. Scandinavian Journal of Statistics , 32(3):467--483

  18. [18]

    Fan, J. (1991a). Asymptotic normality for deconvolution kernel density estimators. Sankhy\= a Ser. A , 53(1):97--110

  19. [19]

    Fan, J. (1991b). On the optimal rates of convergence for nonparametric deconvolution problems. The Annals of Statistics , pages 1257--1272

  20. [20]

    Fan, J. (1992). Deconvolution with supersmooth distributions. Canadian Journal of Statistics , 20(2):155--169

  21. [21]

    Fan, J. (1993). Adaptively local one-dimensional subproblems with application to a deconvolution problem. The Annals of Statistics , pages 600--610

  22. [22]

    and Jongbloed, G

    Groeneboom, P. and Jongbloed, G. (2003). Density estimation in the uniform deconvolution model. Statistica Neerlandica , 57(1):136--157

  23. [23]

    Guan, Z. (2021). Fast nonparametric maximum likelihood density deconvolution using bernstein polynomials. Statistica Sinica , 31(2):891--908

  24. [24]

    and Qiu, P

    Hall, P. and Qiu, P. (2005). Discrete-transform approach to deconvolution problems. Biometrika , 92(1):135--148

  25. [25]

    J., Shi, K., and Sun, X

    Hsu, D. J., Shi, K., and Sun, X. (2017). Linear regression without correspondence. Advances in Neural Information Processing Systems , 30

  26. [26]

    and Waegeman, W

    H \"u llermeier, E. and Waegeman, W. (2021). Aleatoric and epistemic uncertainty in machine learning: An introduction to concepts and methods. Machine learning , 110(3):457--506

  27. [27]

    Liu, M. C. and Taylor, R. L. (1989). A consistent nonparametric density estimator for the deconvolution problem. Canadian Journal of Statistics , 17(4):427--438

  28. [28]

    and Nickl, R

    Lounici, K. and Nickl, R. (2011). Global uniform risk bounds for wavelet deconvolution estimators . The Annals of Statistics , 39(1):201 -- 231

  29. [29]

    and Mammen, E

    Meis, J. and Mammen, E. (2020). Uncoupled isotonic regression with discrete errors. Personal communication

  30. [30]

    Meister, A. (2006). Density estimation with normal measurement error with unknown variance. Statistica Sinica , pages 195--211

  31. [31]

    Meister, A. (2009). Deconvolution Problems in Nonparametric Statistics . Springer Berlin Heidelberg

  32. [32]

    J., and Courtade, T

    Pananjady, A., Wainwright, M. J., and Courtade, T. A. (2017). Linear regression with shuffled data: Statistical and computational limits of permutation recovery. IEEE Transactions on Information Theory , 64(5):3286--3300

  33. [33]

    and Vidakovic, B

    Pensky, M. and Vidakovic, B. (1999). Adaptive wavelet estimator for nonparametric density deconvolution. The Annals of Statistics , 27(6):2033--2053

  34. [34]

    and Weed, J

    Rigollet, P. and Weed, J. (2019). Uncoupled isotonic regression via minimum wasserstein deconvolution. Information and Inference: A Journal of the IMA , 8(4):691--717

  35. [35]

    and Ben-David, E

    Slawski, M. and Ben-David, E. (2019). Linear regression with sparsely permuted data. Electronic Journal of Statististics , 13(1):1--36

  36. [36]

    Slawski, M., Ben-David, E., and Li, P. (2020). Two-stage approach to multivariate linear regression with sparsely mismatched data. The Journal of Machine Learning Research , 21(1):8422--8463

  37. [37]

    Slawski, M., Diao, G., and Ben-David, E. (2021). A pseudo-likelihood approach to linear regression with partially shuffled data. Journal of Computational and Graphical Statistics , 30(4):991--1003

  38. [38]

    and Sen, B

    Slawski, M. and Sen, B. (2024). Permuted and unlinked monotone regression in r\^ d: an approach based on mixture modeling and optimal transport. Journal of Machine Learning Research , 25(183):1--57

  39. [39]

    Stefanski, L. A. and Carroll, R. J. (1990). Deconvolving kernel density estimators. Statistics , 21(2):169--184

  40. [40]

    C., Peng, L., Conca, A., Kneip, L., Shi, Y., and Choi, H

    Tsakiris, M. C., Peng, L., Conca, A., Kneip, L., Shi, Y., and Choi, H. (2020). An algebraic-geometric approach for linear regression without correspondences. IEEE Transactions on Information Theory , 66(8):5130--5144

  41. [41]

    Unnikrishnan, J., Haghighatshoar, S., and Vetterli, M. (2018). Unlabeled sensing with random linear measurements. IEEE Trans. Inform. Theory , 64(5):3237--3253

  42. [42]

    van de Geer, S. (2000). Empirical Processes in M-estimation , volume 6. Cambridge university press

  43. [43]

    van der Vaart, A. W. and Wellner, J. A. (2023). Weak Convergence and Empirical Processes . Springer Series in Statistics. Springer Cham

  44. [44]

    Zhang, H., Slawski, M., and Li, P. (2021). The benefits of diversity: Permutation recovery in unlabeled sensing from multiple measurement vectors. IEEE Transactions on Information Theory , 68(4):2509--2529