Deconvolution in unlinked linear models
Pith reviewed 2026-05-21 16:19 UTC · model grok-4.3
The pith
Assuming Z is a linear function of an observable covariate yields a nonparametric deconvolution estimator that converges at the parametric rate in Wasserstein-1 distance regardless of noise smoothness.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under the structural assumption that the latent variable Z is a linear function of an observable multidimensional covariate, a nonparametric estimator of the distribution of Z achieves the parametric rate of convergence in the Wasserstein distance of order 1, where the smoothness of the noise does not affect the rate.
What carries the argument
The linear structural assumption on the latent variable Z relative to the observable covariate, which supplies the unlinked regression structure needed to bypass the usual slow rates in deconvolution.
If this is right
- The distribution of Z can be estimated at the parametric rate in Wasserstein-1 distance without dependence on noise smoothness.
- Estimators for the unconditional density of Z and the conditional density of Z given an observed response become available.
- The latent linear predictor can be estimated even though its direct link to the response is inaccessible.
Where Pith is reading between the lines
- Similar structural constraints on Z might be substituted for the linear link to obtain fast rates in other deconvolution problems.
- The approach could be tested on data where covariates and responses are deliberately separated for privacy reasons.
- Approximate linearity might still deliver usable rates in finite samples even if the exact assumption fails.
Load-bearing premise
The assumption that the latent variable Z is exactly a linear function of the observable multidimensional covariate.
What would settle it
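A simulation study in which the estimator loses the parametric rate as the noise distribution is made smoother, while the linear link between Z and the covariate holds exactly, would refute the claim. Losing the rate once the linear link is removed would instead confirm that the linearity assumption is what carries the result.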
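A minimal sketch of the kind of simulation that could probe the plug-in part of the claim, in Python with only the standard library. It is not the paper's estimator: `beta_hat` is a hypothetical stand-in obtained by perturbing the true coefficient at scale n^{-1/2}, the covariate is taken to be standard Gaussian so that the law of Z is known in closed form, and all function names are illustrative. The sketch checks only whether the empirical measure of the estimated linear predictors inherits a root-n W_1 error from a root-n coefficient estimate; it does not simulate the noise or the unlinked estimation of β.

```python
import math
import random
from statistics import NormalDist


def w1_to_gaussian(sample, sigma):
    """Approximate W1 between the empirical measure of `sample` and N(0, sigma^2).

    Uses the quantile form of W1 on the real line, discretised with a
    midpoint rule on the n quantile levels (i + 0.5) / n.
    """
    n = len(sample)
    std_normal = NormalDist()
    ordered = sorted(sample)
    return sum(
        abs(z - sigma * std_normal.inv_cdf((i + 0.5) / n))
        for i, z in enumerate(ordered)
    ) / n


def plug_in_error(n, beta, rng):
    """W1 error of the empirical measure of X_i^T beta_hat, with X ~ N(0, I_d).

    ASSUMPTION: beta_hat = beta + delta / sqrt(n), with delta standard Gaussian,
    is a hypothetical stand-in for a root-n consistent estimator. It is not the
    estimator constructed in the paper.
    """
    beta_hat = [b + rng.gauss(0.0, 1.0) / math.sqrt(n) for b in beta]
    predictors = [
        sum(rng.gauss(0.0, 1.0) * bh for bh in beta_hat) for _ in range(n)
    ]
    # With X ~ N(0, I_d), the target Z = X^T beta is N(0, |beta|^2).
    sigma = math.sqrt(sum(b * b for b in beta))
    return w1_to_gaussian(predictors, sigma)


def mean_error(n, beta, reps=20, seed=0):
    """Average the plug-in W1 error over `reps` independent replications."""
    rng = random.Random(seed)
    return sum(plug_in_error(n, beta, rng) for _ in range(reps)) / reps


if __name__ == "__main__":
    beta = [1.0, -0.5, 2.0]
    for n in (200, 800, 3200):
        err = mean_error(n, beta)
        # Under an n^{-1/2} rate, sqrt(n) * err should stay roughly constant.
        print(n, round(err, 4), round(math.sqrt(n) * err, 3))
```

If the rescaled column stays roughly flat as n grows, that is consistent with a parametric rate for the plug-in step. A test of the full claim would replace the stand-in `beta_hat` with an estimator computed from unlinked samples and repeat the experiment across noise laws of increasing smoothness.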
Original abstract
Unlinked regression, in which covariates and responses are observed separately without known correspondence, has recently gained increasing attention. Deconvolution, on the other hand, is a fundamental and challenging problem in nonparametric statistics with the aim of estimating the distribution of a latent random variable $Z$ based on observations contaminated by some additive noise. The complexity of this task is heavily influenced by the smoothness of the noise distribution and often leads to slow estimation rates. In this paper, we combine the recent unlinked linear regression problem with the classical deconvolution framework. Specifically, we study nonparametric deconvolution under the assumption that $Z$ is a linear function of an observable multidimensional covariate. This structural constraint allows us to introduce a nonparametric estimator of the distribution of $Z$ which achieves the parametric rate of convergence in the Wasserstein distance of order 1, where the smoothness of the noise does not affect the rate. Furthermore, we introduce nonparametric estimators for the unconditional density of $Z$ and the conditional density of $Z$ given an observed response. This allows us to study the problem of estimating the value of the latent linear predictor, whose link to the observed response is not accessible. Through several simulations, we illustrate the fast convergence rate of our deconvolution estimator and the performance of the proposed conditional estimators of the latent predictor in different simulation scenarios.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies deconvolution of a latent variable Z under the structural assumption that Z equals X^T β exactly, where X is a d-dimensional covariate (d ≥ 2) observed separately from the noisy response Y = Z + noise (unlinked setting). It constructs a nonparametric estimator of the law of Z that attains the parametric rate n^{-1/2} in Wasserstein-1 distance, independent of the smoothness of the noise distribution. The manuscript also develops estimators for the unconditional density of Z and the conditional density of Z given Y, and reports simulation results illustrating the claimed rate.
Significance. If the central result holds, the work is significant for nonparametric statistics: the exact linear structure in dimension d ≥ 2 supplies enough identification to achieve a smoothness-independent parametric rate in a classically ill-posed deconvolution problem. The simulations provide concrete empirical support for the fast rate across different noise distributions. The paper supplies reproducible simulation code and explicit construction of the estimator, which strengthens the contribution.
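To make the setting in the summary concrete, the model and the loss can be written out as follows. This is a sketch based on the summary above, not a quotation from the paper: the symbols $\varepsilon$, $F_\mu$, $F_\nu$ and the sample sizes $n$, $m$ are notation assumed here, and independence of the noise from the covariate is the usual deconvolution assumption rather than something the report states.

```latex
\begin{align*}
  Z &= X^\top \beta, \qquad X \in \mathbb{R}^d,\ d \ge 2, \\
  Y &\overset{d}{=} Z + \varepsilon, \qquad \varepsilon \text{ independent of } X, \\
  \text{data:}\quad & X_1,\dots,X_n \ \text{and}\ Y_1,\dots,Y_m
      \ \text{observed without correspondence}, \\
  W_1(\mu,\nu) &= \inf_{\pi \in \Pi(\mu,\nu)} \int |x-y| \, d\pi(x,y)
               = \int_{\mathbb{R}} \bigl| F_\mu(t) - F_\nu(t) \bigr| \, dt .
\end{align*}
```

Here $\Pi(\mu,\nu)$ is the set of couplings of $\mu$ and $\nu$, and $F_\mu$, $F_\nu$ are their distribution functions. The second expression for $W_1$ holds on the real line for laws with finite first moment, which is the relevant case because $Z$ is scalar.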
major comments (2)
- [§3.1, Theorem 3.1] The proof that the W_1 error is O_p(n^{-1/2}) proceeds by first obtaining a root-n estimator of β from the unlinked marginals of X and Y, then plugging the estimated linear predictor into an empirical-measure deconvolution step. The argument that this plug-in step preserves the parametric rate, without any loss from the noise characteristic function, is only sketched; an explicit bound on the remainder term is currently missing.
- [§2.3] The identification argument that the marginal laws of X and Y determine the law of Z at parametric speed relies on d ≥ 2 and on the covariate distribution having a non-degenerate second-moment matrix. The paper does not state whether the rate remains parametric when these conditions are relaxed to d = 1 or to singular designs, which is load-bearing for the claim that the linear structure alone removes the usual ill-posedness.
minor comments (3)
- [Introduction] The definition of the Wasserstein-1 distance and the precise form of the estimator (e.g., whether it is an empirical measure on estimated linear predictors or involves kernel smoothing) should be stated in the main text rather than deferred entirely to the appendix.
- [Section 5] Simulation section: the number of Monte Carlo replications, the exact parameter values for the noise distributions, and the bandwidth choices (if any) are not reported in the captions of Figures 1–3.
- [References] A few recent references on unlinked regression (e.g., works from 2022–2024 on moment-based estimation in unlinked models) are absent from the bibliography.
Simulated Author's Rebuttal
Thank you for the opportunity to respond to the referee's report. We appreciate the positive assessment of the significance of our work and the constructive feedback on the technical details. Below we address the major comments point by point, indicating the changes we will make to strengthen the manuscript.
- Referee: [§3.1, Theorem 3.1] The proof that the W_1 error is O_p(n^{-1/2}) proceeds by first obtaining a root-n estimator of β from the unlinked marginals of X and Y, then plugging the estimated linear predictor into an empirical-measure deconvolution step. The argument that this plug-in step preserves the parametric rate, without any loss from the noise characteristic function, is only sketched; an explicit bound on the remainder term is currently missing.
  Authors: We agree that the current sketch of the plug-in argument in the proof of Theorem 3.1 leaves the remainder term implicit. In the revised manuscript, we will expand this part of the proof to include an explicit bound. Specifically, we will decompose the Wasserstein-1 error into the error from estimating β and the error from the empirical deconvolution step, and show, using properties of the Wasserstein distance and the root-n consistency of the estimator for β, that the cross term is o_p(n^{-1/2}). This will make the preservation of the parametric rate fully rigorous without relying on the smoothness of the noise.
  revision: yes
- Referee: [§2.3] The identification argument that the marginal laws of X and Y determine the law of Z at parametric speed relies on d ≥ 2 and on the covariate distribution having a non-degenerate second-moment matrix. The paper does not state whether the rate remains parametric when these conditions are relaxed to d = 1 or to singular designs, which is load-bearing for the claim that the linear structure alone removes the usual ill-posedness.
  Authors: The referee correctly identifies that our identification and rate results rely on d ≥ 2 and on the non-degeneracy of the second-moment matrix of X. For d = 1, the linear structure does not provide sufficient variation to achieve the parametric rate independently of the noise smoothness, because the problem reduces to a standard deconvolution setting with potential ill-posedness. Similarly, singular designs would limit identifiability. We will add a remark in Section 2.3 clarifying these conditions and noting that the parametric rate is specific to d ≥ 2 with a non-degenerate second-moment matrix. This strengthens the statement of the main result.
  revision: yes
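The decomposition promised in the first response can be sketched as follows, under an assumption the report itself flags as unconfirmed: that the estimator is the empirical measure of the estimated linear predictors. The symbols $\hat\mu_n$, $\mu_n$, and $\mu_Z$ (the law of $Z$) are notation introduced here for illustration, and the bound below is a generic coupling argument rather than the authors' proof.

```latex
\begin{align*}
  \hat\mu_n &= \frac{1}{n}\sum_{i=1}^n \delta_{X_i^\top \hat\beta}, \qquad
  \mu_n = \frac{1}{n}\sum_{i=1}^n \delta_{X_i^\top \beta}, \\
  W_1(\hat\mu_n, \mu_Z) &\le W_1(\hat\mu_n, \mu_n) + W_1(\mu_n, \mu_Z), \\
  W_1(\hat\mu_n, \mu_n) &\le \frac{1}{n}\sum_{i=1}^n
      \bigl| X_i^\top(\hat\beta - \beta) \bigr|
    \le \|\hat\beta - \beta\| \cdot \frac{1}{n}\sum_{i=1}^n \|X_i\| .
\end{align*}
```

If $\hat\beta$ is root-n consistent and $\mathbb{E}\|X\| < \infty$, the first term is $O_p(n^{-1/2})$. The second term is the $W_1$ error of a one-dimensional empirical measure, which is $O_p(n^{-1/2})$ when $\int \sqrt{F_Z(t)(1 - F_Z(t))}\,dt < \infty$. In this sketch the noise law enters only through the behaviour of $\hat\beta$, which is where a rigorous argument would have to rule out any dependence on noise smoothness.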
Circularity Check
No significant circularity; estimator construction is independent under stated linearity assumption
The paper introduces a new nonparametric estimator for the law of Z under the explicit modeling assumption that Z equals a linear function of an observable multidimensional covariate X. This structural constraint is invoked to obtain identification of the target distribution from the separate marginals of X and Y, which in turn yields the parametric W1 rate independent of noise smoothness. The estimator is presented as newly constructed rather than defined in terms of fitted values of the target itself or by renaming a known result. No self-definitional steps, fitted-input predictions, or load-bearing self-citations appear in the derivation chain described in the abstract and overview. The central claim therefore retains independent content once the modeling assumption is granted.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Z is a linear function of an observable multidimensional covariate