pith. sign in

arxiv: 2605.20601 · v1 · pith:GC2VUPIFnew · submitted 2026-05-20 · 💰 econ.EM

Endogenous Quantile Regression with Measurement Error in Dependent Variable

Pith reviewed 2026-05-21 02:42 UTC · model grok-4.3

classification 💰 econ.EM
keywords quantile regressionendogeneitymeasurement errorcontrol functionsieve ML estimatornonparametric identificationtriangular system
0
0 comments X

The pith

A control function approach makes quantile regression coefficients identifiable despite endogenous regressors and measurement error in the dependent variable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper demonstrates that the conditional quantile coefficient functions are nonparametrically identifiable in a triangular system using a control function to handle endogeneity, even when the dependent variable has additive measurement error. The identification result covers the quantile coefficients and other distributional parameters. The authors then construct a two-step sieve maximum likelihood estimator, estimating the control function first and then using it in a likelihood with copula weights. The estimator is consistent and asymptotically normal when the number of quantile grid knots grows at an appropriate rate, allowing for bootstrap inference. Simulations show it reduces bias markedly compared to standard quantile regression that ignores these problems.

Core claim

The conditional quantile coefficient functions, together with all other distributional parameters, are nonparametrically identifiable under a control function approach in a triangular system, despite additive measurement error in the dependent variable. This leads to a two-step sieve ML estimator that is consistent and asymptotically normal with suitable growth in quantile grid knots.

What carries the argument

Control function in the triangular system that captures endogeneity, used in a sieve likelihood maximization with copula weights for the generated control variable.

If this is right

  • The method corrects for bias in quantile estimates caused by both endogeneity and measurement error.
  • Inference can be conducted using bootstrap methods.
  • The approach applies to a wide range of econometric settings with these data issues.
  • Nonparametric identification holds for the full distribution, not just specific quantiles.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Extensions could include relaxing the additive measurement error assumption to multiplicative or other forms.
  • This could be applied to panel data or other structures common in empirical economics.
  • Testing the independence assumption between measurement error and control variable could be a useful robustness check.

Load-bearing premise

The data-generating process follows a triangular system in which the control function fully captures the endogeneity and the measurement error is additive and independent of the control variable conditional on the observed covariates.

What would settle it

Running the estimator on simulated data from the triangular system with known true quantile coefficients and checking if the estimates converge to the truth as the sample size grows large.

Figures

Figures reproduced from arXiv: 2605.20601 by Xuanjing Su.

Figure 1
Figure 1. Figure 1: MC Results Comparison: ε ∼ 3N and ρ = 0.5 Note [PITH_FULL_IMAGE:figures/full_fig_p021_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: MC Results: 2SSMLE ε ∼ 3N Note [PITH_FULL_IMAGE:figures/full_fig_p022_2.png] view at source ↗
Figure 4
Figure 4. Figure 4: MC Results Comparison: ε ∼ N (0, 1) and ρ = 0.5 Notes [PITH_FULL_IMAGE:figures/full_fig_p032_4.png] view at source ↗
read the original abstract

This paper studies quantile regression with an endogenous regressor and measurement error in the dependent variable. Standard quantile regression estimators ignoring these two elements can induce substantial bias. We adopt a control-function approach in a triangular system and show that the conditional quantile coefficient functions, together with all other distributional parameters, are nonparametrically identifiable. Building on this constructive identification result, we propose a two-step sieve ML estimator. The first step estimates the control function. The second step performs a sieve likelihood maximization that incorporates the generated control variable through copula weights. When the number of quantile grid knots grows at an appropriate speed, the estimator is consistent and asymptotically normal, permitting inference via bootstrap. Monte Carlo simulations demonstrate that the estimator markedly reduces bias relative to existing methods, confirming its effectiveness in settings with endogeneity and additive measurement error in the outcome.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper develops a control-function approach for quantile regression in the presence of an endogenous regressor and additive measurement error in the outcome variable. It establishes nonparametric identification of the conditional quantile coefficient functions and all distributional parameters under a triangular system, then proposes a two-step sieve maximum-likelihood estimator in which the first step recovers the control function and the second step incorporates the generated control via copula weights. Consistency and asymptotic normality are claimed when the number of quantile-grid knots grows at a suitable rate, with bootstrap inference and Monte Carlo evidence of bias reduction relative to standard quantile regression.

Significance. If the identification result and the asymptotic theory hold under the stated assumptions, the paper would provide a useful addition to the literature on quantile regression with endogeneity and measurement error. The Monte Carlo simulations supply concrete evidence of bias reduction, and the constructive identification via the triangular system is a clear strength. The practical estimator could be applied in empirical settings where both endogeneity and outcome measurement error are plausible.

major comments (3)
  1. [§2] §2 (Identification): The nonparametric identification of the conditional quantile functions rests on the assumption that the measurement error u is independent of the control function conditional on the observed covariates (u ⊥ control | X). This conditional independence is load-bearing for the triangular-system argument; if it fails due to unobserved heterogeneity that correlates the measurement error with the first-stage residual, the generated-regressor correction does not recover the target parameters and both consistency and bootstrap validity break down.
  2. [§4] §4 (Asymptotics): The statement that the estimator is asymptotically normal when the number of quantile-grid knots grows at an appropriate speed is central to the inference claim, yet the precise rate condition and the handling of the generated-regressor estimation error in the sieve likelihood are not fully detailed. Without explicit verification of these steps, it is difficult to confirm that the bootstrap remains valid under the knot-growth schedule.
  3. [§3] §3 (Estimator): The second-step sieve ML uses copula weights that embed the conditional independence assumption from the first step. Any finite-sample dependence introduced by the estimated control function could affect the likelihood maximization; the paper should clarify whether additional trimming or adjustment is required to preserve the asymptotic properties.
minor comments (2)
  1. [Abstract] The abstract refers to an 'appropriate speed' for knot growth; the exact rate condition should be stated explicitly in the main text or theorem statement for clarity.
  2. [Monte Carlo section] Monte Carlo results would benefit from reporting standard errors or confidence bands around the bias and RMSE figures to allow readers to assess the precision of the reported improvements.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below, indicating planned revisions where appropriate.

read point-by-point responses
  1. Referee: [§2] §2 (Identification): The nonparametric identification of the conditional quantile functions rests on the assumption that the measurement error u is independent of the control function conditional on the observed covariates (u ⊥ control | X). This conditional independence is load-bearing for the triangular-system argument; if it fails due to unobserved heterogeneity that correlates the measurement error with the first-stage residual, the generated-regressor correction does not recover the target parameters and both consistency and bootstrap validity break down.

    Authors: We agree that the conditional independence assumption u ⊥ V | X (where V denotes the control function) is central to the nonparametric identification result in Section 2 and is stated explicitly as Assumption 3. This restriction ensures that the measurement error is independent of the endogenous component after conditioning on observables and the control. While we acknowledge that violations arising from additional unobserved heterogeneity could invalidate the generated-regressor correction, the assumption is standard in triangular control-function models and is maintained throughout the analysis. In the revision we will expand the discussion in Section 2 to provide further intuition, discuss plausible empirical settings (e.g., wage equations with reporting error), and note the consequences of potential violations. revision: partial

  2. Referee: [§4] §4 (Asymptotics): The statement that the estimator is asymptotically normal when the number of quantile-grid knots grows at an appropriate speed is central to the inference claim, yet the precise rate condition and the handling of the generated-regressor estimation error in the sieve likelihood are not fully detailed. Without explicit verification of these steps, it is difficult to confirm that the bootstrap remains valid under the knot-growth schedule.

    Authors: The referee correctly notes that the rate conditions and the propagation of first-step estimation error into the second-step sieve likelihood require more explicit treatment. Theorem 4 states consistency and asymptotic normality when the number of knots K_n satisfies K_n = o(n^{1/3}) together with standard sieve regularity conditions, but the appendix sketch is concise on bounding the generated-regressor term. We will revise Section 4 and the appendix to supply a fuller asymptotic expansion that isolates the contribution of the estimated control function, verify that this term is asymptotically negligible under the stated rate, and confirm bootstrap validity via the continuous-mapping theorem applied to the sieve estimator. revision: yes

  3. Referee: [§3] §3 (Estimator): The second-step sieve ML uses copula weights that embed the conditional independence assumption from the first step. Any finite-sample dependence introduced by the estimated control function could affect the likelihood maximization; the paper should clarify whether additional trimming or adjustment is required to preserve the asymptotic properties.

    Authors: The copula weights are constructed from the conditional distribution implied by the maintained independence assumption. Although using an estimated control function introduces finite-sample dependence, the asymptotic theory shows that this dependence vanishes at the required rate when the knot number grows appropriately. Our Monte Carlo experiments in Section 5 exhibit stable performance without extra trimming. In the revision we will add a clarifying remark in Section 3 stating that no additional trimming or adjustment is required to preserve the asymptotic properties, with the bootstrap procedure already incorporating the two-step estimation uncertainty. revision: partial

Circularity Check

0 steps flagged

No circularity: identification and estimation are derived from model assumptions and standard sieve theory

full rationale

The paper derives nonparametric identification of the conditional quantile coefficient functions from the triangular system and control-function assumptions, including additive measurement error independent of the control variable conditional on covariates. The two-step sieve ML estimator is constructed by first estimating the control function and then maximizing a sieve likelihood that incorporates it via copula weights. Consistency and asymptotic normality are established under a knot-growth rate condition using standard arguments from sieve estimation theory. No step reduces the target parameters to fitted values by construction, no self-citation is load-bearing for uniqueness or identification, and no ansatz or renaming is smuggled in. The derivation is self-contained against external benchmarks in econometric identification and nonparametric estimation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Based solely on the abstract; the identification rests on the triangular system and additive measurement-error assumptions that are standard in the control-function literature but not independently verified here.

axioms (2)
  • domain assumption Triangular system structure with control function capturing endogeneity
    Invoked to achieve nonparametric identification of quantile coefficients.
  • domain assumption Additive measurement error independent of the control variable conditional on covariates
    Required for the measurement-error correction via copula weights.

pith-pipeline@v0.9.0 · 5657 in / 1397 out tokens · 35440 ms · 2026-05-21T02:42:48.706594+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages

  1. [1]

    Abrevaya, J. and J. A. Hausman (1999). Semiparametric estimation with mismeasured dependent variables: an application to duration models for unemployment spells. Annales d'Economie et de Statistique\/ , 243--275

  2. [2]

    Chen, and D

    Blundell, R., X. Chen, and D. Kristensen (2007). Semi-nonparametric iv estimation of shape-invariant engel curves. Econometrica\/ 75\/ (6), 1613--1669

  3. [3]

    Blundell, R. and J. L. Powell (2007). Censored regression quantiles with endogenous regressors. Journal of Econometrics\/ 141\/ (1), 65--83

  4. [4]

    Brown, and N

    Bound, J., C. Brown, and N. Mathiowetz (2001). Measurement error in survey data. In Handbook of econometrics , Volume 5, pp.\ 3705--3843. Elsevier

  5. [5]

    Bound, J. and A. B. Krueger (1991). The extent of measurement error in longitudinal earnings data: Do two wrongs make a right? Journal of labor economics\/ 9\/ (1), 1--24

  6. [6]

    Schoenbaum, T

    Bound, J., M. Schoenbaum, T. R. Stinebrickner, and T. Waidmann (1999). The dynamic effects of health on the labor force transitions of older workers. Labour economics\/ 6\/ (2), 179--202

  7. [7]

    Harding, and J

    Burda, M., M. Harding, and J. Hausman (2008). A bayesian mixed logit--probit model for multinomial choice. Journal of econometrics\/ 147\/ (2), 232--246

  8. [8]

    Harding, and J

    Burda, M., M. Harding, and J. Hausman (2012). A poisson mixture model of discrete choice. Journal of econometrics\/ 166\/ (2), 184--203

  9. [9]

    Goodman-Bacon, and P

    Callaway, B., A. Goodman-Bacon, and P. H. Sant'Anna (2024). Difference-in-differences with a continuous treatment. Technical report, National Bureau of Economic Research

  10. [10]

    Li, and I

    Callaway, B., T. Li, and I. Murtazashvili (2021). Distributional effects with two-sided measurement error: An application to intergenerational income mobility. arXiv preprint arXiv:2107.09235\/

  11. [11]

    Chen, X. (2007). Large sample sieve estimation of semi-nonparametric models. Handbook of econometrics\/ 6 , 5549--5632

  12. [12]

    Fan, and V

    Chen, X., Y. Fan, and V. Tsyrennikov (2006). Efficient estimation of semiparametric multivariate copula models. Journal of the American Statistical Association\/ 101\/ (475), 1228--1240

  13. [13]

    Linton, and I

    Chen, X., O. Linton, and I. Van Keilegom (2003). Estimation of semiparametric models when the criterion function is not smooth. Econometrica\/ 71\/ (5), 1591--1608

  14. [14]

    Chen, X. and X. Shen (1998). Sieve extremum estimates for weakly dependent data. Econometrica\/ , 289--314

  15. [15]

    Fernandez-Val, and A

    Chernozhukov, V., I. Fernandez-Val, and A. Galichon (2009). Improving point and interval estimators of monotone functions by rearrangement. Biometrika\/ 96\/ (3), 559--575

  16. [16]

    Chernozhukov, V. and C. Hansen (2005). An iv model of quantile treatment effects. Econometrica\/ 73\/ (1), 245--261

  17. [17]

    Chesher, A. (2017). Understanding the effect of measurement error on quantile regressions. Journal of Econometrics\/ 200\/ (2), 223--237

  18. [18]

    Cosslett, S. R. (2004). Efficient semiparametric estimation of censored and truncated regressions via a smoothed self-consistency equation. Econometrica\/ 72\/ (4), 1277--1293

  19. [19]

    Doty, J. and S. Song (2023). Nonparametric identification and estimation of quantile production functions. PDF available on Google Drive. Last accessed: 2026-05-16

  20. [20]

    Hoderlein, and Y

    D’Haultf uille, X., S. Hoderlein, and Y. Sasaki (2023). Nonparametric difference-in-differences in repeated cross-sections with continuous treatments. Journal of Econometrics\/ 234\/ (2), 664--690

  21. [21]

    Fan, J. (1991). On the optimal rates of convergence for nonparametric deconvolution problems. The Annals of Statistics\/ , 1257--1272

  22. [22]

    Firpo, S., A. F. Galvao, and S. Song (2017). Measurement errors in quantile regression models. Journal of econometrics\/ 198\/ (1), 146--164

  23. [23]

    Hahn, J. and G. Ridder (2013). Asymptotic variance of semiparametric estimators with generated regressors. Econometrica\/ 81\/ (1), 315--340

  24. [24]

    Han, S. and E. J. Vytlacil (2017). Identification in a generalization of bivariate probit models with dummy endogenous regressors. Journal of Econometrics\/ 199\/ (1), 63--73

  25. [25]

    Hausman, J. (2001). Mismeasured variables in econometric analysis: problems from the right and problems from the left. Journal of Economic perspectives\/ 15\/ (4), 57--67

  26. [26]

    Hausman, J., H. Liu, Y. Luo, and C. Palmer (2021). Errors in the dependent variable of quantile regression models. Econometrica\/ 89\/ (2), 849--873

  27. [27]

    Hausman, J. A., J. Abrevaya, and F. M. Scott-Morton (1998). Misclassification of the dependent variable in a discrete-response setting. Journal of econometrics\/ 87\/ (2), 239--269

  28. [28]

    Imbens, G. W. and W. K. Newey (2009). Identification and estimation of triangular simultaneous equations models without additivity. Econometrica\/ 77\/ (5), 1481--1512

  29. [29]

    Khaled, M. A. and R. Kohn (2017). On approximating copulas by finite mixtures. arXiv preprint arXiv:1705.10440\/

  30. [30]

    Koenker, R. (2005). Quantile regression , Volume 38. Cambridge university press

  31. [31]

    Koenker, R. and G. Bassett Jr (1978). Regression quantiles. Econometrica: journal of the Econometric Society\/ , 33--50

  32. [32]

    Lee, S. (2007). Endogeneity in quantile regression models: A control function approach. Journal of Econometrics\/ 141\/ (2), 1131--1158

  33. [33]

    Lehmann, E. L. and H. J. D'Abrera (2006). Nonparametrics: statistical methods based on ranks , Volume 464. Springer New York

  34. [34]

    Meyer, B. D., W. K. Mok, and J. X. Sullivan (2009). The under-reporting of transfers in household surveys: Its nature and consequences. Technical report, National Bureau of Economic Research

  35. [35]

    Newey, W. K. (1997). Convergence rates and asymptotic normality for series estimators. Journal of econometrics\/ 79\/ (1), 147--168

  36. [36]

    Newey, W. K. and D. McFadden (1994). Large sample estimation and hypothesis testing. Handbook of econometrics\/ 4 , 2111--2245

  37. [37]

    Newey, W. K., J. L. Powell, and F. Vella (1999). Nonparametric estimation of triangular simultaneous equations models. Econometrica\/ 67\/ (3), 565--603

  38. [38]

    Petrin, A. and K. Train (2010). A control function approach to endogeneity in consumer choice models. Journal of marketing research\/ 47\/ (1), 3--13

  39. [39]

    Pollard, D. (1989). Asymptotics via empirical processes. Statistical science\/ , 341--354

  40. [40]

    Qu, L. and Y. Lu (2021). Copula density estimation by finite mixture of parametric copula densities. Communications in statistics-simulation and computation\/ 50\/ (11), 3315--3337

  41. [41]

    Schennach, S. M. (2008). Quantile regression with mismeasured covariates. Econometric Theory\/ 24\/ (4), 1010--1043

  42. [42]

    Song, S. (2026). Identification and estimation of nonseparable triangular models with measurement error. PDF available on Google Drive. Last accessed: 2026-05-16

  43. [43]

    Wei, Y. and R. J. Carroll (2009). Quantile regression with measurement error. Journal of the American Statistical Association\/ 104\/ (487), 1129--1143