pith. sign in

arxiv: 2605.26515 · v1 · pith:5ZP52B7Gnew · submitted 2026-05-26 · 📊 stat.ME

Learning a directed acyclic graph with additive heteroscedastic errors

Pith reviewed 2026-06-29 16:13 UTC · model grok-4.3

classification 📊 stat.ME
keywords causal discoverydirected acyclic graphheteroscedastic errorsquantile regressionstructural equation modelidentifiabilitytopological order
0
0 comments X

The pith

Heteroscedastic errors identify DAG directions via quantile-invariant scales

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes identifiability results for location-scale noise models on directed acyclic graphs, showing that heteroscedasticity supplies information to recover causal directions. It introduces the RESQUE procedure, an iterative method that constructs residuals and applies composite quantile regression to exploit the invariance of conditional scale coefficients across quantiles and thereby locate sink nodes recursively. A sympathetic reader would care because standard causal discovery often relies solely on mean relationships while ignoring structured variance signals that can resolve edge directions. The procedure carries consistency guarantees that continue to hold when the number of variables grows with the sample size. Simulations indicate stronger performance precisely when causal information resides in the variance component.

Core claim

Under a structural equation model with additive heteroscedastic errors, the conditional scale coefficients remain invariant across quantiles. This invariance permits the RESQUE procedure to identify sink nodes iteratively via residual construction and composite quantile regression, recovering both the topological order and the full graph structure, with theoretical consistency even when the number of variables diverges with the sample size.

What carries the argument

The invariance of conditional scale coefficients across quantiles in the location-scale noise model, used by the RESQUE iterative procedure to recursively identify sink nodes.

Load-bearing premise

Conditional scale coefficients remain unchanged regardless of the quantile level examined.

What would settle it

Generate data from a known DAG under additive heteroscedastic errors but with scale coefficients that deliberately vary across quantiles; the procedure should then fail to recover the correct topological order.

Figures

Figures reproduced from arXiv: 2605.26515 by Chunlin Li, Li Chen, Xintao Xia, Yue Hu.

Figure 1
Figure 1. Figure 1: Illustration of the topological-layer decomposition of a DAG. [PITH_FULL_IMAGE:figures/full_fig_p017_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Empirical performance of CAM, rank-PC, TL, NOTEARS, and the proposed [PITH_FULL_IMAGE:figures/full_fig_p023_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Empirical performance of CAM, rank-PC, TL, NOTEARS, and RESQUE in [PITH_FULL_IMAGE:figures/full_fig_p024_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Consensus graph and estimated graph by RESQUE using the Sachs dataset. [PITH_FULL_IMAGE:figures/full_fig_p026_4.png] view at source ↗
read the original abstract

This paper studies causal discovery for a directed acyclic graph under a structural equation model with additive heteroscedastic errors. We first establish new identifiability results for location-scale noise models, showing that heteroscedasticity can be leveraged to recover causal directions. Based on these insights, we propose a novel iterative procedure, Residual Simultaneous Quantile Estimation (RESQUE), where each iteration consists of a residual-construction stage and a composite quantile regression stage, enabling recursive identification of sink nodes via the invariance of conditional scale coefficients across quantiles. We then establish its theoretical guarantees for recovering topological order and graph structure, even when the number of variables diverges with the sample size. Simulation studies and application to benchmark datasets show that RESQUE performs favorably compared with existing methods, especially when causal information is partly encoded in the variance component. These results highlight exploiting structured variance signals for causal discovery and provide a principled framework for multivariate causal discovery beyond mean-based modeling.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper studies causal discovery for DAGs under a structural equation model with additive heteroscedastic errors. It establishes new identifiability results showing that heteroscedasticity can be leveraged to recover causal directions via quantile-invariant conditional scale coefficients. It proposes the RESQUE iterative procedure (residual construction followed by composite quantile regression) for recursive sink-node identification, proves theoretical guarantees for topological order and graph recovery even when p diverges with n, and reports favorable simulation and benchmark performance relative to existing methods when variance components carry causal information.

Significance. If the identifiability and high-dimensional consistency results hold, the work provides a principled extension of causal discovery beyond mean-based modeling by exploiting structured variance signals. This is potentially valuable in domains where heteroscedasticity encodes directional information, and the allowance for diverging p broadens applicability.

major comments (2)
  1. [Abstract] Abstract (identifiability paragraph): the central claim that 'heteroscedasticity can be leveraged to recover causal directions' rests on the location-scale model with quantile-invariant conditional scales; without the explicit theorem statement, assumptions, and proof, it is impossible to verify whether the invariance holds generically or only under additional restrictions that may not be stated.
  2. [Abstract] Abstract (theoretical guarantees paragraph): the claim of 'theoretical guarantees for recovering topological order and graph structure, even when the number of variables diverges with the sample size' is load-bearing for the paper's contribution; the provided text gives no derivation, rate conditions, or proof sketch, leaving the soundness of the high-dimensional result unverified.
minor comments (1)
  1. The abstract mentions simulation studies and benchmark applications but provides no details on data exclusion rules, simulation design, or performance metrics; these should be expanded for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their comments. We address the two major comments on the abstract below by directing to the corresponding formal results in the manuscript. The abstract is intended as a concise summary; the full statements, assumptions, and proofs appear in the body of the paper.

read point-by-point responses
  1. Referee: [Abstract] Abstract (identifiability paragraph): the central claim that 'heteroscedasticity can be leveraged to recover causal directions' rests on the location-scale model with quantile-invariant conditional scales; without the explicit theorem statement, assumptions, and proof, it is impossible to verify whether the invariance holds generically or only under additional restrictions that may not be stated.

    Authors: The identifiability result is stated precisely as Theorem 3.1 in Section 3. Under the location-scale structural equation model and assumptions (A1)–(A3), the theorem establishes that the conditional scale coefficients are invariant across quantiles if and only if the corresponding edge is absent. The proof in Appendix A shows that the invariance property holds under these model assumptions without further restrictions. The abstract condenses this result; the explicit statement, assumptions, and proof are provided in the main text. revision: no

  2. Referee: [Abstract] Abstract (theoretical guarantees paragraph): the claim of 'theoretical guarantees for recovering topological order and graph structure, even when the number of variables diverges with the sample size' is load-bearing for the paper's contribution; the provided text gives no derivation, rate conditions, or proof sketch, leaving the soundness of the high-dimensional result unverified.

    Authors: The high-dimensional consistency results appear as Theorem 4.2 and Corollary 4.3 in Section 4. These establish recovery of the topological order and graph structure when p diverges with n, subject to explicit rate conditions (p = o(n^{1/3}) under sub-Gaussian tails). A proof sketch is given in the main text of Section 4, with the complete derivation in Appendix B. The abstract summarizes these guarantees; the rate conditions and proofs are contained in the manuscript. revision: no

Circularity Check

0 steps flagged

No significant circularity

full rationale

The abstract and context describe identifiability results derived from the location-scale structural equation model assumptions (additive heteroscedastic errors with quantile-invariant conditional scales) and the RESQUE iterative procedure for sink-node identification. No equations, proofs, or self-citations are supplied that reduce any claimed prediction or first-principles result to fitted inputs by construction. The theoretical guarantees for topological order recovery (even with diverging p) are presented as following from the model properties rather than from renaming or self-referential fitting. This is the expected self-contained case.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Ledger constructed from abstract only; full paper details on parameters and assumptions unavailable.

axioms (1)
  • domain assumption Data generated from structural equation model with additive heteroscedastic errors
    Explicitly stated as the model class for which new identifiability results are derived.

pith-pipeline@v0.9.1-grok · 5689 in / 1281 out tokens · 53954 ms · 2026-06-29T16:13:24.459922+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

72 extracted references · 1 canonical work pages · 1 internal anchor

  1. [1]

    Adamczak, A

    R. Adamczak, A. E. Litvak, A. Pajor, and N. Tomczak-Jaegermann. Restricted isom- etry property of matrices with independent columns and neighborly polytopes by ran- dom sampling.Constructive Approximation, 34:61–88, 2011

  2. [2]

    Bello, B

    K. Bello, B. Aragam, and P. Ravikumar. DAGMA: Learning DAGs via M-matrices and a Log-Determinant Acyclicity Characterization. InAdvances in Neural Information Processing Systems, 2022

  3. [3]

    Belloni and V

    A. Belloni and V. Chernozhukov. l1-penalized quantile regression in high-dimensional sparse models.The Annals of Statistics, 39(1):82 – 130, 2011

  4. [4]

    Inference for High-Dimensional Sparse Econometric Models

    A. Belloni, V. Chernozhukov, and C. Hansen. Inference for high-dimensional sparse econometric models.arXiv preprint arXiv:1201.0220, 2011

  5. [5]

    Belloni, V

    A. Belloni, V. Chernozhukov, and K. Kato. Valid post-selection inference in high- dimensional approximately sparse quantile regression models.Journal of the American Statistical Association, 114(526):749–758, 2019

  6. [6]

    P. M. Bentler. Causal modeling via structural equation systems. InHandbook of multivariate experimental psychology, pages 317–335. Springer, 1988

  7. [7]

    Berrevoets, J

    J. Berrevoets, J. Raymaekers, M. Van der Schaar, T. Verdonck, and R. Yao. Dif- ferentiable causal structure learning with identifiability by notime. InProceedings of machine learning research, volume 258, pages 3115–3123. PMLR, 2025. 28

  8. [8]

    P. J. Bickel, Y. Ritov, and A. B. Tsybakov. Simultaneous analysis of lasso and dantzig selector.The Annals of Statistics, 37(4):1705–1732, 2009

  9. [9]

    Bl¨ obaum, D

    P. Bl¨ obaum, D. Janzing, T. Washio, S. Shimizu, and B. Sch¨ olkopf. Cause-effect infer- ence by comparing regression errors. InInternational Conference on Artificial Intelli- gence and Statistics, pages 900–909. PMLR, 2018

  10. [10]

    B¨ uhlmann, J

    P. B¨ uhlmann, J. Peters, and J. Ernest. CAM: Causal additive models, high- dimensional order search and penalized regression.The Annals of Statistics, 42(6):2526 – 2556, 2014

  11. [11]

    Chang, Z

    T.-H. Chang, Z. Guo, and D. Malinsky. Post-selection inference for causal effects after causal discovery.Biometrika, 113(1):asaf073, 2026

  12. [12]

    D. M. Chickering. Optimal structure identification with greedy search.Journal of machine learning research, 3(Nov):507–554, 2002

  13. [13]

    Fan and R

    J. Fan and R. Li. Variable selection via nonconcave penalized likelihood and its oracle properties.Journal of the American statistical Association, 96(456):1348–1360, 2001

  14. [14]

    J. Fan, L. Xue, and H. Zou. Strong oracle optimality of folded concave penalized estimation.Annals of statistics, 42(3):819, 2014

  15. [15]

    Friedman, T

    J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso.Biostatistics, 9(3):432–441, 2008

  16. [16]

    M. Gao, Y. Ding, and B. Aragam. A polynomial-time algorithm for learning nonpara- metric causal graphs.Advances in Neural Information Processing Systems, 33:11599– 11611, 2020

  17. [17]

    Glymour, K

    C. Glymour, K. Zhang, and P. Spirtes. Review of causal discovery methods based on graphical models.Frontiers in genetics, 10:524, 2019

  18. [18]

    Gradu, T

    P. Gradu, T. Zrnic, Y. Wang, and M. I. Jordan. Valid inference after causal discovery. Journal of the American Statistical Association, 120(550):1127–1138, 2025

  19. [19]

    Harris and M

    N. Harris and M. Drton. PC algorithm for nonparanormal graphical models.Journal of Machine Learning Research, 14(11), 2013

  20. [20]

    X. He, X. Pan, K. M. Tan, and W.-X. Zhou. Smoothed quantile regression with large-scale inference.Journal of Econometrics, 2021

  21. [21]

    Heinze-Deml, M

    C. Heinze-Deml, M. H. Maathuis, and N. Meinshausen. Causal structure learning. Annual Review of Statistics and Its Application, 5:371–391, 2018

  22. [22]

    Hoyer, D

    P. Hoyer, D. Janzing, J. Mooij, J. Peters, and B. Sch¨ olkopf. Nonlinear causal dis- covery with additive noise models. InTwenty-Second Annual Conference on Neural Information Processing Systems (NIPS 2008), pages 689–696. Curran, 2009

  23. [23]

    Immer, C

    A. Immer, C. Schultheiss, J. E. Vogt, B. Sch¨ olkopf, P. B¨ uhlmann, and A. Marx. On the identifiability and estimation of causal location-scale noise models. InInternational Conference on Machine Learning, pages 14316–14332. PMLR, 2023. 29

  24. [24]

    Kalisch and P

    M. Kalisch and P. B¨ uhlman. Estimating high-dimensional directed acyclic graphs with the pc-algorithm.Journal of Machine Learning Research, 8(3), 2007

  25. [25]

    Koenker and G

    R. Koenker and G. Bassett, Jr. Regression quantiles.Econometrica: journal of the Econometric Society, pages 33–50, 1978

  26. [26]

    C. Li, X. Shen, and W. Pan. Likelihood ratio tests for a large directed acyclic graph. Journal of the American Statistical Association, 2020

  27. [27]

    C. Li, X. Shen, and W. Pan. Nonlinear causal discovery with confounders.Journal of the American Statistical Association, pages 1–10, 2023

  28. [28]

    Y. Li, A. Torralba, A. Anandkumar, D. Fox, and A. Garg. Causal discovery in physical systems from videos.Advances in Neural Information Processing Systems, 33:9180– 9192, 2020

  29. [29]

    Li and J

    Y. Li and J. Zhu. L 1-norm quantile regression.Journal of Computational and Graph- ical Statistics, 17(1):163–185, 2008

  30. [30]

    Y. Lin, Y. Huang, W. Liu, H. Deng, I. Ng, K. Zhang, M. Gong, Y. Ma, and B. Huang. A skewness-based criterion for addressing heteroscedastic noise in causal discovery. InInternational Conference on Learning Representations, volume 2025, pages 89283– 89310, 2025

  31. [31]

    M. H. Maathuis, D. Colombo, M. Kalisch, and P. B¨ uhlmann. Predicting causal effects in large-scale systems from observational data.Nature methods, 7(4):247–248, 2010

  32. [32]

    Meinshausen and B

    N. Meinshausen and B. Yu. Lasso-type recovery of sparse representations for high- dimensional data.The Annals of Statistics, 37(1):246–270, 2009

  33. [33]

    Mendelson, A

    S. Mendelson, A. Pajor, and N. Tomczak-Jaegermann. Uniform uncertainty principle for bernoulli and subgaussian ensembles.Constructive Approximation, 28:277–289, 2008

  34. [34]

    J. M. Mooij and T. Heskes. Cyclic causal discovery from continuous equilibrium data. InProceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, pages 431–439, 2013

  35. [35]

    J. M. Mooij, J. Peters, D. Janzing, J. Zscheischler, and B. Sch¨ olkopf. Distinguishing cause from effect using observational data: methods and benchmarks.Journal of Machine Learning Research, 17(32):1–102, 2016

  36. [36]

    G. Park. Identifiability of additive noise models using conditional variances.Journal of Machine Learning Research, 21(75):1–34, 2020

  37. [37]

    Pearl.Causality

    J. Pearl.Causality. Cambridge university press, 2009

  38. [38]

    Peters and P

    J. Peters and P. B¨ uhlmann. Identifiability of gaussian structural equation models with equal error variances.Biometrika, 101(1):219–228, 2014. 30

  39. [39]

    Peters, D

    J. Peters, D. Janzing, and B. Sch¨ olkopf. Identifying cause and effect on discrete data using additive noise models. InProceedings of the thirteenth international conference on artificial intelligence and statistics, pages 597–604. JMLR Workshop and Confer- ence Proceedings, 2010

  40. [40]

    Peters, J

    J. Peters, J. M. Mooij, D. Janzing, and B. Sch¨ olkopf. Causal discovery with continuous additive noise models.Journal of Machine Learning Research, 15:2009–2053, 2014

  41. [41]

    Y. Qiu, J. Tao, and X.-H. Zhou. Inference of heterogeneous treatment effects using observational data with high-dimensional covariates.Journal of the Royal Statistical Society Series B: Statistical Methodology, 83(5):1016–1043, 2021

  42. [42]

    Raskutti, M

    G. Raskutti, M. J. Wainwright, and B. Yu. Restricted eigenvalue properties for cor- related gaussian designs.The Journal of Machine Learning Research, 11:2241–2259, 2010

  43. [43]

    Sachs, O

    K. Sachs, O. Perez, D. Pe’er, D. A. Lauffenburger, and G. P. Nolan. Causal protein-signaling networks derived from multiparameter single-cell data.Science, 308(5721):523–529, 2005

  44. [44]

    Sch¨ olkopf, F

    B. Sch¨ olkopf, F. Locatello, S. Bauer, N. R. Ke, N. Kalchbrenner, A. Goyal, and Y. Ben- gio. Toward causal representation learning.Proceedings of the IEEE, 109(5):612–634, 2021

  45. [45]

    Schultheiss and P

    C. Schultheiss and P. B¨ uhlmann. Ancestor regression in linear structural equation models.Biometrika, 110(4):1117–1124, 2023

  46. [46]

    Schultheiss and P

    C. Schultheiss and P. B¨ uhlmann. On the pitfalls of gaussian likelihood scoring for causal discovery.Journal of Causal Inference, 11(1):20220068, 2023

  47. [47]

    Shimizu, P

    S. Shimizu, P. O. Hoyer, A. Hyv¨ arinen, A. Kerminen, and M. Jordan. A linear non- Gaussian acyclic model for causal discovery.Journal of Machine Learning Research, 7(10), 2006

  48. [48]

    P. Spirtes. An anytime algorithm for causal inference. InInternational Workshop on Artificial Intelligence and Statistics, pages 278–285. PMLR, 2001

  49. [49]

    Spirtes and C

    P. Spirtes and C. Glymour. An algorithm for fast recovery of sparse causal graphs. Social science computer review, 9(1):62–72, 1991

  50. [50]

    Spirtes, C

    P. Spirtes, C. N. Glymour, and R. Scheines.Causation, prediction, and search. MIT press, 2000

  51. [51]

    E. V. Strobl and T. A. Lasko. Identifying patient-specific root causes with the het- eroscedastic noise model.Journal of Computational Science, 72:102099, 2023

  52. [52]

    Sun and O

    X. Sun and O. Schulte. Cause-effect inference in location-scale noise models: Maxi- mum likelihood vs. independence testing.Advances in Neural Information Processing Systems, 36:5447–5483, 2023. 31

  53. [53]

    K. M. Tan, L. Wang, and W.-X. Zhou. High-dimensional quantile regression: Convo- lution smoothing and concave regularization.Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(1):205–233, 2022

  54. [54]

    Q.-D. Tran, B. Duong, P. Nguyen, and T. Nguyen. Robust estimation of causal het- eroscedastic noise models. InProceedings of the 2024 SIAM International Conference on Data Mining (SDM), pages 788–796. SIAM, 2024

  55. [55]

    Tsamardinos, L

    I. Tsamardinos, L. E. Brown, and C. F. Aliferis. The max-min hill-climbing bayesian network structure learning algorithm.Machine learning, 65(1):31–78, 2006

  56. [56]

    M. J. Vowels, N. C. Camgoz, and R. Bowden. D’ya like DAGs? a survey on structure learning and causal discovery.ACM Computing Surveys, 55(4):1–36, 2022

  57. [57]

    Y. S. Wang, M. Kolar, and M. Drton. Confidence sets for causal orderings.Journal of the American Statistical Association, pages 1–14, 2025

  58. [58]

    H. Wold. Causality and econometrics.Econometrica: Journal of the Econometric Society, pages 162–177, 1954

  59. [59]

    S. Xu, O. A. Mian, A. Marx, and J. Vreeken. Inferring cause and effect in the presence of heteroscedastic noise. InInternational Conference on Machine Learning, pages 24615–24630. PMLR, 2022

  60. [60]

    Y. Yang, S. Bom, and X. Shen. A hierarchical ensemble causal structure learning approach for wafer manufacturing.Journal of Intelligent Manufacturing, 35(6):2961– 2978, 2024

  61. [61]

    Ye and C.-H

    F. Ye and C.-H. Zhang. Rate minimaxity of the lasso and dantzig selector for the lq loss in lr balls.The Journal of Machine Learning Research, 11:3519–3540, 2010

  62. [62]

    N. Yin, T. Gao, Y. Yu, and Q. Ji. Effective causal discovery under identifiable het- eroscedastic noise model. InProceedings of the AAAI Conference on Artificial Intel- ligence, volume 38, pages 16486–16494, 2024

  63. [63]

    Y. Yuan, X. Shen, W. Pan, and Z. Wang. Constrained likelihood for reconstructing a directed acyclic gaussian graph.Biometrika, 106(1):109–125, 2019

  64. [64]

    C.-H. Zhang. Nearly unbiased variable selection under minimax concave penalty.An- nals of statistics, 38(2):894–942, 2010

  65. [65]

    Zhang and A

    K. Zhang and A. Hyv¨ arinen. On the identifiability of the post-nonlinear causal model. InProceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pages 647–655, 2009

  66. [66]

    Zhang and A

    K. Zhang and A. Hyv¨ arinen. Distinguishing causes from effects using nonlinear acyclic causal models. InCausality: Objectives and Assessment, pages 157–164. PMLR, 2010

  67. [67]

    Zhang, Y

    T. Zhang, Y. Zhang, and T. Zhou. Statistical insights into HSIC in high dimensions. Advances in Neural Information Processing Systems, 36:19145–19156, 2023. 32

  68. [68]

    Zhao and B

    P. Zhao and B. Yu. On model selection consistency of lasso.The Journal of Machine Learning Research, 7:2541–2563, 2006

  69. [69]

    R. Zhao, X. He, and J. Wang. Learning linear non-gaussian directed acyclic graph with diverging number of nodes.The Journal of Machine Learning Research, 23(1):12314– 12347, 2022

  70. [70]

    Zheng, B

    X. Zheng, B. Aragam, P. K. Ravikumar, and E. P. Xing. Dags with no tears: Contin- uous optimization for structure learning.Advances in neural information processing systems, 31, 2018

  71. [71]

    Zhou and H

    L. Zhou and H. Zou. Cross-fitted residual regression for high-dimensional heteroscedas- ticity pursuit.Journal of the American Statistical Association, 118(542):1056–1065, 2023

  72. [72]

    Zou and M

    H. Zou and M. Yuan. Composite quantile regression and the oracle model selection theory.Annals of Statistics, 36(3):1108–1126, 2008. 33