Concomitant DAG Learning: On the Roles of Noise Adaptivity, Sparsity, and Non-negativity

Gonzalo Mateos; Hamed Ajorlou; Mariano Tepper; Samuel Rey

arxiv: 2605.23537 · v1 · pith:OHASSWD7new · submitted 2026-05-22 · 📊 stat.ML · eess.SP

Concomitant DAG Learning: On the Roles of Noise Adaptivity, Sparsity, and Non-negativity

Gonzalo Mateos , Samuel Rey , Hamed Ajorlou , Mariano Tepper This is my paper

Pith reviewed 2026-05-25 03:20 UTC · model grok-4.3

classification 📊 stat.ML eess.SP

keywords DAG learningcausal discoveryscore-based methodsheteroscedasticitynoise adaptivitystructural equation modelsgraph learningsparsity

0 comments

The pith

Concomitant DAG estimation jointly infers sparse causal structure and exogenous noise levels for robustness under heteroscedasticity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The tutorial surveys score-based approaches to recovering directed acyclic graphs from observational data, tracing their development from combinatorial searches to continuous optimization over adjacency matrices. It focuses on concomitant estimation methods that learn the graph and noise statistics at the same time rather than in separate stages. This joint inference makes the resulting structure estimates adaptive to different noise variances across variables. A sympathetic reader would care because many real datasets exhibit heterogeneous noise or distribution shifts that break non-adaptive estimators.

Core claim

Concomitant DAG estimation methods jointly infer sparse causal structure and exogenous noise levels, improving robustness under heteroscedasticity and distribution shifts by rendering the estimator noise adaptive. The tutorial presents this after a didactic introduction to structural equation models and a historical overview of score-based DAG recovery, then outlines opportunities at the intersection of causal inference, high-dimensional statistics, and scalable graph learning.

What carries the argument

Concomitant DAG estimation, which simultaneously optimizes a score over graph adjacency matrices and exogenous noise variances.

If this is right

The learned graphs remain stable when noise levels differ across nodes or when test data comes from a shifted distribution.
Sparsity and non-negativity constraints become jointly enforceable with the noise parameters inside a single continuous optimization.
The same framework supports extensions to online learning by updating both structure and noise estimates incrementally.
Identifiability holds under milder conditions once noise adaptivity is built into the estimator.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

In signal-processing pipelines the joint formulation may allow real-time recalibration of causal models without re-running a separate variance estimation step.
The approach could be tested on mildly nonlinear data by replacing the linear structural equations with small neural modules while keeping the concomitant noise term.
High-dimensional applications might see reduced sensitivity to hyperparameter choice because noise statistics are learned rather than tuned externally.

Load-bearing premise

The observed variables arise from linear or mildly nonlinear structural equation models with additive exogenous noise whose statistics can be estimated jointly without creating new identifiability problems.

What would settle it

Performance comparison on synthetic data where noise variances are drawn from a heavy-tailed distribution or where the additive-noise assumption is deliberately violated would show whether the joint estimator loses its reported advantage over separate structure-only methods.

Figures

Figures reproduced from arXiv: 2605.23537 by Gonzalo Mateos, Hamed Ajorlou, Mariano Tepper, Samuel Rey.

**Figure 2.** Figure 2: Mean DAG recovery performance, plus/minus one standard deviation, under heteroscedastic noise for both ER4 (top [PITH_FULL_IMAGE:figures/full_fig_p015_2.png] view at source ↗

**Figure 3.** Figure 3: Tracking performance of mini-batch stochastic gradient descent relative to the full-data CoLiDE-EV algorithm. The left [PITH_FULL_IMAGE:figures/full_fig_p016_3.png] view at source ↗

**Figure 4.** Figure 4: Recovery of ER4 DAGs with non-negative weights: (a)–(b) use [PITH_FULL_IMAGE:figures/full_fig_p020_4.png] view at source ↗

read the original abstract

Directed acyclic graphs (DAGs) constitute a central modeling tool to enable principled reasoning about cause-effect interactions in complex systems. However, since the causal structure underlying a group of variables is often unknown and interventions may be infeasible or ethically challenging to implement, there is a need to address the task of inferring DAGs from observational data. However, most classical structure identification approaches face two key obstacles: the combinatorial challenge of enforcing acyclicity, which severely limits scalability, and identifiability challenges arising from latent confounding or heterogeneous noise. This tutorial offers an overview of recent signal processing and optimization advances that address these issues by recasting DAG structure learning as a continuous, score-based estimation problem over adjacency matrices. We begin with a didactic introduction to structural equation models and the formulation of causal graph recovery, followed by a historical survey of score-based methods ranging from early combinatorial search schemes and greedy heuristics to modern continuous frameworks that leverage smooth characterizations of acyclicity. Building on this foundation, we describe concomitant DAG estimation methods that jointly infer sparse causal structure and exogenous noise levels, improving robustness under heteroscedasticity and distribution shifts by rendering the estimator noise adaptive. All in all, the tutorial introduces readers to challenges and opportunities for signal processing research at the crossroads of causal inference, high-dimensional statistics, and scalable graph learning, while outlining emerging directions including online, nonlinear, and neural causal discovery.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

A clear tutorial that organizes score-based DAG methods around concomitant noise estimation but adds no new results or experiments.

read the letter

This paper is a tutorial surveying continuous score-based approaches to learning DAGs, with the main thread being concomitant estimation that jointly recovers the graph and the noise variances. The authors walk through structural equation models, the shift from combinatorial search to smooth acyclicity penalties, and then show how tying the noise parameters into the optimization can make the procedure more robust when noise is heteroscedastic or the distribution shifts. That framing is the useful part: it collects scattered ideas from the last few years into one narrative and points out why noise adaptivity matters for real data. The historical survey is straightforward and the explanations of the continuous relaxations are accessible without heavy notation. For someone new to the area, the paper gives a readable entry point and flags open directions like nonlinear or online variants. The limitation is that nothing here is original. There are no new derivations, no fresh experiments, and no resolution of open identifiability questions; every claim traces back to the cited papers. The robustness story therefore stands or falls on the quality of those earlier works rather than anything demonstrated in this manuscript. The standard linear or mildly nonlinear additive-noise assumptions are taken as given without additional scrutiny. This is the sort of document that helps a reading group get oriented, but it does not move the technical frontier. I would bring it to a reading group as background reading. I would not cite it in my own papers. If the submission is positioned strictly as a tutorial for a venue that publishes surveys, it could go to review; otherwise the absence of new technical content makes it a weak candidate for a methods journal.

Referee Report

0 major / 2 minor

Summary. This tutorial surveys score-based methods for learning DAGs from observational data. It introduces structural equation models and causal graph recovery, reviews the progression from combinatorial search and greedy heuristics to continuous optimization frameworks that use smooth acyclicity characterizations, and describes concomitant estimation approaches that jointly infer sparse structures and exogenous noise levels to achieve noise adaptivity and robustness under heteroscedasticity and distribution shifts. The paper positions these advances at the intersection of causal inference, high-dimensional statistics, and scalable graph learning, while outlining future directions such as online, nonlinear, and neural causal discovery.

Significance. As a tutorial rather than a source of new theorems or experiments, the manuscript's value lies in its synthesis of recent signal-processing and optimization literature on continuous DAG learning. If the survey is accurate and balanced, it could usefully orient researchers to noise-adaptive concomitant estimators and their claimed robustness benefits under standard linear or mildly nonlinear SEM assumptions with additive noise.

minor comments (2)

The abstract and introduction refer to 'concomitant DAG estimation methods' without an early, explicit definition or pointer to the specific section where the joint optimization objective is first written down; adding a short definitional paragraph or equation reference in §2 would improve readability for readers new to the topic.
Several historical citations (early combinatorial schemes, greedy heuristics) are mentioned in the survey section but lack explicit reference numbers in the provided abstract; ensuring each named method is paired with its citation in the full text would strengthen the tutorial's utility as a reference.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their thorough summary of the manuscript and for recommending acceptance. The report accurately reflects the tutorial's focus on score-based DAG learning, continuous optimization frameworks, and concomitant estimation for noise adaptivity.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The document is a tutorial survey recasting DAG learning as continuous score-based optimization and describing concomitant estimation of structure plus noise levels. No new derivation chain is presented that reduces a claimed prediction or uniqueness result to a fitted parameter, self-citation, or ansatz by construction. All central claims rest on standard linear SEM assumptions with additive noise and on externally cited prior literature; the text itself contains no load-bearing steps that equate outputs to inputs via the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Because this is an abstract-only review of a tutorial, no specific free parameters, axioms, or invented entities can be extracted from new derivations; the work relies on standard assumptions from the structural equation model literature.

pith-pipeline@v0.9.0 · 5792 in / 1053 out tokens · 16627 ms · 2026-05-25T03:20:19.350378+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · 1 internal anchor

[1]

DAGMA: Learning DAGs via M-matrices and a log-determinant acyclicity characterization,

K. Bello, B. Aragam, and P. Ravikumar, “DAGMA: Learning DAGs via M-matrices and a log-determinant acyclicity characterization,” inProc. Adv. Neural. Inf. Process. Syst., vol. 35, 2022, pp. 8226–8239

work page 2022
[2]

Square-root lasso: pivotal recovery of sparse signals via conic programming,

A. Belloni, V . Chernozhukov, and L. Wang, “Square-root lasso: pivotal recovery of sparse signals via conic programming,” Biometrika, vol. 98, no. 4, pp. 791–806, 2011

work page 2011
[3]

Simultaneous analysis of Lasso and Dantzig selector,

P. J. Bickel, Y . Ritov, and A. B. Tsybakov, “Simultaneous analysis of Lasso and Dantzig selector,”Ann. Statist., vol. 37, pp. 1705–1732, 2009

work page 2009
[4]

Differentiable causal discovery from interventional data,

P. Brouillard, S. Lachapelle, A. Lacoste, S. Lacoste-Julien, and A. Drouin, “Differentiable causal discovery from interventional data,” inProc. Adv. Neural. Inf. Process. Syst., vol. 33, 2020, pp. 21 865–21 877. May 25, 2026 DRAFT IEEE SIGNAL PROCESSING MAGAZINE, VOL. XX, NO. XX, MAY 2026 23

work page 2020
[5]

Differentiable DAG sampling,

B. Charpentier, S. Kibler, and S. G ¨unnemann, “Differentiable DAG sampling,” inProc. Int. Conf. Learn. Representations, 2022

work page 2022
[6]

Large-sample learning of Bayesian networks is NP-hard,

D. M. Chickering, D. Heckerman, and C. Meek, “Large-sample learning of Bayesian networks is NP-hard,”J. Mach. Learn. Res., vol. 5, 2004

work page 2004
[7]

Optimal structure identification with greedy search,

D. M. Chickering, “Optimal structure identification with greedy search,”J. Mach. Learn. Res., vol. 3, no. Nov, pp. 507–554, 2002

work page 2002
[8]

BCD Nets: Scalable variational approaches for Bayesian causal discovery,

C. Cundy, A. Grover, and S. Ermon, “BCD Nets: Scalable variational approaches for Bayesian causal discovery,” inProc. Adv. Neural. Inf. Process. Syst., vol. 34, 2021, pp. 7095–7110

work page 2021
[9]

Global optimality in bivariate gradient-based DAG learning,

C. Deng, K. Bello, P. K. Ravikumar, and B. Aragam, “Global optimality in bivariate gradient-based DAG learning,” in Proc. Adv. Neural. Inf. Process. Syst., 2023, pp. 17 929–17 968

work page 2023
[10]

Optimizing NOTEARS objectives via topological swaps,

C. Deng, K. Bello, B. Aragam, and P. K. Ravikumar, “Optimizing NOTEARS objectives via topological swaps,” inProc. Int. Conf. Mach. Learn., 2023, pp. 7563–7595

work page 2023
[11]

Characterizing distribution equivalence and structure learning for cyclic and acyclic directed graphs,

A. Ghassami, A. Yang, N. Kiyavash, and K. Zhang, “Characterizing distribution equivalence and structure learning for cyclic and acyclic directed graphs,” inProc. Int. Conf. Mach. Learn., 2020, pp. 3494–3504

work page 2020
[12]

Topology identification and learning over graphs: Accounting for nonlinearities and dynamics,

G. B. Giannakis, Y . Shen, and G. V . Karanikolas, “Topology identification and learning over graphs: Accounting for nonlinearities and dynamics,”Proc. IEEE, vol. 106, no. 5, pp. 787–807, 2018

work page 2018
[13]

P. J. Huber,Robust Statistics. New York: John Wiley & Sons Inc., 1981

work page 1981
[14]

On fast convergence of proximal algorithms for sqrt-lasso optimization: Don’t worry about its nonsmooth loss function,

X. Li, H. Jiang, J. Haupt, R. Arora, H. Liu, M. Hong, and T. Zhao, “On fast convergence of proximal algorithms for sqrt-lasso optimization: Don’t worry about its nonsmooth loss function,” inProc. Conf. Uncertainty Artif. Intell., 2020, pp. 49–59

work page 2020
[15]

High-dimensional learning of linear causal networks via inverse covariance estimation,

P.-L. Loh and P. B ¨uhlmann, “High-dimensional learning of linear causal networks via inverse covariance estimation,”J. Mach. Learn. Res., vol. 15, no. 1, pp. 3065–3105, 2014

work page 2014
[16]

Meta-DAG: Meta causal discovery via bilevel optimization,

S. Lu and T. Gao, “Meta-DAG: Meta causal discovery via bilevel optimization,” inProc. IEEE Intl. Conf. Acoustics, Speech Signal Process., 2023, pp. 1–5

work page 2023
[17]

Bayesian networks in biomedicine and health-care,

P. J. Lucas, L. C. Van der Gaag, and A. Abu-Hanna, “Bayesian networks in biomedicine and health-care,”Artif. Intell. Med., vol. 30, pp. 201–214, 2004

work page 2004
[18]

Online learning for matrix factorization and sparse coding,

J. Mairal, F. Bach, J. Ponce, and G. Sapiro, “Online learning for matrix factorization and sparse coding,”J. Mach. Learn. Res., vol. 11, no. 1, 2010

work page 2010
[19]

Generalized concomitant multi-task lasso for sparse multimodal regression,

M. Massias, O. Fercoq, A. Gramfort, and J. Salmon, “Generalized concomitant multi-task lasso for sparse multimodal regression,” inProc. Int. Conf. Artif. Intell. Statist., 2018, pp. 998–1007

work page 2018
[20]

Connecting the dots: Identifying network structure via graph signal processing,

G. Mateos, S. Segarra, A. G. Marques, and A. Ribeiro, “Connecting the dots: Identifying network structure via graph signal processing,”IEEE Signal Process. Mag., vol. 36, no. 3, pp. 16–43, 2019

work page 2019
[21]

Efficient smoothed concomitant lasso estimation for high dimensional regression,

E. Ndiaye, O. Fercoq, A. Gramfort, V . Lecl `ere, and J. Salmon, “Efficient smoothed concomitant lasso estimation for high dimensional regression,” inJournal of Physics: Conference Series, vol. 904, 2017, p. 012006

work page 2017
[22]

On the role of sparsity and DAG constraints for learning linear DAGs,

I. Ng, A. Ghassami, and K. Zhang, “On the role of sparsity and DAG constraints for learning linear DAGs,” inProc. Adv. Neural. Inf. Process. Syst., vol. 33, 2020, pp. 17 943–17 954

work page 2020
[23]

A robust hybrid of lasso and ridge regression,

A. B. Owen, “A robust hybrid of lasso and ridge regression,”Contemp. Math., vol. 443, no. 7, pp. 59–72, 2007

work page 2007
[24]

Pearl,Causality, 2nd ed

J. Pearl,Causality, 2nd ed. Cambridge University Press, 2009

work page 2009
[25]

Causal graph identification under soft intervention,

C. Peng and U. Mitra, “Causal graph identification under soft intervention,” inProc. IEEE Intl. Symp. Information Theory, 2025, pp. 1–6. May 25, 2026 DRAFT IEEE SIGNAL PROCESSING MAGAZINE, VOL. XX, NO. XX, MAY 2026 24

work page 2025
[26]

Peters, D

J. Peters, D. Janzing, and B. Sch ¨olkopf,Elements of Causal Inference: Foundations and Learning Algorithms. The MIT Press, 2017

work page 2017
[27]

Beware of the simulated DAG! Causal discovery benchmarks may be easy to game,

A. Reisach, C. Seiler, and S. Weichwald, “Beware of the simulated DAG! Causal discovery benchmarks may be easy to game,” inProc. Adv. Neural. Inf. Process. Syst., vol. 34, 2021, pp. 27 772–27 784

work page 2021
[28]

Direted acyclic graph convolutional networks,

S. Rey, H. Ajorlou, and G. Mateos, “Direted acyclic graph convolutional networks,”IEEE Trans. Signal Process., vol. 74, pp. 1–16, 2026

work page 2026
[29]

Exploiting Non-Negativity in DAG Structure Learning

S. Rey, M. Navarro, and G. Mateos, “Exploiting non-negativity in DAG structure learning,”IEEE Trans. Signal Process., vol. 74, 2026 (submitted; see also arXiv preprint arXiv:2605.19947)

work page internal anchor Pith review Pith/arXiv arXiv 2026
[30]

CoLiDE: Concomitant linear DAG estimation,

S. S. Saboksayr, G. Mateos, and M. Tepper, “CoLiDE: Concomitant linear DAG estimation,”Proc. Int. Conf. Learn. Representations, 2024

work page 2024
[31]

Block successive convex approximation for concomitant linear DAG estimation,

——, “Block successive convex approximation for concomitant linear DAG estimation,” inProc. IEEE Sensor Array and Mulichannel Signal Process. Workshop. Corvallis, OR, Jul. 8-11, 2024

work page 2024
[32]

Causal protein-signaling networks derived from multiparameter single-cell data,

K. Sachs, O. Perez, D. Pe’er, D. A. Lauffenburger, and G. P. Nolan, “Causal protein-signaling networks derived from multiparameter single-cell data,”Science, vol. 308, no. 5721, pp. 523–529, 2005

work page 2005
[33]

A Bayesian network structure for operational risk modelling in structured finance operations,

A. D. Sanford and I. A. Moosa, “A Bayesian network structure for operational risk modelling in structured finance operations,”J. Oper. Res. Soc., vol. 63, pp. 431–444, 2012

work page 2012
[34]

Toward causal representation learning,

B. Sch ¨olkopf, F. Locatello, S. Bauer, N. R. Ke, N. Kalchbrenner, A. Goyal, and Y . Bengio, “Toward causal representation learning,”Proc. IEEE, vol. 109, no. 5, pp. 612–634, 2021

work page 2021
[35]

Causal Fourier analysis on directed acyclic graphs and posets,

B. Seifert, C. Wendler, and M. P ¨uschel, “Causal Fourier analysis on directed acyclic graphs and posets,”IEEE Trans. Signal Process., vol. 71, pp. 3805–3820, 2023

work page 2023
[36]

Spirtes, C

P. Spirtes, C. Glymour, and R. Scheines,Causation, Prediction, and Search. MIT Press, 2001

work page 2001
[37]

Towards scalable Bayesian learning of causal DAGs,

J. Viinikka, A. Hyttinen, J. Pensar, and M. Koivisto, “Towards scalable Bayesian learning of causal DAGs,” inProc. Adv. Neural. Inf. Process. Syst., vol. 33, 2020, pp. 6584–6594

work page 2020
[38]

D’ya like DAGs? A survey on structure learning and causal discovery,

M. J. V owels, N. C. Camgoz, and R. Bowden, “D’ya like DAGs? A survey on structure learning and causal discovery,” ACM Computing Surveys, vol. 55, no. 4, pp. 1–36, 2022

work page 2022
[39]

DAGs with no fears: A closer look at continuous optimization for learning Bayesian networks,

D. Wei, T. Gao, and Y . Yu, “DAGs with no fears: A closer look at continuous optimization for learning Bayesian networks,” inProc. Adv. Neural. Inf. Process. Syst., vol. 33, 2020, pp. 3895–3906

work page 2020
[40]

dotears: Scalable and consistent directed acyclic graph estimation using observational and interventional data,

A. Xue, J. Rao, S. Sankararaman, and H. Pimentel, “dotears: Scalable and consistent directed acyclic graph estimation using observational and interventional data,”iScience, vol. 28, no. 2, p. 111673, 2025

work page 2025
[41]

Inexact block coordinate descent algorithms for nonsmooth nonconvex optimization,

Y . Yang, M. Pesavento, Z.-Q. Luo, and B. Ottersten, “Inexact block coordinate descent algorithms for nonsmooth nonconvex optimization,”IEEE Trans. Signal Process., vol. 68, pp. 947–961, 2020

work page 2020
[42]

DAG-GNN: DAG structure learning with graph neural networks,

Y . Yu, J. Chen, T. Gao, and M. Yu, “DAG-GNN: DAG structure learning with graph neural networks,” inProc. Int. Conf. Mach. Learn., 2019, pp. 7154–7163

work page 2019
[43]

DAG learning on the permutahedron,

V . Zantedeschi, L. Franceschi, J. Kaddour, M. Kusner, and V . Niculae, “DAG learning on the permutahedron,” inProc. Int. Conf. Learn. Representations, 2023

work page 2023
[44]

Integrated systems approach identifies genetic nodes and networks in late-onset Alzheimer’s disease,

B. Zhang, C. Gaiteri, L.-G. Bodea, Z. Wang, J. McElwee, A. A. Podtelezhnikov, C. Zhang, T. Xie, L. Tran, R. Dobrin et al., “Integrated systems approach identifies genetic nodes and networks in late-onset Alzheimer’s disease,”Cell, vol. 153, no. 3, pp. 707–720, 2013

work page 2013
[45]

DAGs with no tears: Continuous optimization for structure learning,

X. Zheng, B. Aragam, P. K. Ravikumar, and E. P. Xing, “DAGs with no tears: Continuous optimization for structure learning,” inProc. Adv. Neural. Inf. Process. Syst., vol. 31, 2018. May 25, 2026 DRAFT IEEE SIGNAL PROCESSING MAGAZINE, VOL. XX, NO. XX, MAY 2026 25

work page 2018
[46]

DAG-PnP: Plug-and-play causal discovery with diffusion priors,

N. Zilberstein, A. Azizpour, G. Mateos, and S. Segarra, “DAG-PnP: Plug-and-play causal discovery with diffusion priors,” inProc. Asilomar Conf. Signals, Syst., Computers, Oct. 24-28, 2026. BIOGRAPHIES Gonzalo Mateosreceived his B.Sc. degree in Electrical Engineering from Universidad de la Rep ´ublica, Montevideo, Uruguay in 2005 and the M.Sc. and Ph.D. de...

work page 2026

[1] [1]

DAGMA: Learning DAGs via M-matrices and a log-determinant acyclicity characterization,

K. Bello, B. Aragam, and P. Ravikumar, “DAGMA: Learning DAGs via M-matrices and a log-determinant acyclicity characterization,” inProc. Adv. Neural. Inf. Process. Syst., vol. 35, 2022, pp. 8226–8239

work page 2022

[2] [2]

Square-root lasso: pivotal recovery of sparse signals via conic programming,

A. Belloni, V . Chernozhukov, and L. Wang, “Square-root lasso: pivotal recovery of sparse signals via conic programming,” Biometrika, vol. 98, no. 4, pp. 791–806, 2011

work page 2011

[3] [3]

Simultaneous analysis of Lasso and Dantzig selector,

P. J. Bickel, Y . Ritov, and A. B. Tsybakov, “Simultaneous analysis of Lasso and Dantzig selector,”Ann. Statist., vol. 37, pp. 1705–1732, 2009

work page 2009

[4] [4]

Differentiable causal discovery from interventional data,

P. Brouillard, S. Lachapelle, A. Lacoste, S. Lacoste-Julien, and A. Drouin, “Differentiable causal discovery from interventional data,” inProc. Adv. Neural. Inf. Process. Syst., vol. 33, 2020, pp. 21 865–21 877. May 25, 2026 DRAFT IEEE SIGNAL PROCESSING MAGAZINE, VOL. XX, NO. XX, MAY 2026 23

work page 2020

[5] [5]

Differentiable DAG sampling,

B. Charpentier, S. Kibler, and S. G ¨unnemann, “Differentiable DAG sampling,” inProc. Int. Conf. Learn. Representations, 2022

work page 2022

[6] [6]

Large-sample learning of Bayesian networks is NP-hard,

D. M. Chickering, D. Heckerman, and C. Meek, “Large-sample learning of Bayesian networks is NP-hard,”J. Mach. Learn. Res., vol. 5, 2004

work page 2004

[7] [7]

Optimal structure identification with greedy search,

D. M. Chickering, “Optimal structure identification with greedy search,”J. Mach. Learn. Res., vol. 3, no. Nov, pp. 507–554, 2002

work page 2002

[8] [8]

BCD Nets: Scalable variational approaches for Bayesian causal discovery,

C. Cundy, A. Grover, and S. Ermon, “BCD Nets: Scalable variational approaches for Bayesian causal discovery,” inProc. Adv. Neural. Inf. Process. Syst., vol. 34, 2021, pp. 7095–7110

work page 2021

[9] [9]

Global optimality in bivariate gradient-based DAG learning,

C. Deng, K. Bello, P. K. Ravikumar, and B. Aragam, “Global optimality in bivariate gradient-based DAG learning,” in Proc. Adv. Neural. Inf. Process. Syst., 2023, pp. 17 929–17 968

work page 2023

[10] [10]

Optimizing NOTEARS objectives via topological swaps,

C. Deng, K. Bello, B. Aragam, and P. K. Ravikumar, “Optimizing NOTEARS objectives via topological swaps,” inProc. Int. Conf. Mach. Learn., 2023, pp. 7563–7595

work page 2023

[11] [11]

Characterizing distribution equivalence and structure learning for cyclic and acyclic directed graphs,

A. Ghassami, A. Yang, N. Kiyavash, and K. Zhang, “Characterizing distribution equivalence and structure learning for cyclic and acyclic directed graphs,” inProc. Int. Conf. Mach. Learn., 2020, pp. 3494–3504

work page 2020

[12] [12]

Topology identification and learning over graphs: Accounting for nonlinearities and dynamics,

G. B. Giannakis, Y . Shen, and G. V . Karanikolas, “Topology identification and learning over graphs: Accounting for nonlinearities and dynamics,”Proc. IEEE, vol. 106, no. 5, pp. 787–807, 2018

work page 2018

[13] [13]

P. J. Huber,Robust Statistics. New York: John Wiley & Sons Inc., 1981

work page 1981

[14] [14]

On fast convergence of proximal algorithms for sqrt-lasso optimization: Don’t worry about its nonsmooth loss function,

X. Li, H. Jiang, J. Haupt, R. Arora, H. Liu, M. Hong, and T. Zhao, “On fast convergence of proximal algorithms for sqrt-lasso optimization: Don’t worry about its nonsmooth loss function,” inProc. Conf. Uncertainty Artif. Intell., 2020, pp. 49–59

work page 2020

[15] [15]

High-dimensional learning of linear causal networks via inverse covariance estimation,

P.-L. Loh and P. B ¨uhlmann, “High-dimensional learning of linear causal networks via inverse covariance estimation,”J. Mach. Learn. Res., vol. 15, no. 1, pp. 3065–3105, 2014

work page 2014

[16] [16]

Meta-DAG: Meta causal discovery via bilevel optimization,

S. Lu and T. Gao, “Meta-DAG: Meta causal discovery via bilevel optimization,” inProc. IEEE Intl. Conf. Acoustics, Speech Signal Process., 2023, pp. 1–5

work page 2023

[17] [17]

Bayesian networks in biomedicine and health-care,

P. J. Lucas, L. C. Van der Gaag, and A. Abu-Hanna, “Bayesian networks in biomedicine and health-care,”Artif. Intell. Med., vol. 30, pp. 201–214, 2004

work page 2004

[18] [18]

Online learning for matrix factorization and sparse coding,

J. Mairal, F. Bach, J. Ponce, and G. Sapiro, “Online learning for matrix factorization and sparse coding,”J. Mach. Learn. Res., vol. 11, no. 1, 2010

work page 2010

[19] [19]

Generalized concomitant multi-task lasso for sparse multimodal regression,

M. Massias, O. Fercoq, A. Gramfort, and J. Salmon, “Generalized concomitant multi-task lasso for sparse multimodal regression,” inProc. Int. Conf. Artif. Intell. Statist., 2018, pp. 998–1007

work page 2018

[20] [20]

Connecting the dots: Identifying network structure via graph signal processing,

G. Mateos, S. Segarra, A. G. Marques, and A. Ribeiro, “Connecting the dots: Identifying network structure via graph signal processing,”IEEE Signal Process. Mag., vol. 36, no. 3, pp. 16–43, 2019

work page 2019

[21] [21]

Efficient smoothed concomitant lasso estimation for high dimensional regression,

E. Ndiaye, O. Fercoq, A. Gramfort, V . Lecl `ere, and J. Salmon, “Efficient smoothed concomitant lasso estimation for high dimensional regression,” inJournal of Physics: Conference Series, vol. 904, 2017, p. 012006

work page 2017

[22] [22]

On the role of sparsity and DAG constraints for learning linear DAGs,

I. Ng, A. Ghassami, and K. Zhang, “On the role of sparsity and DAG constraints for learning linear DAGs,” inProc. Adv. Neural. Inf. Process. Syst., vol. 33, 2020, pp. 17 943–17 954

work page 2020

[23] [23]

A robust hybrid of lasso and ridge regression,

A. B. Owen, “A robust hybrid of lasso and ridge regression,”Contemp. Math., vol. 443, no. 7, pp. 59–72, 2007

work page 2007

[24] [24]

Pearl,Causality, 2nd ed

J. Pearl,Causality, 2nd ed. Cambridge University Press, 2009

work page 2009

[25] [25]

Causal graph identification under soft intervention,

C. Peng and U. Mitra, “Causal graph identification under soft intervention,” inProc. IEEE Intl. Symp. Information Theory, 2025, pp. 1–6. May 25, 2026 DRAFT IEEE SIGNAL PROCESSING MAGAZINE, VOL. XX, NO. XX, MAY 2026 24

work page 2025

[26] [26]

Peters, D

J. Peters, D. Janzing, and B. Sch ¨olkopf,Elements of Causal Inference: Foundations and Learning Algorithms. The MIT Press, 2017

work page 2017

[27] [27]

Beware of the simulated DAG! Causal discovery benchmarks may be easy to game,

A. Reisach, C. Seiler, and S. Weichwald, “Beware of the simulated DAG! Causal discovery benchmarks may be easy to game,” inProc. Adv. Neural. Inf. Process. Syst., vol. 34, 2021, pp. 27 772–27 784

work page 2021

[28] [28]

Direted acyclic graph convolutional networks,

S. Rey, H. Ajorlou, and G. Mateos, “Direted acyclic graph convolutional networks,”IEEE Trans. Signal Process., vol. 74, pp. 1–16, 2026

work page 2026

[29] [29]

Exploiting Non-Negativity in DAG Structure Learning

S. Rey, M. Navarro, and G. Mateos, “Exploiting non-negativity in DAG structure learning,”IEEE Trans. Signal Process., vol. 74, 2026 (submitted; see also arXiv preprint arXiv:2605.19947)

work page internal anchor Pith review Pith/arXiv arXiv 2026

[30] [30]

CoLiDE: Concomitant linear DAG estimation,

S. S. Saboksayr, G. Mateos, and M. Tepper, “CoLiDE: Concomitant linear DAG estimation,”Proc. Int. Conf. Learn. Representations, 2024

work page 2024

[31] [31]

Block successive convex approximation for concomitant linear DAG estimation,

——, “Block successive convex approximation for concomitant linear DAG estimation,” inProc. IEEE Sensor Array and Mulichannel Signal Process. Workshop. Corvallis, OR, Jul. 8-11, 2024

work page 2024

[32] [32]

Causal protein-signaling networks derived from multiparameter single-cell data,

K. Sachs, O. Perez, D. Pe’er, D. A. Lauffenburger, and G. P. Nolan, “Causal protein-signaling networks derived from multiparameter single-cell data,”Science, vol. 308, no. 5721, pp. 523–529, 2005

work page 2005

[33] [33]

A Bayesian network structure for operational risk modelling in structured finance operations,

A. D. Sanford and I. A. Moosa, “A Bayesian network structure for operational risk modelling in structured finance operations,”J. Oper. Res. Soc., vol. 63, pp. 431–444, 2012

work page 2012

[34] [34]

Toward causal representation learning,

B. Sch ¨olkopf, F. Locatello, S. Bauer, N. R. Ke, N. Kalchbrenner, A. Goyal, and Y . Bengio, “Toward causal representation learning,”Proc. IEEE, vol. 109, no. 5, pp. 612–634, 2021

work page 2021

[35] [35]

Causal Fourier analysis on directed acyclic graphs and posets,

B. Seifert, C. Wendler, and M. P ¨uschel, “Causal Fourier analysis on directed acyclic graphs and posets,”IEEE Trans. Signal Process., vol. 71, pp. 3805–3820, 2023

work page 2023

[36] [36]

Spirtes, C

P. Spirtes, C. Glymour, and R. Scheines,Causation, Prediction, and Search. MIT Press, 2001

work page 2001

[37] [37]

Towards scalable Bayesian learning of causal DAGs,

J. Viinikka, A. Hyttinen, J. Pensar, and M. Koivisto, “Towards scalable Bayesian learning of causal DAGs,” inProc. Adv. Neural. Inf. Process. Syst., vol. 33, 2020, pp. 6584–6594

work page 2020

[38] [38]

D’ya like DAGs? A survey on structure learning and causal discovery,

M. J. V owels, N. C. Camgoz, and R. Bowden, “D’ya like DAGs? A survey on structure learning and causal discovery,” ACM Computing Surveys, vol. 55, no. 4, pp. 1–36, 2022

work page 2022

[39] [39]

DAGs with no fears: A closer look at continuous optimization for learning Bayesian networks,

D. Wei, T. Gao, and Y . Yu, “DAGs with no fears: A closer look at continuous optimization for learning Bayesian networks,” inProc. Adv. Neural. Inf. Process. Syst., vol. 33, 2020, pp. 3895–3906

work page 2020

[40] [40]

dotears: Scalable and consistent directed acyclic graph estimation using observational and interventional data,

A. Xue, J. Rao, S. Sankararaman, and H. Pimentel, “dotears: Scalable and consistent directed acyclic graph estimation using observational and interventional data,”iScience, vol. 28, no. 2, p. 111673, 2025

work page 2025

[41] [41]

Inexact block coordinate descent algorithms for nonsmooth nonconvex optimization,

Y . Yang, M. Pesavento, Z.-Q. Luo, and B. Ottersten, “Inexact block coordinate descent algorithms for nonsmooth nonconvex optimization,”IEEE Trans. Signal Process., vol. 68, pp. 947–961, 2020

work page 2020

[42] [42]

DAG-GNN: DAG structure learning with graph neural networks,

Y . Yu, J. Chen, T. Gao, and M. Yu, “DAG-GNN: DAG structure learning with graph neural networks,” inProc. Int. Conf. Mach. Learn., 2019, pp. 7154–7163

work page 2019

[43] [43]

DAG learning on the permutahedron,

V . Zantedeschi, L. Franceschi, J. Kaddour, M. Kusner, and V . Niculae, “DAG learning on the permutahedron,” inProc. Int. Conf. Learn. Representations, 2023

work page 2023

[44] [44]

Integrated systems approach identifies genetic nodes and networks in late-onset Alzheimer’s disease,

B. Zhang, C. Gaiteri, L.-G. Bodea, Z. Wang, J. McElwee, A. A. Podtelezhnikov, C. Zhang, T. Xie, L. Tran, R. Dobrin et al., “Integrated systems approach identifies genetic nodes and networks in late-onset Alzheimer’s disease,”Cell, vol. 153, no. 3, pp. 707–720, 2013

work page 2013

[45] [45]

DAGs with no tears: Continuous optimization for structure learning,

X. Zheng, B. Aragam, P. K. Ravikumar, and E. P. Xing, “DAGs with no tears: Continuous optimization for structure learning,” inProc. Adv. Neural. Inf. Process. Syst., vol. 31, 2018. May 25, 2026 DRAFT IEEE SIGNAL PROCESSING MAGAZINE, VOL. XX, NO. XX, MAY 2026 25

work page 2018

[46] [46]

DAG-PnP: Plug-and-play causal discovery with diffusion priors,

N. Zilberstein, A. Azizpour, G. Mateos, and S. Segarra, “DAG-PnP: Plug-and-play causal discovery with diffusion priors,” inProc. Asilomar Conf. Signals, Syst., Computers, Oct. 24-28, 2026. BIOGRAPHIES Gonzalo Mateosreceived his B.Sc. degree in Electrical Engineering from Universidad de la Rep ´ublica, Montevideo, Uruguay in 2005 and the M.Sc. and Ph.D. de...

work page 2026