
arxiv: 2506.12218 · v2 · pith:GIF5PX5Nnew · submitted 2025-06-13 · 📡 eess.SP · cs.LG

Directed Acyclic Graph Convolutional Networks

Pith reviewed 2026-05-21 23:49 UTC · model grok-4.3

classification 📡 eess.SP cs.LG
keywords directed acyclic graphs · graph neural networks · causal graph filters · convolutional learning · node representations · permutation equivariance · graph signal processing

The pith

The DAG Convolutional Network uses causal graph filters to respect partial ordering when learning node representations on directed acyclic graphs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the DAG Convolutional Network (DCN) as a graph neural network built specifically for signals on directed acyclic graphs, which arise in causal inference, scheduling, and architecture search. Conventional GNNs overlook the directional and acyclic structure, but the DCN applies causal graph filters that process nodes according to their partial order. A parallel version called PDCN feeds the input through multiple causal shift operators and then a shared multilayer perceptron, keeping parameter count independent of graph size. The work also proves permutation equivariance and expressive power for both models. Experiments across tasks and datasets show competitive accuracy, robustness, and speed relative to existing baselines.
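The PDCN data flow described above can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation. The shift operators are stood in for by powers of the adjacency matrix, which is an assumption, since the paper's causal graph-shift operators may be defined differently. The names `pdcn_forward`, `make_params`, and `random_dag` are hypothetical. The point it illustrates is that the shared MLP's parameter count depends on the number of operators K and the layer widths, not on the number of nodes N.

```python
import numpy as np

def pdcn_forward(x, shifts, W1, b1, W2, b2):
    """x: (N,) signal; shifts: list of K (N, N) operators; W*, b*: shared MLP weights."""
    # Stack the K shifted copies of the signal as per-node features: shape (N, K).
    feats = np.stack([S @ x for S in shifts], axis=1)
    # The same two-layer MLP is applied to every node's feature row.
    hidden = np.maximum(feats @ W1 + b1, 0.0)  # ReLU
    return hidden @ W2 + b2                    # shape (N, out_dim)

def make_params(K, hidden_dim, out_dim, rng):
    """MLP weights; their shapes involve K and the layer widths only, never N."""
    return (rng.standard_normal((K, hidden_dim)), np.zeros(hidden_dim),
            rng.standard_normal((hidden_dim, out_dim)), np.zeros(out_dim))

def random_dag(n, edge_prob, rng):
    """Strictly lower-triangular adjacency: an edge j -> i can exist only if j < i."""
    return np.tril((rng.random((n, n)) < edge_prob).astype(float), k=-1)

rng = np.random.default_rng(0)
K = 3
params = make_params(K, hidden_dim=8, out_dim=1, rng=rng)
n_params = sum(w.size for w in params)

outputs = {}
for n in (5, 50):
    A = random_dag(n, 0.3, rng)
    # ASSUMPTION: powers of A stand in for the paper's causal shift operators.
    shifts = [np.linalg.matrix_power(A, k) for k in range(K)]
    x = rng.standard_normal(n)
    outputs[n] = pdcn_forward(x, shifts, *params)

print(n_params, outputs[5].shape, outputs[50].shape)
```

The same `params` tuple is reused for the 5-node and the 50-node DAG, which is the sense in which model complexity is decoupled from graph size in this sketch.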

Core claim

By defining convolutional operations via causal graph-shift operators that admit spectral representations, the DCN learns nodal features that incorporate the topological order of a DAG, an inductive bias absent from standard GNNs, and the parallel PDCN variant achieves this while decoupling model complexity from graph size.

What carries the argument

Causal graph filters, constructed from a graph-shift operator adapted to the DAG partial order, that enable directional convolution in both vertex and spectral domains.
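A minimal vertex-domain sketch of the idea follows, assuming the filter is a polynomial in the DAG adjacency matrix. That form is borrowed from the simulated rebuttal further down, and the paper's own causal filters may be built from different shift operators. The 4-node graph, the coefficients `h`, and the convention `A[i, j] = 1` for an edge j -> i are all illustrative choices.

```python
import numpy as np

def polynomial_filter(A, h):
    """Return H = sum_k h[k] * A^k for coefficients h[0], h[1], ..."""
    n = A.shape[0]
    H = np.zeros((n, n))
    Ak = np.eye(n)  # A^0
    for hk in h:
        H += hk * Ak
        Ak = Ak @ A
    return H

# Hypothetical DAG with edges 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.
# ASSUMED convention: A[i, j] = 1 when there is an edge j -> i.
A = np.zeros((4, 4))
for j, i in [(0, 1), (0, 2), (1, 3), (2, 3)]:
    A[i, j] = 1.0

h = [1.0, 0.5, 0.25]  # illustrative filter coefficients
H = polynomial_filter(A, h)

# Unit impulse at node 1: the output can be nonzero only at node 1
# and at nodes reachable from it.
x = np.zeros(4)
x[1] = 1.0
y = H @ x
print(y)
```

With nodes listed in topological order, `H` is lower triangular, so each output entry depends only on the node itself and its ancestors. Here `y` is `[0, 1, 0, 0.5]`: the impulse stays at node 1 and reaches its descendant node 3, but never its parent node 0 or the unrelated node 2.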

If this is right

  • The architecture can be applied directly to causal-inference and scheduling problems while preserving directional constraints.
  • PDCN scales to larger DAGs without a proportional rise in parameters.
  • Permutation equivariance guarantees that node relabelings do not alter the learned representations.
  • The spectral formulation opens the door to frequency-domain analysis of signals on acyclic graphs.
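The permutation-equivariance point in the list above can be checked numerically for a single filter. The sketch below again assumes a polynomial filter in the adjacency matrix as a stand-in for the paper's causal filters, and it tests one linear filter, not a full DCN with nonlinearities. The random DAG, signal, and coefficients are illustrative.

```python
import numpy as np

def polynomial_filter(A, h):
    """H = sum_k h[k] * A^k (ASSUMED stand-in for a causal graph filter)."""
    return sum(hk * np.linalg.matrix_power(A, k) for k, hk in enumerate(h))

rng = np.random.default_rng(1)
N = 6
# Random DAG in topological order: strictly lower-triangular adjacency.
A = np.tril(rng.random((N, N)) < 0.4, k=-1).astype(float)
x = rng.standard_normal(N)
h = [0.7, -0.3, 0.2]

# Relabel the nodes with a random permutation matrix P.
perm = rng.permutation(N)
P = np.eye(N)[perm]

y = polynomial_filter(A, h) @ x
y_perm = polynomial_filter(P @ A @ P.T, h) @ (P @ x)

# Equivariance: filtering the relabeled signal on the relabeled graph
# gives the relabeled output.
equivariant = np.allclose(y_perm, P @ y)
print(equivariant)
```

The check passes because `(P A P^T)^k = P A^k P^T` for any permutation matrix `P`. The relabeled adjacency is generally no longer triangular, yet it still describes the same DAG.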

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same causal-filter idea could be tested on other ordered structures such as temporal or hierarchical graphs.
  • Joint training of DCN layers with causal-discovery routines might allow simultaneous structure and representation learning.
  • Stability or generalization bounds for DAG-specific tasks could be derived from the spectral properties of the causal filters.

Load-bearing premise

That the causal graph filters supply an inductive bias strong enough to produce clear accuracy or efficiency gains over ordinary GNNs on actual DAG datasets.

What would settle it

A controlled test on a standard DAG benchmark in which a conventional GNN without causal filters matches or exceeds the DCN accuracy would undermine the claimed advantage of the proposed filters.

Figures

Figures reproduced from arXiv: 2506.12218 by Gonzalo Mateos, Hamed Ajorlou, Samuel Rey.

Figure 1: A DAG D and its adjacency matrix A.
Figure 2: An example of how information is propagated by causal shifts
Figure 3: Comparison of the DCN (a) and PDCN (b) architectures. The DCN is structured sequentially as a deep architecture, …
Figure 4: (a) NMSE in the network diffusion task as the noise in the observations increases. For the source identification task, …
Figure 5: Comparison of Erdős–Rényi (ER) and scale-free (SF) graphs in diffusion learning (left) and source identification (right). Performance is fairly invariant across graph types.
Figure 6: A DAG representing the flow of the River Thames
Original abstract

Directed acyclic graphs (DAGs) are central to science and engineering applications including causal inference, scheduling, and neural architecture search. In this work, we introduce the DAG Convolutional Network (DCN), a novel graph neural network (GNN) architecture designed specifically for convolutional learning from signals supported on DAGs. The DCN leverages causal graph filters to learn nodal representations that account for the partial ordering inherent to DAGs, a strong inductive bias does not present in conventional GNNs. Unlike prior art in machine learning over DAGs, DCN builds on formal convolutional operations that admit spectral-domain representations. We further propose the Parallel DCN (PDCN), a model that feeds input DAG signals to a parallel bank of causal graph-shift operators and processes these DAG-aware features using a shared multilayer perceptron. This way, PDCN decouples model complexity from graph size while maintaining satisfactory predictive performance. The architectures' permutation equivariance and expressive power properties are also established. Comprehensive numerical tests across several tasks, datasets, and experimental conditions demonstrate that (P)DCN compares favorably with state-of-the-art baselines in terms of accuracy, robustness, and computational efficiency. These results position (P)DCN as a viable framework for deep learning from DAG-structured data that is designed from first (graph) signal processing principles.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces the DAG Convolutional Network (DCN) and Parallel DCN (PDCN) for learning nodal representations from signals on directed acyclic graphs. It constructs causal graph filters based on graph signal processing that respect the partial order of DAGs via the graph shift operator, establishes permutation equivariance and expressive power, and reports that the models compare favorably to state-of-the-art baselines in accuracy, robustness, and efficiency across multiple tasks and datasets.

Significance. If the central claims hold, the work supplies a principled GSP-derived inductive bias for partial orders that standard GNNs lack, which could benefit causal inference, scheduling, and neural architecture search. Credit is due for the formal convolutional operations with spectral representations, the proofs of equivariance and expressivity, and the PDCN design that decouples complexity from graph size while retaining performance.

major comments (2)
  1. [Section 5] Section 5 (Numerical Experiments): the abstract states that comprehensive tests demonstrate favorable comparison, yet the reported results lack error bars, statistical significance tests, and ablations isolating the causal filter component from other architectural choices. This is load-bearing for the claim that the partial-order bias translates into measurable gains.
  2. [Section 3.2] Section 3.2 (Causal Graph Filters): the spectral representation of the causal filters is presented as respecting acyclicity by construction, but the derivation does not explicitly verify that the filter coefficients remain valid under arbitrary topological orderings of the same DAG; a counter-example or invariance proof would be required.
minor comments (2)
  1. [Abstract] Abstract: the clause 'a strong inductive bias does not present in conventional GNNs' contains a grammatical error and should read 'a strong inductive bias that is not present in conventional GNNs'.
  2. [Throughout] Notation: the graph-shift operator is introduced with multiple symbols across sections; a single consistent symbol and a forward reference to its definition would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below and describe the revisions we will make to strengthen the empirical and theoretical sections.

Point-by-point responses
  1. Referee: [Section 5] Section 5 (Numerical Experiments): the abstract states that comprehensive tests demonstrate favorable comparison, yet the reported results lack error bars, statistical significance tests, and ablations isolating the causal filter component from other architectural choices. This is load-bearing for the claim that the partial-order bias translates into measurable gains.

    Authors: We agree that the current experimental section would benefit from additional statistical rigor. In the revised manuscript we will rerun all experiments over at least five independent random seeds, report mean performance together with standard-deviation error bars, and include paired t-tests (or Wilcoxon signed-rank tests where appropriate) to establish statistical significance against the strongest baselines. We will also add a dedicated ablation subsection that replaces the causal graph-shift operators with ordinary (non-causal) polynomial filters while keeping all other architectural choices fixed; the resulting performance drop will quantify the contribution of the partial-order inductive bias. These changes will appear in an expanded Section 5 and the associated appendix. revision: yes

  2. Referee: [Section 3.2] Section 3.2 (Causal Graph Filters): the spectral representation of the causal filters is presented as respecting acyclicity by construction, but the derivation does not explicitly verify that the filter coefficients remain valid under arbitrary topological orderings of the same DAG; a counter-example or invariance proof would be required.

    Authors: The causal filters are constructed from the adjacency matrix of the DAG, which is nilpotent under any valid topological ordering. Because the filter is ultimately applied in the vertex domain, its action on a signal is independent of the particular ordering chosen to triangularize the matrix. Nevertheless, we acknowledge that an explicit invariance argument is missing. In the revision we will insert a short lemma (with proof) in Section 3.2 showing that the output of any polynomial causal filter is identical for all topological sorts of the same DAG; the proof relies on the fact that different orderings correspond to permutation-similar matrices whose nilpotency index and spectrum remain unchanged. A brief counter-example illustrating what would break if the filter were not causal will also be added for clarity. revision: yes
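The nilpotency argument in the second response can be illustrated with a small check. This is a sketch under assumptions, not the lemma the authors promise: `nilpotency_index` is a hypothetical helper, the 4-node graph is made up, and `A[i, j] = 1` is taken to mean an edge j -> i.

```python
import numpy as np

def nilpotency_index(A):
    """Smallest k >= 1 with A^k = 0, or None if no power up to N vanishes."""
    n = A.shape[0]
    Ak = np.eye(n)
    for k in range(1, n + 1):
        Ak = Ak @ A
        if not Ak.any():
            return k
    return None

# Hypothetical DAG: path 0 -> 1 -> 2 -> 3 plus a shortcut 0 -> 3.
# ASSUMED convention: A[i, j] = 1 when there is an edge j -> i.
A = np.zeros((4, 4))
for j, i in [(0, 1), (1, 2), (2, 3), (0, 3)]:
    A[i, j] = 1.0

# Relabel the nodes: the matrix is no longer triangular, but it is
# permutation-similar to A, so its powers vanish at the same step.
perm = np.array([2, 0, 3, 1])
P = np.eye(4)[perm]
idx = nilpotency_index(A)
idx_relabeled = nilpotency_index(P @ A @ P.T)

# Adding the edge 3 -> 0 closes a cycle, and the matrix stops being nilpotent.
C = A.copy()
C[0, 3] = 1.0
idx_cycle = nilpotency_index(C)

print(idx, idx_relabeled, idx_cycle)
```

For a nonnegative DAG adjacency matrix the index equals the number of edges on the longest directed path plus one, so a polynomial filter needs at most that many nonzero powers. The cyclic matrix `C` shows what breaks once acyclicity is lost: its powers never vanish.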

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper constructs DCN and PDCN directly from graph signal processing principles by defining causal graph filters via the DAG shift operator that respects partial ordering. Permutation equivariance and expressive power follow as standard consequences of the convolutional construction. No step reduces a claimed prediction or first-principles result to a fitted parameter or self-citation by construction; the framework remains self-contained with independent content relative to external GSP benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on standard graph signal processing assumptions plus the domain-specific choice of causal filters for DAGs.

axioms (1)
  • domain assumption: The nodes supporting a DAG signal admit a well-defined partial order that causal graph-shift operators can exploit.
    Invoked when defining the convolutional operations that respect the DAG topology.

pith-pipeline@v0.9.0 · 5759 in / 1117 out tokens · 38778 ms · 2026-05-21T23:49:43.659048+00:00 · methodology

