pith. sign in

arxiv: 2605.24280 · v1 · pith:7PRLBYYBnew · submitted 2026-05-22 · 💻 cs.DS

A Comprehensive Evaluation of Vertex Elimination Algorithms for Algorithmic Differentiation

Pith reviewed 2026-06-30 14:02 UTC · model grok-4.3

classification 💻 cs.DS
keywords algorithmic differentiationvertex eliminationinteger programmingoptimal orderingheuristicsapproximation algorithmsdirected acyclic graphsNP-complete
0
0 comments X

The pith

New integer programming formulations solve optimal vertex elimination on graphs one to two orders of magnitude larger than before.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces integer programming models for finding minimum-cost sequences of vertex eliminations in directed acyclic graphs that represent computations in algorithmic differentiation. Because the underlying ordering problem is NP-complete, prior work relied on heuristics whose practical performance could not be checked exactly on medium-sized instances. The new models produce a collection of graphs with verified optimal solutions and show that widely used heuristics recover high-quality orderings on real data. Additional results include a tight characterization of the costs of forward and reverse modes together with the first approximation lower bounds for the problem.

Core claim

We design and engineer new integer programming formulations for Optimal Vertex Elimination and for a related objective we call Minimum Edge Count. Our implementations scale to graphs one-to-two orders of magnitude larger than existing techniques, enabling the assembly of a corpus of medium-sized graphs for which optimal solutions are known. This corpus facilitates a study of existing heuristics, confirming that on real data popular methods achieve high quality solutions. We also give a tight analysis of the forward and reverse modes of AD, extend the techniques to a simple approximation algorithm parameterized by the size of a minimum source-sink separator, and establish the first approximat

What carries the argument

Integer programming formulations that encode sequences of vertex eliminations and their associated costs in a DAG to compute an optimal elimination ordering or minimum edge count.

If this is right

  • Popular heuristics for vertex elimination can be evaluated exactly against optimal solutions on a new corpus of medium-sized real graphs.
  • Forward and reverse modes of algorithmic differentiation admit a tight cost analysis.
  • Optimal Vertex Elimination admits an approximation algorithm whose ratio is parameterized by the size of a minimum source-sink separator.
  • Both Optimal Vertex Elimination and Minimum Edge Count have provable approximation lower bounds.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The new solvable instances could serve as training or validation data for designing improved heuristics that generalize beyond the current corpus.
  • Software systems performing automatic differentiation might incorporate the formulations for exact solving on graphs that fall inside the new size range.
  • The approximation results may suggest similar parameterized algorithms for other ordering problems on DAGs that arise in compiler optimization.

Load-bearing premise

The integer programming formulations correctly encode the true costs of every possible sequence of vertex eliminations without omissions or deviations from the AD cost model.

What would settle it

An instance in which the integer program returns an elimination ordering whose cost differs from the cost obtained by exhaustive enumeration on the same graph would falsify the claim that the formulations are correct.

Figures

Figures reproduced from arXiv: 2605.24280 by Alex Crane, Andrew Lyons, Blair D. Sullivan, Eli Friedman, Jan H\"uckelheim, Mac\'eo Ottavy, P{\aa}l Gr{\o}n{\aa}s Drange, Paul D. Hovland, Yosuke Mizutani.

Figure 1
Figure 1. Figure 1: Gadgets (see the construction below) for vertices of type 1 (left), 2a (center), and 3 [PITH_FULL_IMAGE:figures/full_fig_p012_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: The construction used in the proof of Proposition 6. The edges between horizontally [PITH_FULL_IMAGE:figures/full_fig_p018_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: The construction used in the proof of Proposition 7. The sets [PITH_FULL_IMAGE:figures/full_fig_p019_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: The construction used to show that the bound of Theorem 9 is tight. [PITH_FULL_IMAGE:figures/full_fig_p020_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Example of a DAG D where MiddleOut is an Ω(|C| 2 )-approximation. Let C = {b1, . . . , bn, w1, . . . , wn}. Then MiddleOut(D, C) gives elimination sequence (a, u, b1, . . . , bn, w1, . . . , wn) and incurs cost n 3 + 3n 2 = Θ(|C| 3 ), while the optimal elimination sequence (b1, . . . , bn, w1, . . . , wn, u, a) incurs cost 3n + 1 = Θ(|C|). Now, we pick up from Equation (24), once more bounding |N + D′ π(v)… view at source ↗
Figure 6
Figure 6. Figure 6: Gadget for vertex v used in the proof of Theorem 19. The vertex v ′ is a replica vertex, and the sets Iv and Ov contain auxiliary vertices; the cardinalities of these sets are indicated above. Bolded arrows represent multiple edges. Dashed arrows represent edges to or from the replica vertices in other gadgets. T is a global set of cardinality 3n 2 , shared by all vertex gadgets. We can now state our resul… view at source ↗
Figure 10
Figure 10. Figure 10: fig10.4 10 12 [PITH_FULL_IMAGE:figures/full_fig_p031_10.png] view at source ↗
Figure 10
Figure 10. Figure 10: fig10.4 10 12 [PITH_FULL_IMAGE:figures/full_fig_p033_10.png] view at source ↗
Figure 10
Figure 10. Figure 10: fig10.4 10 12 [PITH_FULL_IMAGE:figures/full_fig_p036_10.png] view at source ↗
Figure 10
Figure 10. Figure 10: fig10.4 10 12 [PITH_FULL_IMAGE:figures/full_fig_p038_10.png] view at source ↗
Figure 10
Figure 10. Figure 10: fig10.4 10 12 [PITH_FULL_IMAGE:figures/full_fig_p040_10.png] view at source ↗
Figure 10
Figure 10. Figure 10: fig10.3 18 20 [PITH_FULL_IMAGE:figures/full_fig_p042_10.png] view at source ↗
Figure 10
Figure 10. Figure 10: fig10.3 18 20 [PITH_FULL_IMAGE:figures/full_fig_p043_10.png] view at source ↗
Figure 10
Figure 10. Figure 10: fig10.3 18 20 [PITH_FULL_IMAGE:figures/full_fig_p044_10.png] view at source ↗
Figure 10
Figure 10. Figure 10: fig10.3 18 20 [PITH_FULL_IMAGE:figures/full_fig_p049_10.png] view at source ↗
Figure 10
Figure 10. Figure 10: fig10.3 18 20 [PITH_FULL_IMAGE:figures/full_fig_p050_10.png] view at source ↗
Figure 10
Figure 10. Figure 10: fig10.3 18 20 [PITH_FULL_IMAGE:figures/full_fig_p051_10.png] view at source ↗
read the original abstract

The algorithmic differentiation (AD) of mathematical functions can be interpreted as a sequence of vertex eliminations in an underlying directed acyclic graph. The problem of determining a minimum-cost elimination ordering, which we call Optimal Vertex Elimination, is NP-complete. Consequently, much effort has been devoted to the design of heuristics. Many of these heuristics are widely believed to perform well in practice, but this hypothesis has so far been difficult to test due to the lack of scalable exact methods. We design and engineer new integer programming formulations for Optimal Vertex Eliminatioin and for a related objective we call Minimum Edge Count. Our implementations scale to graphs one-to-two orders of magnitude larger than existing techniques, enabling the assembly of a corpus of medium-sized graphs for which optimal solutions are known. This corpus facilitates a study of existing heuristics, confirming that on real data popular methods achieve high quality solutions. We also make several theoretical contributions. We give a tight analysis of the forward and reverse modes of AD, and extend our techniques to provide a simple algorithm for Optimal Vertex Elimination with approximation ratio parameterized by the size of a minimum source-sink separator. On the complexity side, we give the first approximation lower bounds for both problems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper introduces new integer programming formulations for the Optimal Vertex Elimination problem (and a related Minimum Edge Count objective) arising in algorithmic differentiation on DAGs. These formulations are claimed to scale to instances one-to-two orders of magnitude larger than prior exact methods, enabling construction of a corpus of medium-sized graphs with known optimal solutions. The corpus is used to benchmark existing heuristics, which are reported to achieve high-quality solutions on real data. Additional theoretical results include a tight analysis of forward and reverse AD modes, a simple approximation algorithm whose ratio is parameterized by the size of a minimum source-sink separator, and the first approximation lower bounds for both problems.

Significance. If the IP formulations are correct and the claimed scaling holds, the work would be significant for the AD and combinatorial optimization communities by providing the first scalable exact solver for a classically hard problem, together with a reusable benchmark corpus. The theoretical contributions on forward/reverse analysis and separator-parameterized approximation are also valuable. The paper ships concrete engineering advances (new IPs, larger solvable instances) that directly address the prior lack of exact methods for heuristic evaluation.

minor comments (2)
  1. [Abstract] Abstract: 'Optimal Vertex Eliminatioin' contains a typographical error and should read 'Elimination'.
  2. [Modeling section] The manuscript would benefit from an explicit statement (perhaps in §3 or the modeling section) confirming that the new IP objective and constraints exactly reproduce the standard AD elimination cost model without omitted fill-in terms or surrogate objectives.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment, accurate summary of our contributions on scalable IP formulations for Optimal Vertex Elimination, the benchmark corpus, and the theoretical results on forward/reverse modes, separator-parameterized approximation, and lower bounds. The recommendation for minor revision is noted.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper's central contributions consist of new integer programming formulations for Optimal Vertex Elimination and Minimum Edge Count, presented as independent modeling work that encodes elimination costs on directed acyclic graphs. These build on the externally known NP-completeness result without the formulations themselves reducing to self-definitions, fitted parameters renamed as predictions, or load-bearing self-citations. Theoretical extensions (forward/reverse mode analysis, separator-parameterized approximation, and approximation lower bounds) are stated as separate results with no equations or claims shown to be equivalent to their inputs by construction. The work is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the modeling fidelity of the new IP formulations and on the representativeness of the generated medium-sized graph corpus; both cannot be inspected from the abstract alone. The paper invokes the known NP-completeness of the problem as background.

axioms (1)
  • standard math Determining a minimum-cost elimination ordering is NP-complete.
    Stated directly in the abstract as established prior knowledge.

pith-pipeline@v0.9.1-grok · 5783 in / 1105 out tokens · 77136 ms · 2026-06-30T14:02:34.309633+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

31 extracted references · 19 canonical work pages · 2 internal anchors

  1. [1]

    Sinan G Aksoy, Ryan Bennink, Yuzhou Chen, Jos´ e Fr´ ıas, Yulia R Gel, Bill Kay, Uwe Naumann, Carlos Ortiz Marrero, Anthony V Petyuk, Sandip Roy, Ignacio Segovia-Dominguez, Nate Veldt, and Stephen J. Young. Seven open problems in applied combinatorics.Journal of Combinatorics, 14(4):559–601, 2023.arXiv:2303.11464,doi:10.4310/joc.2023.v14.n4. a8

  2. [2]

    Markowitz-type heuristics for com- puting jacobian matrices efficiently

    Andreas Albrecht, Peter Gottschling, and Uwe Naumann. Markowitz-type heuristics for com- puting jacobian matrices efficiently. InInternational Conference on Computational Science, pages 575–584. Springer, 2003.doi:10.1007/3-540-44862-4_61

  3. [3]

    Hardness of approximating problems on cubic graphs

    Paola Alimonti and Viggo Kann. Hardness of approximating problems on cubic graphs. In Italian conference on algorithms and complexity, pages 288–298. Springer, 1997.doi:10.1007/ 3-540-62592-5_80

  4. [4]

    Computational graphs and rounding error.SIAM Journal on Numerical Analysis, 11(1):87–96, 1974.doi:10.1137/0711010

    Friedrich L Bauer. Computational graphs and rounding error.SIAM Journal on Numerical Analysis, 11(1):87–96, 1974.doi:10.1137/0711010

  5. [5]

    Structural optimal jacobian accumulation and minimum edge count are NP-complete under vertex elimination.arXiv preprint, 2025.arXiv:2506.17521

    Matthias Bentert, Alex Crane, P˚ al Grøn˚ as Drange, Yosuke Mizutani, and Blair D Sullivan. Structural optimal jacobian accumulation and minimum edge count are NP-complete under vertex elimination.arXiv preprint, 2025.arXiv:2506.17521. 14

  6. [6]

    An Algo- rithmic Weakening of the Erd˝ os-Hajnal Conjecture

    ´Edouard Bonnet, St´ ephan Thomass´ e, Xuan Thang Tran, and R´ emi Watrigant. An Algo- rithmic Weakening of the Erd˝ os-Hajnal Conjecture. In28th Annual European Symposium on Algorithms (ESA 2020), volume 173 ofLeibniz International Proceedings in Informatics (LIPIcs), pages 23:1–23:18, Dagstuhl, Germany, 2020. Schloss Dagstuhl – Leibniz-Zentrum f¨ ur Inf...

  7. [7]

    TEST NLS nonlinear least squares test problems.https://people.math

    John Burkhardt. TEST NLS nonlinear least squares test problems.https://people.math. sc.edu/burkardt/f_src/test_nls/test_nls.html

  8. [8]

    An integer programming approach to optimal derivative accumulation

    Jieqiu Chen, Paul Hovland, Todd Munson, and Jean Utke. An integer programming approach to optimal derivative accumulation. InRecent Advances in Algorithmic Differentiation, pages 221–231. Springer, 2012.doi:10.1007/978-3-642-30023-3_20

  9. [9]

    An integer programming ap- proach to optimal derivative accumulation

    Jieqiu Chen, Paul Hovland, Todd Munson, and Jean Utke. An integer programming ap- proach to optimal derivative accumulation. Technical report, Argonne National Laboratory,

  10. [10]

    URL:https://www.mcs.anl.gov/uploads/ cels/papers/1994-0112.pdf

    Extended version of conference paper. URL:https://www.mcs.anl.gov/uploads/ cels/papers/1994-0112.pdf

  11. [11]

    Springer, 2012.doi:10.1007/978-3-662-70107-2

    Reinhard Diestel.Graph Theory, 4th Edition, volume 173 ofGraduate texts in mathematics. Springer, 2012.doi:10.1007/978-3-662-70107-2

  12. [12]

    A mathematical view of automatic differentiation.Acta Numerica, 12:321–398, 2003.doi:10.1017/S0962492902000132

    Andreas Griewank. A mathematical view of automatic differentiation.Acta Numerica, 12:321–398, 2003.doi:10.1017/S0962492902000132

  13. [13]

    On the calculation of Jacobian matrices by the Markowitz rule

    Andreas Griewank and Shawn Reese. On the calculation of Jacobian matrices by the Markowitz rule. InAutomatic Differentiation of Algorithms: Theory, Implementation, and Application, pages 126–135. SIAM, 1991. URL:https://www.osti.gov/servlets/purl/5933713

  14. [14]

    Griewank and A

    Andreas Griewank and Andrea Walther.Evaluating derivatives: principles and techniques of algorithmic differentiation. SIAM, 2008.doi:10.1137/1.9780898717761

  15. [15]

    Gurobi Optimizer Reference Manual, 2026

    Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2026. URL:https://www. gurobi.com

  16. [16]

    Hovland, C

    P. Hovland, C. Bischof, D. Spiegelman, and M. Casella. Efficient derivative codes through automatic differentiation and interface contraction: An application in biostatistics.SIAM Journal on Scientific Computing, 18(4):1056–1066, 1997.doi:10.1137/S1064827595281800

  17. [17]

    History of automatic differentiation and rounding error estimation

    Masao Iri. History of automatic differentiation and rounding error estimation. InAutomatic Differentiation of Algorithms: Theory, Implementation, and Application, pages 1–16. SIAM, Philadelphia, PA, 1991

  18. [18]

    Tight Running Time Lower Bounds for Vertex Deletion Problems

    Christian Komusiewicz. Tight running time lower bounds for vertex deletion problems.ACM Transactions on Computation Theory (TOCT), 10(2):1–18, 2018.arXiv:1511.05449,doi: 10.1145/3186589

  19. [19]

    The representation of the cumulative rounding error of an algorithm as a taylor expansion of the local rounding errors

    Seppo Linnainmaa. The representation of the cumulative rounding error of an algorithm as a taylor expansion of the local rounding errors. Master’s thesis, University of Helsinki, 1970. In Finnish

  20. [20]

    Graphax.https://github.com/jamielohoff/graphax

    Jamie Lohoff. Graphax.https://github.com/jamielohoff/graphax. [Accessed: April 2026]. 15

  21. [21]

    Optimizing automatic differentiation with deep reinforcement learning

    Jamie Lohoff and Emre Neftci. Optimizing automatic differentiation with deep reinforcement learning. 37:3611–3648, 2024.arXiv:2406.05027,doi:10.52202/079017-0120

  22. [22]

    Randomized heuristics for exploiting Jacobian scarcity.Optimization Methods and Software, 27(2):311–322, 2012.doi:10.1080/10556788

    Andrew Lyons, Ilya Safro, and Jean Utke. Randomized heuristics for exploiting Jacobian scarcity.Optimization Methods and Software, 27(2):311–322, 2012.doi:10.1080/10556788. 2011.577774

  23. [23]

    PhD thesis, Dresden University of Technology, Germany, 1999

    Uwe Naumann.Efficient calculation of Jacobian matrices by optimized application of the chain rule to computational graphs. PhD thesis, Dresden University of Technology, Germany, 1999. URL:https://d-nb.info/956080170

  24. [24]

    Cheaper Jacobians by simulated annealing.SIAM J

    Uwe Naumann. Cheaper Jacobians by simulated annealing.SIAM J. on Optimization, 13(3):660–674, August 2002.doi:10.1137/S1052623400368394

  25. [25]

    Optimal vertex elimination in single-expression-use graphs

    Uwe Naumann and Yuxiao Hu. Optimal vertex elimination in single-expression-use graphs. ACM Transactions on Mathematical Software (TOMS), 35(1):1–20, 2008.doi:10.1145/ 1377603.1377605

  26. [26]

    Max Sagebaum, Tim Albring, and Nicolas R. Gauger. High-performance derivative computa- tions using codipack.ACM Trans. Math. Softw., 45(4), December 2019.arXiv:1709.07229, doi:10.1145/3356900

  27. [27]

    E. M. Tadjouddine. Vertex-ordering Algorithms for Automatic Differentiation of Computer Codes.The Computer Journal, 51(6):688–699, 2008.doi:10.1093/comjnl/bxm115. 16 A Deferred Proofs A.1 Deferred Proofs from Section 2 Observation 5.LetD= (V=S⊎I⊎T, E),X⊆I, andi, j∈V\X. Then(i, j)∈E(D X)if and only if there exists a directed path fromitojfor which every in...

  28. [28]

    Next, we assign theevariables

    Observe that constraints (5) and (6) are satisfied. Next, we assign theevariables. For each (i, j)∈E, sete ij = 1, as required by constraint (7). For eachs∈Sandb∈B, sete sb = 1 2, satisfying constraint (8) becausex as +x ab = 1 2 for every a∈A. Similarly, sete ut = 1 2 for everyu∈Uandt∈T. Next, for eachs∈Sandt∈T, sete st = 1. Finally, set all remaininge i...

  29. [29]

    Then, for eacha∈Aandt∈T, we setz at = 1

    Next, for each arc (u, v)∈E, we setz uv = 1, satisfying constraint (19). Then, for eacha∈Aandt∈T, we setz at = 1

  30. [30]

    Similarly, for eachs∈Sandb∈B, we set zsb = 1

    This is safe with respect to constraint (20) becausex b = 1 2 for everyb∈B. Similarly, for eachs∈Sandb∈B, we set zsb = 1

  31. [31]

    It remains to assign theyvariables

    Finally, for eachs∈Sandt∈T, we setz st = 0, which is safe with respect to (20) because for eacha∈A,x a = 1 2 andz at = 1 2. It remains to assign theyvariables. For eachs∈Sanda∈A, we sety sa = 1 2 if (s, a)∈E andy sa = 0 otherwise. Similarly, for eachb∈Bandt∈T, we sety bt = 1 2 if (b, t)∈Eandy bt = 0 otherwise. Both assignments are safe with respect to con...