A Comprehensive Evaluation of Vertex Elimination Algorithms for Algorithmic Differentiation
Pith reviewed 2026-06-30 14:02 UTC · model grok-4.3
The pith
New integer programming formulations solve optimal vertex elimination on graphs one to two orders of magnitude larger than before.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We design and engineer new integer programming formulations for Optimal Vertex Elimination and for a related objective we call Minimum Edge Count. Our implementations scale to graphs one-to-two orders of magnitude larger than existing techniques, enabling the assembly of a corpus of medium-sized graphs for which optimal solutions are known. This corpus facilitates a study of existing heuristics, confirming that on real data popular methods achieve high quality solutions. We also give a tight analysis of the forward and reverse modes of AD, extend the techniques to a simple approximation algorithm parameterized by the size of a minimum source-sink separator, and establish the first approximat
What carries the argument
Integer programming formulations that encode sequences of vertex eliminations and their associated costs in a DAG to compute an optimal elimination ordering or minimum edge count.
If this is right
- Popular heuristics for vertex elimination can be evaluated exactly against optimal solutions on a new corpus of medium-sized real graphs.
- Forward and reverse modes of algorithmic differentiation admit a tight cost analysis.
- Optimal Vertex Elimination admits an approximation algorithm whose ratio is parameterized by the size of a minimum source-sink separator.
- Both Optimal Vertex Elimination and Minimum Edge Count have provable approximation lower bounds.
Where Pith is reading between the lines
- The new solvable instances could serve as training or validation data for designing improved heuristics that generalize beyond the current corpus.
- Software systems performing automatic differentiation might incorporate the formulations for exact solving on graphs that fall inside the new size range.
- The approximation results may suggest similar parameterized algorithms for other ordering problems on DAGs that arise in compiler optimization.
Load-bearing premise
The integer programming formulations correctly encode the true costs of every possible sequence of vertex eliminations without omissions or deviations from the AD cost model.
What would settle it
An instance in which the integer program returns an elimination ordering whose cost differs from the cost obtained by exhaustive enumeration on the same graph would falsify the claim that the formulations are correct.
Figures
read the original abstract
The algorithmic differentiation (AD) of mathematical functions can be interpreted as a sequence of vertex eliminations in an underlying directed acyclic graph. The problem of determining a minimum-cost elimination ordering, which we call Optimal Vertex Elimination, is NP-complete. Consequently, much effort has been devoted to the design of heuristics. Many of these heuristics are widely believed to perform well in practice, but this hypothesis has so far been difficult to test due to the lack of scalable exact methods. We design and engineer new integer programming formulations for Optimal Vertex Eliminatioin and for a related objective we call Minimum Edge Count. Our implementations scale to graphs one-to-two orders of magnitude larger than existing techniques, enabling the assembly of a corpus of medium-sized graphs for which optimal solutions are known. This corpus facilitates a study of existing heuristics, confirming that on real data popular methods achieve high quality solutions. We also make several theoretical contributions. We give a tight analysis of the forward and reverse modes of AD, and extend our techniques to provide a simple algorithm for Optimal Vertex Elimination with approximation ratio parameterized by the size of a minimum source-sink separator. On the complexity side, we give the first approximation lower bounds for both problems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces new integer programming formulations for the Optimal Vertex Elimination problem (and a related Minimum Edge Count objective) arising in algorithmic differentiation on DAGs. These formulations are claimed to scale to instances one-to-two orders of magnitude larger than prior exact methods, enabling construction of a corpus of medium-sized graphs with known optimal solutions. The corpus is used to benchmark existing heuristics, which are reported to achieve high-quality solutions on real data. Additional theoretical results include a tight analysis of forward and reverse AD modes, a simple approximation algorithm whose ratio is parameterized by the size of a minimum source-sink separator, and the first approximation lower bounds for both problems.
Significance. If the IP formulations are correct and the claimed scaling holds, the work would be significant for the AD and combinatorial optimization communities by providing the first scalable exact solver for a classically hard problem, together with a reusable benchmark corpus. The theoretical contributions on forward/reverse analysis and separator-parameterized approximation are also valuable. The paper ships concrete engineering advances (new IPs, larger solvable instances) that directly address the prior lack of exact methods for heuristic evaluation.
minor comments (2)
- [Abstract] Abstract: 'Optimal Vertex Eliminatioin' contains a typographical error and should read 'Elimination'.
- [Modeling section] The manuscript would benefit from an explicit statement (perhaps in §3 or the modeling section) confirming that the new IP objective and constraints exactly reproduce the standard AD elimination cost model without omitted fill-in terms or surrogate objectives.
Simulated Author's Rebuttal
We thank the referee for the positive assessment, accurate summary of our contributions on scalable IP formulations for Optimal Vertex Elimination, the benchmark corpus, and the theoretical results on forward/reverse modes, separator-parameterized approximation, and lower bounds. The recommendation for minor revision is noted.
Circularity Check
No significant circularity
full rationale
The paper's central contributions consist of new integer programming formulations for Optimal Vertex Elimination and Minimum Edge Count, presented as independent modeling work that encodes elimination costs on directed acyclic graphs. These build on the externally known NP-completeness result without the formulations themselves reducing to self-definitions, fitted parameters renamed as predictions, or load-bearing self-citations. Theoretical extensions (forward/reverse mode analysis, separator-parameterized approximation, and approximation lower bounds) are stated as separate results with no equations or claims shown to be equivalent to their inputs by construction. The work is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math Determining a minimum-cost elimination ordering is NP-complete.
Reference graph
Works this paper leans on
-
[1]
Sinan G Aksoy, Ryan Bennink, Yuzhou Chen, Jos´ e Fr´ ıas, Yulia R Gel, Bill Kay, Uwe Naumann, Carlos Ortiz Marrero, Anthony V Petyuk, Sandip Roy, Ignacio Segovia-Dominguez, Nate Veldt, and Stephen J. Young. Seven open problems in applied combinatorics.Journal of Combinatorics, 14(4):559–601, 2023.arXiv:2303.11464,doi:10.4310/joc.2023.v14.n4. a8
-
[2]
Markowitz-type heuristics for com- puting jacobian matrices efficiently
Andreas Albrecht, Peter Gottschling, and Uwe Naumann. Markowitz-type heuristics for com- puting jacobian matrices efficiently. InInternational Conference on Computational Science, pages 575–584. Springer, 2003.doi:10.1007/3-540-44862-4_61
-
[3]
Hardness of approximating problems on cubic graphs
Paola Alimonti and Viggo Kann. Hardness of approximating problems on cubic graphs. In Italian conference on algorithms and complexity, pages 288–298. Springer, 1997.doi:10.1007/ 3-540-62592-5_80
1997
-
[4]
Friedrich L Bauer. Computational graphs and rounding error.SIAM Journal on Numerical Analysis, 11(1):87–96, 1974.doi:10.1137/0711010
-
[5]
Matthias Bentert, Alex Crane, P˚ al Grøn˚ as Drange, Yosuke Mizutani, and Blair D Sullivan. Structural optimal jacobian accumulation and minimum edge count are NP-complete under vertex elimination.arXiv preprint, 2025.arXiv:2506.17521. 14
-
[6]
An Algo- rithmic Weakening of the Erd˝ os-Hajnal Conjecture
´Edouard Bonnet, St´ ephan Thomass´ e, Xuan Thang Tran, and R´ emi Watrigant. An Algo- rithmic Weakening of the Erd˝ os-Hajnal Conjecture. In28th Annual European Symposium on Algorithms (ESA 2020), volume 173 ofLeibniz International Proceedings in Informatics (LIPIcs), pages 23:1–23:18, Dagstuhl, Germany, 2020. Schloss Dagstuhl – Leibniz-Zentrum f¨ ur Inf...
-
[7]
TEST NLS nonlinear least squares test problems.https://people.math
John Burkhardt. TEST NLS nonlinear least squares test problems.https://people.math. sc.edu/burkardt/f_src/test_nls/test_nls.html
-
[8]
An integer programming approach to optimal derivative accumulation
Jieqiu Chen, Paul Hovland, Todd Munson, and Jean Utke. An integer programming approach to optimal derivative accumulation. InRecent Advances in Algorithmic Differentiation, pages 221–231. Springer, 2012.doi:10.1007/978-3-642-30023-3_20
-
[9]
An integer programming ap- proach to optimal derivative accumulation
Jieqiu Chen, Paul Hovland, Todd Munson, and Jean Utke. An integer programming ap- proach to optimal derivative accumulation. Technical report, Argonne National Laboratory,
-
[10]
URL:https://www.mcs.anl.gov/uploads/ cels/papers/1994-0112.pdf
Extended version of conference paper. URL:https://www.mcs.anl.gov/uploads/ cels/papers/1994-0112.pdf
1994
-
[11]
Springer, 2012.doi:10.1007/978-3-662-70107-2
Reinhard Diestel.Graph Theory, 4th Edition, volume 173 ofGraduate texts in mathematics. Springer, 2012.doi:10.1007/978-3-662-70107-2
-
[12]
Andreas Griewank. A mathematical view of automatic differentiation.Acta Numerica, 12:321–398, 2003.doi:10.1017/S0962492902000132
-
[13]
On the calculation of Jacobian matrices by the Markowitz rule
Andreas Griewank and Shawn Reese. On the calculation of Jacobian matrices by the Markowitz rule. InAutomatic Differentiation of Algorithms: Theory, Implementation, and Application, pages 126–135. SIAM, 1991. URL:https://www.osti.gov/servlets/purl/5933713
-
[14]
Andreas Griewank and Andrea Walther.Evaluating derivatives: principles and techniques of algorithmic differentiation. SIAM, 2008.doi:10.1137/1.9780898717761
-
[15]
Gurobi Optimizer Reference Manual, 2026
Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2026. URL:https://www. gurobi.com
2026
-
[16]
P. Hovland, C. Bischof, D. Spiegelman, and M. Casella. Efficient derivative codes through automatic differentiation and interface contraction: An application in biostatistics.SIAM Journal on Scientific Computing, 18(4):1056–1066, 1997.doi:10.1137/S1064827595281800
-
[17]
History of automatic differentiation and rounding error estimation
Masao Iri. History of automatic differentiation and rounding error estimation. InAutomatic Differentiation of Algorithms: Theory, Implementation, and Application, pages 1–16. SIAM, Philadelphia, PA, 1991
1991
-
[18]
Tight Running Time Lower Bounds for Vertex Deletion Problems
Christian Komusiewicz. Tight running time lower bounds for vertex deletion problems.ACM Transactions on Computation Theory (TOCT), 10(2):1–18, 2018.arXiv:1511.05449,doi: 10.1145/3186589
work page internal anchor Pith review Pith/arXiv arXiv doi:10.1145/3186589 2018
-
[19]
The representation of the cumulative rounding error of an algorithm as a taylor expansion of the local rounding errors
Seppo Linnainmaa. The representation of the cumulative rounding error of an algorithm as a taylor expansion of the local rounding errors. Master’s thesis, University of Helsinki, 1970. In Finnish
1970
-
[20]
Graphax.https://github.com/jamielohoff/graphax
Jamie Lohoff. Graphax.https://github.com/jamielohoff/graphax. [Accessed: April 2026]. 15
2026
-
[21]
Optimizing automatic differentiation with deep reinforcement learning
Jamie Lohoff and Emre Neftci. Optimizing automatic differentiation with deep reinforcement learning. 37:3611–3648, 2024.arXiv:2406.05027,doi:10.52202/079017-0120
-
[22]
Andrew Lyons, Ilya Safro, and Jean Utke. Randomized heuristics for exploiting Jacobian scarcity.Optimization Methods and Software, 27(2):311–322, 2012.doi:10.1080/10556788. 2011.577774
-
[23]
PhD thesis, Dresden University of Technology, Germany, 1999
Uwe Naumann.Efficient calculation of Jacobian matrices by optimized application of the chain rule to computational graphs. PhD thesis, Dresden University of Technology, Germany, 1999. URL:https://d-nb.info/956080170
-
[24]
Cheaper Jacobians by simulated annealing.SIAM J
Uwe Naumann. Cheaper Jacobians by simulated annealing.SIAM J. on Optimization, 13(3):660–674, August 2002.doi:10.1137/S1052623400368394
-
[25]
Optimal vertex elimination in single-expression-use graphs
Uwe Naumann and Yuxiao Hu. Optimal vertex elimination in single-expression-use graphs. ACM Transactions on Mathematical Software (TOMS), 35(1):1–20, 2008.doi:10.1145/ 1377603.1377605
-
[26]
Max Sagebaum, Tim Albring, and Nicolas R. Gauger. High-performance derivative computa- tions using codipack.ACM Trans. Math. Softw., 45(4), December 2019.arXiv:1709.07229, doi:10.1145/3356900
work page internal anchor Pith review Pith/arXiv arXiv doi:10.1145/3356900 2019
-
[27]
E. M. Tadjouddine. Vertex-ordering Algorithms for Automatic Differentiation of Computer Codes.The Computer Journal, 51(6):688–699, 2008.doi:10.1093/comjnl/bxm115. 16 A Deferred Proofs A.1 Deferred Proofs from Section 2 Observation 5.LetD= (V=S⊎I⊎T, E),X⊆I, andi, j∈V\X. Then(i, j)∈E(D X)if and only if there exists a directed path fromitojfor which every in...
-
[28]
Next, we assign theevariables
Observe that constraints (5) and (6) are satisfied. Next, we assign theevariables. For each (i, j)∈E, sete ij = 1, as required by constraint (7). For eachs∈Sandb∈B, sete sb = 1 2, satisfying constraint (8) becausex as +x ab = 1 2 for every a∈A. Similarly, sete ut = 1 2 for everyu∈Uandt∈T. Next, for eachs∈Sandt∈T, sete st = 1. Finally, set all remaininge i...
-
[29]
Then, for eacha∈Aandt∈T, we setz at = 1
Next, for each arc (u, v)∈E, we setz uv = 1, satisfying constraint (19). Then, for eacha∈Aandt∈T, we setz at = 1
-
[30]
Similarly, for eachs∈Sandb∈B, we set zsb = 1
This is safe with respect to constraint (20) becausex b = 1 2 for everyb∈B. Similarly, for eachs∈Sandb∈B, we set zsb = 1
-
[31]
It remains to assign theyvariables
Finally, for eachs∈Sandt∈T, we setz st = 0, which is safe with respect to (20) because for eacha∈A,x a = 1 2 andz at = 1 2. It remains to assign theyvariables. For eachs∈Sanda∈A, we sety sa = 1 2 if (s, a)∈E andy sa = 0 otherwise. Similarly, for eachb∈Bandt∈T, we sety bt = 1 2 if (b, t)∈Eandy bt = 0 otherwise. Both assignments are safe with respect to con...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.