A Comprehensive Evaluation of Vertex Elimination Algorithms for Algorithmic Differentiation

Alex Crane; Andrew Lyons; Blair D. Sullivan; Eli Friedman; Jan H\"uckelheim; Mac\'eo Ottavy; P{\aa}l Gr{\o}n{\aa}s Drange; Paul D. Hovland; Yosuke Mizutani

arxiv: 2605.24280 · v1 · pith:7PRLBYYBnew · submitted 2026-05-22 · 💻 cs.DS

A Comprehensive Evaluation of Vertex Elimination Algorithms for Algorithmic Differentiation

Alex Crane , P{\aa}l Gr{\o}n{\aa}s Drange , Eli Friedman , Paul D. Hovland , Jan H\"uckelheim , Andrew Lyons , Yosuke Mizutani , Mac\'eo Ottavy

show 1 more author

Blair D. Sullivan

This is my paper

Pith reviewed 2026-06-30 14:02 UTC · model grok-4.3

classification 💻 cs.DS

keywords algorithmic differentiationvertex eliminationinteger programmingoptimal orderingheuristicsapproximation algorithmsdirected acyclic graphsNP-complete

0 comments

The pith

New integer programming formulations solve optimal vertex elimination on graphs one to two orders of magnitude larger than before.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces integer programming models for finding minimum-cost sequences of vertex eliminations in directed acyclic graphs that represent computations in algorithmic differentiation. Because the underlying ordering problem is NP-complete, prior work relied on heuristics whose practical performance could not be checked exactly on medium-sized instances. The new models produce a collection of graphs with verified optimal solutions and show that widely used heuristics recover high-quality orderings on real data. Additional results include a tight characterization of the costs of forward and reverse modes together with the first approximation lower bounds for the problem.

Core claim

We design and engineer new integer programming formulations for Optimal Vertex Elimination and for a related objective we call Minimum Edge Count. Our implementations scale to graphs one-to-two orders of magnitude larger than existing techniques, enabling the assembly of a corpus of medium-sized graphs for which optimal solutions are known. This corpus facilitates a study of existing heuristics, confirming that on real data popular methods achieve high quality solutions. We also give a tight analysis of the forward and reverse modes of AD, extend the techniques to a simple approximation algorithm parameterized by the size of a minimum source-sink separator, and establish the first approximat

What carries the argument

Integer programming formulations that encode sequences of vertex eliminations and their associated costs in a DAG to compute an optimal elimination ordering or minimum edge count.

If this is right

Popular heuristics for vertex elimination can be evaluated exactly against optimal solutions on a new corpus of medium-sized real graphs.
Forward and reverse modes of algorithmic differentiation admit a tight cost analysis.
Optimal Vertex Elimination admits an approximation algorithm whose ratio is parameterized by the size of a minimum source-sink separator.
Both Optimal Vertex Elimination and Minimum Edge Count have provable approximation lower bounds.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The new solvable instances could serve as training or validation data for designing improved heuristics that generalize beyond the current corpus.
Software systems performing automatic differentiation might incorporate the formulations for exact solving on graphs that fall inside the new size range.
The approximation results may suggest similar parameterized algorithms for other ordering problems on DAGs that arise in compiler optimization.

Load-bearing premise

The integer programming formulations correctly encode the true costs of every possible sequence of vertex eliminations without omissions or deviations from the AD cost model.

What would settle it

An instance in which the integer program returns an elimination ordering whose cost differs from the cost obtained by exhaustive enumeration on the same graph would falsify the claim that the formulations are correct.

Figures

Figures reproduced from arXiv: 2605.24280 by Alex Crane, Andrew Lyons, Blair D. Sullivan, Eli Friedman, Jan H\"uckelheim, Mac\'eo Ottavy, P{\aa}l Gr{\o}n{\aa}s Drange, Paul D. Hovland, Yosuke Mizutani.

**Figure 2.** Figure 2: The construction used in the proof of Proposition 6. The edges between horizontally [PITH_FULL_IMAGE:figures/full_fig_p018_2.png] view at source ↗

**Figure 3.** Figure 3: The construction used in the proof of Proposition 7. The sets [PITH_FULL_IMAGE:figures/full_fig_p019_3.png] view at source ↗

**Figure 4.** Figure 4: The construction used to show that the bound of Theorem 9 is tight. [PITH_FULL_IMAGE:figures/full_fig_p020_4.png] view at source ↗

**Figure 5.** Figure 5: Example of a DAG D where MiddleOut is an Ω(|C| 2 )-approximation. Let C = {b1, . . . , bn, w1, . . . , wn}. Then MiddleOut(D, C) gives elimination sequence (a, u, b1, . . . , bn, w1, . . . , wn) and incurs cost n 3 + 3n 2 = Θ(|C| 3 ), while the optimal elimination sequence (b1, . . . , bn, w1, . . . , wn, u, a) incurs cost 3n + 1 = Θ(|C|). Now, we pick up from Equation (24), once more bounding |N + D′ π(v)… view at source ↗

**Figure 6.** Figure 6: Gadget for vertex v used in the proof of Theorem 19. The vertex v ′ is a replica vertex, and the sets Iv and Ov contain auxiliary vertices; the cardinalities of these sets are indicated above. Bolded arrows represent multiple edges. Dashed arrows represent edges to or from the replica vertices in other gadgets. T is a global set of cardinality 3n 2 , shared by all vertex gadgets. We can now state our resul… view at source ↗

**Figure 10.** Figure 10: fig10.4 10 12 [PITH_FULL_IMAGE:figures/full_fig_p031_10.png] view at source ↗

**Figure 10.** Figure 10: fig10.4 10 12 [PITH_FULL_IMAGE:figures/full_fig_p033_10.png] view at source ↗

**Figure 10.** Figure 10: fig10.4 10 12 [PITH_FULL_IMAGE:figures/full_fig_p036_10.png] view at source ↗

**Figure 10.** Figure 10: fig10.4 10 12 [PITH_FULL_IMAGE:figures/full_fig_p038_10.png] view at source ↗

**Figure 10.** Figure 10: fig10.4 10 12 [PITH_FULL_IMAGE:figures/full_fig_p040_10.png] view at source ↗

**Figure 10.** Figure 10: fig10.3 18 20 [PITH_FULL_IMAGE:figures/full_fig_p042_10.png] view at source ↗

**Figure 10.** Figure 10: fig10.3 18 20 [PITH_FULL_IMAGE:figures/full_fig_p043_10.png] view at source ↗

**Figure 10.** Figure 10: fig10.3 18 20 [PITH_FULL_IMAGE:figures/full_fig_p044_10.png] view at source ↗

**Figure 10.** Figure 10: fig10.3 18 20 [PITH_FULL_IMAGE:figures/full_fig_p049_10.png] view at source ↗

**Figure 10.** Figure 10: fig10.3 18 20 [PITH_FULL_IMAGE:figures/full_fig_p050_10.png] view at source ↗

**Figure 10.** Figure 10: fig10.3 18 20 [PITH_FULL_IMAGE:figures/full_fig_p051_10.png] view at source ↗

read the original abstract

The algorithmic differentiation (AD) of mathematical functions can be interpreted as a sequence of vertex eliminations in an underlying directed acyclic graph. The problem of determining a minimum-cost elimination ordering, which we call Optimal Vertex Elimination, is NP-complete. Consequently, much effort has been devoted to the design of heuristics. Many of these heuristics are widely believed to perform well in practice, but this hypothesis has so far been difficult to test due to the lack of scalable exact methods. We design and engineer new integer programming formulations for Optimal Vertex Eliminatioin and for a related objective we call Minimum Edge Count. Our implementations scale to graphs one-to-two orders of magnitude larger than existing techniques, enabling the assembly of a corpus of medium-sized graphs for which optimal solutions are known. This corpus facilitates a study of existing heuristics, confirming that on real data popular methods achieve high quality solutions. We also make several theoretical contributions. We give a tight analysis of the forward and reverse modes of AD, and extend our techniques to provide a simple algorithm for Optimal Vertex Elimination with approximation ratio parameterized by the size of a minimum source-sink separator. On the complexity side, we give the first approximation lower bounds for both problems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Scalable IP formulations for Optimal Vertex Elimination let them solve bigger graphs and test heuristics exactly, plus they add the first approximation lower bounds.

read the letter

The main advance is new integer programming formulations for Optimal Vertex Elimination and Minimum Edge Count that scale to graphs one or two orders of magnitude larger than prior exact methods. This lets them build a corpus of medium-sized graphs with known optimal solutions and run a direct comparison showing that popular heuristics already produce high-quality results on real data.

The formulations and the corpus are the practical payoff. They also give a tight analysis of forward and reverse AD modes, a separator-parameterized approximation algorithm, and the first approximation lower bounds for both problems. These pieces sit on top of the known NP-completeness result without circularity.

The engineering that achieves the scaling is what makes the heuristic study possible, and the theoretical extensions look like straightforward but useful additions. The lower bounds in particular close a gap that heuristic papers had left open.

The main thing to check is whether the IP constraints and objectives fully capture the true AD elimination costs; any modeling gap would affect both the corpus and the heuristic evaluation. The representativeness of the graphs in the corpus also matters for how far the positive heuristic results generalize.

This is for researchers who work on algorithmic differentiation or on elimination orders in DAGs. Anyone who needs either exact solutions on medium graphs or concrete approximation limits will find the corpus and the bounds directly usable. It has enough new modeling and theory to justify sending it to a serious referee.

Referee Report

0 major / 2 minor

Summary. The paper introduces new integer programming formulations for the Optimal Vertex Elimination problem (and a related Minimum Edge Count objective) arising in algorithmic differentiation on DAGs. These formulations are claimed to scale to instances one-to-two orders of magnitude larger than prior exact methods, enabling construction of a corpus of medium-sized graphs with known optimal solutions. The corpus is used to benchmark existing heuristics, which are reported to achieve high-quality solutions on real data. Additional theoretical results include a tight analysis of forward and reverse AD modes, a simple approximation algorithm whose ratio is parameterized by the size of a minimum source-sink separator, and the first approximation lower bounds for both problems.

Significance. If the IP formulations are correct and the claimed scaling holds, the work would be significant for the AD and combinatorial optimization communities by providing the first scalable exact solver for a classically hard problem, together with a reusable benchmark corpus. The theoretical contributions on forward/reverse analysis and separator-parameterized approximation are also valuable. The paper ships concrete engineering advances (new IPs, larger solvable instances) that directly address the prior lack of exact methods for heuristic evaluation.

minor comments (2)

[Abstract] Abstract: 'Optimal Vertex Eliminatioin' contains a typographical error and should read 'Elimination'.
[Modeling section] The manuscript would benefit from an explicit statement (perhaps in §3 or the modeling section) confirming that the new IP objective and constraints exactly reproduce the standard AD elimination cost model without omitted fill-in terms or surrogate objectives.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment, accurate summary of our contributions on scalable IP formulations for Optimal Vertex Elimination, the benchmark corpus, and the theoretical results on forward/reverse modes, separator-parameterized approximation, and lower bounds. The recommendation for minor revision is noted.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper's central contributions consist of new integer programming formulations for Optimal Vertex Elimination and Minimum Edge Count, presented as independent modeling work that encodes elimination costs on directed acyclic graphs. These build on the externally known NP-completeness result without the formulations themselves reducing to self-definitions, fitted parameters renamed as predictions, or load-bearing self-citations. Theoretical extensions (forward/reverse mode analysis, separator-parameterized approximation, and approximation lower bounds) are stated as separate results with no equations or claims shown to be equivalent to their inputs by construction. The work is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the modeling fidelity of the new IP formulations and on the representativeness of the generated medium-sized graph corpus; both cannot be inspected from the abstract alone. The paper invokes the known NP-completeness of the problem as background.

axioms (1)

standard math Determining a minimum-cost elimination ordering is NP-complete.
Stated directly in the abstract as established prior knowledge.

pith-pipeline@v0.9.1-grok · 5783 in / 1105 out tokens · 77136 ms · 2026-06-30T14:02:34.309633+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

31 extracted references · 19 canonical work pages · 2 internal anchors

[1]

Sinan G Aksoy, Ryan Bennink, Yuzhou Chen, Jos´ e Fr´ ıas, Yulia R Gel, Bill Kay, Uwe Naumann, Carlos Ortiz Marrero, Anthony V Petyuk, Sandip Roy, Ignacio Segovia-Dominguez, Nate Veldt, and Stephen J. Young. Seven open problems in applied combinatorics.Journal of Combinatorics, 14(4):559–601, 2023.arXiv:2303.11464,doi:10.4310/joc.2023.v14.n4. a8

work page doi:10.4310/joc.2023.v14.n4 2023
[2]

Markowitz-type heuristics for com- puting jacobian matrices efficiently

Andreas Albrecht, Peter Gottschling, and Uwe Naumann. Markowitz-type heuristics for com- puting jacobian matrices efficiently. InInternational Conference on Computational Science, pages 575–584. Springer, 2003.doi:10.1007/3-540-44862-4_61

work page doi:10.1007/3-540-44862-4_61 2003
[3]

Hardness of approximating problems on cubic graphs

Paola Alimonti and Viggo Kann. Hardness of approximating problems on cubic graphs. In Italian conference on algorithms and complexity, pages 288–298. Springer, 1997.doi:10.1007/ 3-540-62592-5_80

1997
[4]

Computational graphs and rounding error.SIAM Journal on Numerical Analysis, 11(1):87–96, 1974.doi:10.1137/0711010

Friedrich L Bauer. Computational graphs and rounding error.SIAM Journal on Numerical Analysis, 11(1):87–96, 1974.doi:10.1137/0711010

work page doi:10.1137/0711010 1974
[5]

Structural optimal jacobian accumulation and minimum edge count are NP-complete under vertex elimination.arXiv preprint, 2025.arXiv:2506.17521

Matthias Bentert, Alex Crane, P˚ al Grøn˚ as Drange, Yosuke Mizutani, and Blair D Sullivan. Structural optimal jacobian accumulation and minimum edge count are NP-complete under vertex elimination.arXiv preprint, 2025.arXiv:2506.17521. 14

work page arXiv 2025
[6]

An Algo- rithmic Weakening of the Erd˝ os-Hajnal Conjecture

´Edouard Bonnet, St´ ephan Thomass´ e, Xuan Thang Tran, and R´ emi Watrigant. An Algo- rithmic Weakening of the Erd˝ os-Hajnal Conjecture. In28th Annual European Symposium on Algorithms (ESA 2020), volume 173 ofLeibniz International Proceedings in Informatics (LIPIcs), pages 23:1–23:18, Dagstuhl, Germany, 2020. Schloss Dagstuhl – Leibniz-Zentrum f¨ ur Inf...

work page doi:10.4230/lipics.esa.2020.23 2020
[7]

TEST NLS nonlinear least squares test problems.https://people.math

John Burkhardt. TEST NLS nonlinear least squares test problems.https://people.math. sc.edu/burkardt/f_src/test_nls/test_nls.html
[8]

An integer programming approach to optimal derivative accumulation

Jieqiu Chen, Paul Hovland, Todd Munson, and Jean Utke. An integer programming approach to optimal derivative accumulation. InRecent Advances in Algorithmic Differentiation, pages 221–231. Springer, 2012.doi:10.1007/978-3-642-30023-3_20

work page doi:10.1007/978-3-642-30023-3_20 2012
[9]

An integer programming ap- proach to optimal derivative accumulation

Jieqiu Chen, Paul Hovland, Todd Munson, and Jean Utke. An integer programming ap- proach to optimal derivative accumulation. Technical report, Argonne National Laboratory,
[10]

URL:https://www.mcs.anl.gov/uploads/ cels/papers/1994-0112.pdf

Extended version of conference paper. URL:https://www.mcs.anl.gov/uploads/ cels/papers/1994-0112.pdf

1994
[11]

Springer, 2012.doi:10.1007/978-3-662-70107-2

Reinhard Diestel.Graph Theory, 4th Edition, volume 173 ofGraduate texts in mathematics. Springer, 2012.doi:10.1007/978-3-662-70107-2

work page doi:10.1007/978-3-662-70107-2 2012
[12]

A mathematical view of automatic differentiation.Acta Numerica, 12:321–398, 2003.doi:10.1017/S0962492902000132

Andreas Griewank. A mathematical view of automatic differentiation.Acta Numerica, 12:321–398, 2003.doi:10.1017/S0962492902000132

work page doi:10.1017/s0962492902000132 2003
[13]

On the calculation of Jacobian matrices by the Markowitz rule

Andreas Griewank and Shawn Reese. On the calculation of Jacobian matrices by the Markowitz rule. InAutomatic Differentiation of Algorithms: Theory, Implementation, and Application, pages 126–135. SIAM, 1991. URL:https://www.osti.gov/servlets/purl/5933713

work page arXiv 1991
[14]

Griewank and A

Andreas Griewank and Andrea Walther.Evaluating derivatives: principles and techniques of algorithmic differentiation. SIAM, 2008.doi:10.1137/1.9780898717761

work page doi:10.1137/1.9780898717761 2008
[15]

Gurobi Optimizer Reference Manual, 2026

Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2026. URL:https://www. gurobi.com

2026
[16]

Hovland, C

P. Hovland, C. Bischof, D. Spiegelman, and M. Casella. Efficient derivative codes through automatic differentiation and interface contraction: An application in biostatistics.SIAM Journal on Scientific Computing, 18(4):1056–1066, 1997.doi:10.1137/S1064827595281800

work page doi:10.1137/s1064827595281800 1997
[17]

History of automatic differentiation and rounding error estimation

Masao Iri. History of automatic differentiation and rounding error estimation. InAutomatic Differentiation of Algorithms: Theory, Implementation, and Application, pages 1–16. SIAM, Philadelphia, PA, 1991

1991
[18]

Tight Running Time Lower Bounds for Vertex Deletion Problems

Christian Komusiewicz. Tight running time lower bounds for vertex deletion problems.ACM Transactions on Computation Theory (TOCT), 10(2):1–18, 2018.arXiv:1511.05449,doi: 10.1145/3186589

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1145/3186589 2018
[19]

The representation of the cumulative rounding error of an algorithm as a taylor expansion of the local rounding errors

Seppo Linnainmaa. The representation of the cumulative rounding error of an algorithm as a taylor expansion of the local rounding errors. Master’s thesis, University of Helsinki, 1970. In Finnish

1970
[20]

Graphax.https://github.com/jamielohoff/graphax

Jamie Lohoff. Graphax.https://github.com/jamielohoff/graphax. [Accessed: April 2026]. 15

2026
[21]

Optimizing automatic differentiation with deep reinforcement learning

Jamie Lohoff and Emre Neftci. Optimizing automatic differentiation with deep reinforcement learning. 37:3611–3648, 2024.arXiv:2406.05027,doi:10.52202/079017-0120

work page doi:10.52202/079017-0120 2024
[22]

Randomized heuristics for exploiting Jacobian scarcity.Optimization Methods and Software, 27(2):311–322, 2012.doi:10.1080/10556788

Andrew Lyons, Ilya Safro, and Jean Utke. Randomized heuristics for exploiting Jacobian scarcity.Optimization Methods and Software, 27(2):311–322, 2012.doi:10.1080/10556788. 2011.577774

work page doi:10.1080/10556788 2012
[23]

PhD thesis, Dresden University of Technology, Germany, 1999

Uwe Naumann.Efficient calculation of Jacobian matrices by optimized application of the chain rule to computational graphs. PhD thesis, Dresden University of Technology, Germany, 1999. URL:https://d-nb.info/956080170

work page arXiv 1999
[24]

Cheaper Jacobians by simulated annealing.SIAM J

Uwe Naumann. Cheaper Jacobians by simulated annealing.SIAM J. on Optimization, 13(3):660–674, August 2002.doi:10.1137/S1052623400368394

work page doi:10.1137/s1052623400368394 2002
[25]

Optimal vertex elimination in single-expression-use graphs

Uwe Naumann and Yuxiao Hu. Optimal vertex elimination in single-expression-use graphs. ACM Transactions on Mathematical Software (TOMS), 35(1):1–20, 2008.doi:10.1145/ 1377603.1377605

work page arXiv 2008
[26]

Max Sagebaum, Tim Albring, and Nicolas R. Gauger. High-performance derivative computa- tions using codipack.ACM Trans. Math. Softw., 45(4), December 2019.arXiv:1709.07229, doi:10.1145/3356900

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1145/3356900 2019
[27]

E. M. Tadjouddine. Vertex-ordering Algorithms for Automatic Differentiation of Computer Codes.The Computer Journal, 51(6):688–699, 2008.doi:10.1093/comjnl/bxm115. 16 A Deferred Proofs A.1 Deferred Proofs from Section 2 Observation 5.LetD= (V=S⊎I⊎T, E),X⊆I, andi, j∈V\X. Then(i, j)∈E(D X)if and only if there exists a directed path fromitojfor which every in...

work page doi:10.1093/comjnl/bxm115 2008
[28]

Next, we assign theevariables

Observe that constraints (5) and (6) are satisfied. Next, we assign theevariables. For each (i, j)∈E, sete ij = 1, as required by constraint (7). For eachs∈Sandb∈B, sete sb = 1 2, satisfying constraint (8) becausex as +x ab = 1 2 for every a∈A. Similarly, sete ut = 1 2 for everyu∈Uandt∈T. Next, for eachs∈Sandt∈T, sete st = 1. Finally, set all remaininge i...
[29]

Then, for eacha∈Aandt∈T, we setz at = 1

Next, for each arc (u, v)∈E, we setz uv = 1, satisfying constraint (19). Then, for eacha∈Aandt∈T, we setz at = 1
[30]

Similarly, for eachs∈Sandb∈B, we set zsb = 1

This is safe with respect to constraint (20) becausex b = 1 2 for everyb∈B. Similarly, for eachs∈Sandb∈B, we set zsb = 1
[31]

It remains to assign theyvariables

Finally, for eachs∈Sandt∈T, we setz st = 0, which is safe with respect to (20) because for eacha∈A,x a = 1 2 andz at = 1 2. It remains to assign theyvariables. For eachs∈Sanda∈A, we sety sa = 1 2 if (s, a)∈E andy sa = 0 otherwise. Similarly, for eachb∈Bandt∈T, we sety bt = 1 2 if (b, t)∈Eandy bt = 0 otherwise. Both assignments are safe with respect to con...

[1] [1]

Sinan G Aksoy, Ryan Bennink, Yuzhou Chen, Jos´ e Fr´ ıas, Yulia R Gel, Bill Kay, Uwe Naumann, Carlos Ortiz Marrero, Anthony V Petyuk, Sandip Roy, Ignacio Segovia-Dominguez, Nate Veldt, and Stephen J. Young. Seven open problems in applied combinatorics.Journal of Combinatorics, 14(4):559–601, 2023.arXiv:2303.11464,doi:10.4310/joc.2023.v14.n4. a8

work page doi:10.4310/joc.2023.v14.n4 2023

[2] [2]

Markowitz-type heuristics for com- puting jacobian matrices efficiently

Andreas Albrecht, Peter Gottschling, and Uwe Naumann. Markowitz-type heuristics for com- puting jacobian matrices efficiently. InInternational Conference on Computational Science, pages 575–584. Springer, 2003.doi:10.1007/3-540-44862-4_61

work page doi:10.1007/3-540-44862-4_61 2003

[3] [3]

Hardness of approximating problems on cubic graphs

Paola Alimonti and Viggo Kann. Hardness of approximating problems on cubic graphs. In Italian conference on algorithms and complexity, pages 288–298. Springer, 1997.doi:10.1007/ 3-540-62592-5_80

1997

[4] [4]

Computational graphs and rounding error.SIAM Journal on Numerical Analysis, 11(1):87–96, 1974.doi:10.1137/0711010

Friedrich L Bauer. Computational graphs and rounding error.SIAM Journal on Numerical Analysis, 11(1):87–96, 1974.doi:10.1137/0711010

work page doi:10.1137/0711010 1974

[5] [5]

Structural optimal jacobian accumulation and minimum edge count are NP-complete under vertex elimination.arXiv preprint, 2025.arXiv:2506.17521

Matthias Bentert, Alex Crane, P˚ al Grøn˚ as Drange, Yosuke Mizutani, and Blair D Sullivan. Structural optimal jacobian accumulation and minimum edge count are NP-complete under vertex elimination.arXiv preprint, 2025.arXiv:2506.17521. 14

work page arXiv 2025

[6] [6]

An Algo- rithmic Weakening of the Erd˝ os-Hajnal Conjecture

´Edouard Bonnet, St´ ephan Thomass´ e, Xuan Thang Tran, and R´ emi Watrigant. An Algo- rithmic Weakening of the Erd˝ os-Hajnal Conjecture. In28th Annual European Symposium on Algorithms (ESA 2020), volume 173 ofLeibniz International Proceedings in Informatics (LIPIcs), pages 23:1–23:18, Dagstuhl, Germany, 2020. Schloss Dagstuhl – Leibniz-Zentrum f¨ ur Inf...

work page doi:10.4230/lipics.esa.2020.23 2020

[7] [7]

TEST NLS nonlinear least squares test problems.https://people.math

John Burkhardt. TEST NLS nonlinear least squares test problems.https://people.math. sc.edu/burkardt/f_src/test_nls/test_nls.html

[8] [8]

An integer programming approach to optimal derivative accumulation

Jieqiu Chen, Paul Hovland, Todd Munson, and Jean Utke. An integer programming approach to optimal derivative accumulation. InRecent Advances in Algorithmic Differentiation, pages 221–231. Springer, 2012.doi:10.1007/978-3-642-30023-3_20

work page doi:10.1007/978-3-642-30023-3_20 2012

[9] [9]

An integer programming ap- proach to optimal derivative accumulation

Jieqiu Chen, Paul Hovland, Todd Munson, and Jean Utke. An integer programming ap- proach to optimal derivative accumulation. Technical report, Argonne National Laboratory,

[10] [10]

URL:https://www.mcs.anl.gov/uploads/ cels/papers/1994-0112.pdf

Extended version of conference paper. URL:https://www.mcs.anl.gov/uploads/ cels/papers/1994-0112.pdf

1994

[11] [11]

Springer, 2012.doi:10.1007/978-3-662-70107-2

Reinhard Diestel.Graph Theory, 4th Edition, volume 173 ofGraduate texts in mathematics. Springer, 2012.doi:10.1007/978-3-662-70107-2

work page doi:10.1007/978-3-662-70107-2 2012

[12] [12]

A mathematical view of automatic differentiation.Acta Numerica, 12:321–398, 2003.doi:10.1017/S0962492902000132

Andreas Griewank. A mathematical view of automatic differentiation.Acta Numerica, 12:321–398, 2003.doi:10.1017/S0962492902000132

work page doi:10.1017/s0962492902000132 2003

[13] [13]

On the calculation of Jacobian matrices by the Markowitz rule

Andreas Griewank and Shawn Reese. On the calculation of Jacobian matrices by the Markowitz rule. InAutomatic Differentiation of Algorithms: Theory, Implementation, and Application, pages 126–135. SIAM, 1991. URL:https://www.osti.gov/servlets/purl/5933713

work page arXiv 1991

[14] [14]

Griewank and A

Andreas Griewank and Andrea Walther.Evaluating derivatives: principles and techniques of algorithmic differentiation. SIAM, 2008.doi:10.1137/1.9780898717761

work page doi:10.1137/1.9780898717761 2008

[15] [15]

Gurobi Optimizer Reference Manual, 2026

Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2026. URL:https://www. gurobi.com

2026

[16] [16]

Hovland, C

P. Hovland, C. Bischof, D. Spiegelman, and M. Casella. Efficient derivative codes through automatic differentiation and interface contraction: An application in biostatistics.SIAM Journal on Scientific Computing, 18(4):1056–1066, 1997.doi:10.1137/S1064827595281800

work page doi:10.1137/s1064827595281800 1997

[17] [17]

History of automatic differentiation and rounding error estimation

Masao Iri. History of automatic differentiation and rounding error estimation. InAutomatic Differentiation of Algorithms: Theory, Implementation, and Application, pages 1–16. SIAM, Philadelphia, PA, 1991

1991

[18] [18]

Tight Running Time Lower Bounds for Vertex Deletion Problems

Christian Komusiewicz. Tight running time lower bounds for vertex deletion problems.ACM Transactions on Computation Theory (TOCT), 10(2):1–18, 2018.arXiv:1511.05449,doi: 10.1145/3186589

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1145/3186589 2018

[19] [19]

The representation of the cumulative rounding error of an algorithm as a taylor expansion of the local rounding errors

Seppo Linnainmaa. The representation of the cumulative rounding error of an algorithm as a taylor expansion of the local rounding errors. Master’s thesis, University of Helsinki, 1970. In Finnish

1970

[20] [20]

Graphax.https://github.com/jamielohoff/graphax

Jamie Lohoff. Graphax.https://github.com/jamielohoff/graphax. [Accessed: April 2026]. 15

2026

[21] [21]

Optimizing automatic differentiation with deep reinforcement learning

Jamie Lohoff and Emre Neftci. Optimizing automatic differentiation with deep reinforcement learning. 37:3611–3648, 2024.arXiv:2406.05027,doi:10.52202/079017-0120

work page doi:10.52202/079017-0120 2024

[22] [22]

Randomized heuristics for exploiting Jacobian scarcity.Optimization Methods and Software, 27(2):311–322, 2012.doi:10.1080/10556788

Andrew Lyons, Ilya Safro, and Jean Utke. Randomized heuristics for exploiting Jacobian scarcity.Optimization Methods and Software, 27(2):311–322, 2012.doi:10.1080/10556788. 2011.577774

work page doi:10.1080/10556788 2012

[23] [23]

PhD thesis, Dresden University of Technology, Germany, 1999

Uwe Naumann.Efficient calculation of Jacobian matrices by optimized application of the chain rule to computational graphs. PhD thesis, Dresden University of Technology, Germany, 1999. URL:https://d-nb.info/956080170

work page arXiv 1999

[24] [24]

Cheaper Jacobians by simulated annealing.SIAM J

Uwe Naumann. Cheaper Jacobians by simulated annealing.SIAM J. on Optimization, 13(3):660–674, August 2002.doi:10.1137/S1052623400368394

work page doi:10.1137/s1052623400368394 2002

[25] [25]

Optimal vertex elimination in single-expression-use graphs

Uwe Naumann and Yuxiao Hu. Optimal vertex elimination in single-expression-use graphs. ACM Transactions on Mathematical Software (TOMS), 35(1):1–20, 2008.doi:10.1145/ 1377603.1377605

work page arXiv 2008

[26] [26]

Max Sagebaum, Tim Albring, and Nicolas R. Gauger. High-performance derivative computa- tions using codipack.ACM Trans. Math. Softw., 45(4), December 2019.arXiv:1709.07229, doi:10.1145/3356900

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1145/3356900 2019

[27] [27]

E. M. Tadjouddine. Vertex-ordering Algorithms for Automatic Differentiation of Computer Codes.The Computer Journal, 51(6):688–699, 2008.doi:10.1093/comjnl/bxm115. 16 A Deferred Proofs A.1 Deferred Proofs from Section 2 Observation 5.LetD= (V=S⊎I⊎T, E),X⊆I, andi, j∈V\X. Then(i, j)∈E(D X)if and only if there exists a directed path fromitojfor which every in...

work page doi:10.1093/comjnl/bxm115 2008

[28] [28]

Next, we assign theevariables

Observe that constraints (5) and (6) are satisfied. Next, we assign theevariables. For each (i, j)∈E, sete ij = 1, as required by constraint (7). For eachs∈Sandb∈B, sete sb = 1 2, satisfying constraint (8) becausex as +x ab = 1 2 for every a∈A. Similarly, sete ut = 1 2 for everyu∈Uandt∈T. Next, for eachs∈Sandt∈T, sete st = 1. Finally, set all remaininge i...

[29] [29]

Then, for eacha∈Aandt∈T, we setz at = 1

Next, for each arc (u, v)∈E, we setz uv = 1, satisfying constraint (19). Then, for eacha∈Aandt∈T, we setz at = 1

[30] [30]

Similarly, for eachs∈Sandb∈B, we set zsb = 1

This is safe with respect to constraint (20) becausex b = 1 2 for everyb∈B. Similarly, for eachs∈Sandb∈B, we set zsb = 1

[31] [31]

It remains to assign theyvariables

Finally, for eachs∈Sandt∈T, we setz st = 0, which is safe with respect to (20) because for eacha∈A,x a = 1 2 andz at = 1 2. It remains to assign theyvariables. For eachs∈Sanda∈A, we sety sa = 1 2 if (s, a)∈E andy sa = 0 otherwise. Similarly, for eachb∈Bandt∈T, we sety bt = 1 2 if (b, t)∈Eandy bt = 0 otherwise. Both assignments are safe with respect to con...