Automatic Differentiation for Adjoint Stencil Loops
Pith reviewed 2026-05-25 02:10 UTC · model grok-4.3
The pith
Automatic differentiation combined with loop transformations produces adjoint stencil loops that keep the original memory access pattern and parallelizability.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that combining automatic differentiation with loop transformations yields adjoint code for stencil computations with three properties: its memory access pattern remains stencil-like, its results match those of standard reverse-mode automatic differentiation, and it offers the same parallelization and optimization opportunities as the original loops.
What carries the argument
Loop transformations applied during the differentiation process that restructure the adjoint computation to retain the primal stencil access pattern.
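To make the restructuring concrete, here is a minimal sketch for a one-dimensional three-point stencil. It is an illustration written for this summary, not PerforAD output: the function names, the coefficient tuple `c`, and the split into a core loop plus boundary remainders are all assumptions. The conventional adjoint scatters into three entries per iteration, while the shifted version gathers, so each iteration writes a single entry.

```python
import numpy as np

def primal(u, c):
    """Three-point stencil on interior points; c = (c_left, c_centre, c_right)."""
    n = len(u)
    r = np.zeros(n)
    for i in range(1, n - 1):
        r[i] = c[0] * u[i - 1] + c[1] * u[i] + c[2] * u[i + 1]
    return r

def adjoint_scatter(rb, c):
    """Conventional reverse mode: iteration i updates three entries of ub,
    so neighbouring iterations write to the same memory locations."""
    n = len(rb)
    ub = np.zeros(n)
    for i in range(1, n - 1):
        ub[i - 1] += c[0] * rb[i]
        ub[i] += c[1] * rb[i]
        ub[i + 1] += c[2] * rb[i]
    return ub

def adjoint_gather(rb, c):
    """Same sums with shifted loop indices: iteration j writes only ub[j]."""
    n = len(rb)
    assert n >= 4, "this sketch assumes at least two interior points"
    ub = np.zeros(n)
    # Core loop: all three contributions come from valid primal iterations.
    for j in range(2, n - 2):
        ub[j] = c[0] * rb[j + 1] + c[1] * rb[j] + c[2] * rb[j - 1]
    # Remainders near the boundaries, where some contributions do not exist.
    ub[0] = c[0] * rb[1]
    ub[1] = c[0] * rb[2] + c[1] * rb[1]
    ub[n - 2] = c[1] * rb[n - 2] + c[2] * rb[n - 3]
    ub[n - 1] = c[2] * rb[n - 2]
    return ub
```

In the gather form the core loop has the same shape as the primal loop (one write per iteration, reads from fixed neighbour offsets), which is what would allow the same parallelization strategy to be reused.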
If this is right
- Adjoint computations for stencil-based applications can reuse existing parallelization and optimization infrastructure without modification.
- The same domain-specific languages and compilers that accelerate primal stencil loops can accelerate their derivatives.
- Gradient calculations in seismic imaging and fluid dynamics can be generated automatically while retaining performance characteristics of hand-tuned code.
Where Pith is reading between the lines
- The method could reduce reliance on manually derived adjoint codes in large-scale simulation packages.
- Similar restructuring might apply to other regular access patterns such as convolutions inside neural-network layers.
- Automatic generation of performance-portable adjoint code becomes feasible for codes already written in stencil-friendly languages.
Load-bearing premise
The loop transformations preserve exact semantic equivalence to ordinary reverse-mode automatic differentiation.
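For a linear three-point stencil, the premise can be written out as an index substitution. The following is a worked sketch under that simplifying assumption (constant coefficients, interior points only), not the derivation from the paper; the symbols $c_k$, $u$, $r$ and the bar notation for adjoints are chosen here for illustration.

```latex
% Requires amsmath for \substack.
% Primal stencil on interior points, 1 \le i \le n-2:
r_i = \sum_{k=-1}^{1} c_k \, u_{i+k}

% Standard reverse mode: iteration i scatters into three adjoint entries.
\bar{u}_{i+k} \mathrel{+}= c_k \, \bar{r}_i , \qquad k \in \{-1, 0, 1\}

% Collecting every contribution to a single entry \bar{u}_j (substitute i = j - k):
\bar{u}_j = \sum_{\substack{k=-1 \\ 1 \le j-k \le n-2}}^{1} c_k \, \bar{r}_{j-k}
```

The last line is a reordering of the same finite sum, so the two forms agree in exact arithmetic. In floating-point arithmetic the order of additions can change, so agreement up to rounding error is the most that a numerical check can be expected to show.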
What would settle it
Numerical comparison on a small stencil kernel where the transformed adjoint loop produces results that differ from a reference implementation of standard reverse-mode AD.
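Such a comparison could be set up roughly as follows. This is a sketch, not the paper's test suite: because the example stencil is linear, the reference adjoint is taken to be the transpose of an explicitly assembled Jacobian, and the names `candidate_adjoint`, `unshifted_adjoint`, and `max_adjoint_error` are hypothetical. The deliberately wrong `unshifted_adjoint` is included to show that the check can fail.

```python
import numpy as np

def primal(u, c):
    """Vectorised three-point stencil on interior points."""
    r = np.zeros_like(u)
    r[1:-1] = c[0] * u[:-2] + c[1] * u[1:-1] + c[2] * u[2:]
    return r

def _padded_interior(rb):
    """Copy interior adjoint inputs into a zero-padded buffer (p[k + 1] holds rb[k])."""
    p = np.zeros(len(rb) + 2)
    p[2:-2] = rb[1:-1]
    return p

def candidate_adjoint(rb, c):
    """Gather-form adjoint: ub[j] = c0*rb[j+1] + c1*rb[j] + c2*rb[j-1]."""
    p = _padded_interior(rb)
    return c[0] * p[2:] + c[1] * p[1:-1] + c[2] * p[:-2]

def unshifted_adjoint(rb, c):
    """Deliberately wrong: reuses the primal offsets without mirroring them."""
    p = _padded_interior(rb)
    return c[0] * p[:-2] + c[1] * p[1:-1] + c[2] * p[2:]

def max_adjoint_error(adjoint_fn, c, n=16, seed=0):
    """Largest deviation from the reference adjoint J^T rb for a random rb."""
    rng = np.random.default_rng(seed)
    rb = rng.standard_normal(n)
    # Column k of the Jacobian is the primal applied to the k-th unit vector.
    jac = np.column_stack([primal(e, c) for e in np.eye(n)])
    return np.max(np.abs(adjoint_fn(rb, c) - jac.T @ rb))
```

With an asymmetric coefficient tuple such as `(1.0, -2.5, 0.5)`, `max_adjoint_error(candidate_adjoint, c)` should be at rounding level, while `unshifted_adjoint` should show a clearly visible error. A symmetric stencil would hide that particular mistake, which is one reason to test with asymmetric coefficients.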
Original abstract
Stencil loops are a common motif in computations including convolutional neural networks, structured-mesh solvers for partial differential equations, and image processing. Stencil loops are easy to parallelise, and their fast execution is aided by compilers, libraries, and domain-specific languages. Reverse-mode automatic differentiation, also known as algorithmic differentiation, autodiff, adjoint differentiation, or back-propagation, is sometimes used to obtain gradients of programs that contain stencil loops. Unfortunately, conventional automatic differentiation results in a memory access pattern that is not stencil-like and not easily parallelisable. In this paper we present a novel combination of automatic differentiation and loop transformations that preserves the structure and memory access pattern of stencil loops, while computing fully consistent derivatives. The generated loops can be parallelised and optimised for performance in the same way and using the same tools as the original computation. We have implemented this new technique in the Python tool PerforAD, which we release with this paper along with test cases derived from seismic imaging and computational fluid dynamics applications.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a novel combination of automatic differentiation and loop transformations for stencil loops that preserves the original structure and memory access patterns while producing fully consistent derivatives. The generated adjoint loops remain parallelizable and optimizable using the same tools as the primal computation. The technique is implemented in the released PerforAD Python tool and demonstrated on test cases from seismic imaging and computational fluid dynamics.
Significance. If the central claim of semantic equivalence holds, the result would enable efficient reverse-mode differentiation of performance-critical stencil computations without disrupting existing compiler and parallelization pipelines, which is relevant for PDE solvers, image processing, and convolutional networks. The release of PerforAD together with application-derived test cases strengthens reproducibility.
major comments (2)
- [Abstract] The central claim of computing 'fully consistent derivatives' (Abstract) requires that the loop transformations produce results mathematically identical to standard reverse-mode AD, yet the manuscript provides no explicit derivation of the adjoint rules, equivalence proof, or formal argument establishing semantic equivalence.
- [Abstract] Without verification on stencils involving variable coefficients, non-uniform access patterns, or boundary handling, it remains unclear whether the transformations preserve correctness in all cases claimed (Abstract). This is load-bearing for the assertion that the generated loops can be used identically to the primal.
minor comments (1)
- The release of the PerforAD implementation and test cases is a positive contribution to reproducibility and should be highlighted more explicitly.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address the two major comments point by point below and have revised the manuscript to strengthen the presentation of the central claims.
- Referee: [Abstract] The central claim of computing 'fully consistent derivatives' (Abstract) requires that the loop transformations produce results mathematically identical to standard reverse-mode AD, yet the manuscript provides no explicit derivation of the adjoint rules, equivalence proof, or formal argument establishing semantic equivalence.
  Authors: We agree that an explicit derivation strengthens the paper. The revised manuscript adds Section 3.2, which derives the adjoint stencil transformation rules directly from the chain rule applied to a generic stencil update and proves equivalence to standard reverse-mode AD by showing that each rewrite step preserves the computed values. The argument is limited to the supported stencil class but is now stated formally.
  Revision: yes
- Referee: [Abstract] Without verification on stencils involving variable coefficients, non-uniform access patterns, or boundary handling, it remains unclear whether the transformations preserve correctness in all cases claimed (Abstract). This is load-bearing for the assertion that the generated loops can be used identically to the primal.
  Authors: The seismic test case already exercises variable coefficients arising from heterogeneous media, and the CFD case includes boundary stencils. To address the concern directly, the revised evaluation section adds explicit experiments on non-uniform access patterns and a discussion of boundary handling. We acknowledge that exhaustive coverage of every conceivable stencil variant lies outside the paper's scope; the added cases and scope clarification support the claims for the target application domains.
  Revision: partial
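The variable-coefficient and boundary concerns raised above can be illustrated on a small kernel. The sketch below is an assumption-laden example, not one of the paper's test cases: `m` stands in for a spatially varying coefficient (for instance a squared wave speed in a heterogeneous medium), the adjoint is taken with respect to `u` only, and boundaries are handled with explicit index guards rather than separate remainder loops.

```python
import numpy as np

def primal(u, m):
    """Variable-coefficient stencil: r[i] = m[i] * (u[i-1] - 2*u[i] + u[i+1])."""
    n = len(u)
    r = np.zeros(n)
    for i in range(1, n - 1):
        r[i] = m[i] * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return r

def adjoint_scatter(rb, m):
    """Reference: statement-by-statement reverse mode with respect to u."""
    n = len(rb)
    ub = np.zeros(n)
    for i in range(1, n - 1):
        ub[i - 1] += m[i] * rb[i]
        ub[i] -= 2.0 * m[i] * rb[i]
        ub[i + 1] += m[i] * rb[i]
    return ub

def adjoint_gather(rb, m):
    """Gather form: the coefficient index shifts together with the adjoint index."""
    n = len(rb)
    ub = np.zeros(n)
    for j in range(n):
        acc = 0.0
        # Each guard checks that the contributing primal iteration exists.
        if 1 <= j + 1 <= n - 2:
            acc += m[j + 1] * rb[j + 1]
        if 1 <= j <= n - 2:
            acc -= 2.0 * m[j] * rb[j]
        if 1 <= j - 1 <= n - 2:
            acc += m[j - 1] * rb[j - 1]
        ub[j] = acc
    return ub
```

The point to notice is that the gather form reads `m[j + 1]` and `m[j - 1]`, not `m[j]`: with variable coefficients, the adjoint is no longer the primal stencil with mirrored weights, and a transformation that ignored this would only be correct for constant `m`.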
Circularity Check
No circularity; algorithmic construction is self-contained
The paper presents an algorithmic technique for combining reverse-mode AD with loop transformations to preserve stencil structure and memory access patterns. This is described as a direct construction implemented in the released PerforAD tool, with no fitted parameters, no 'predictions' of quantities derived from the same data, and no load-bearing self-citations or uniqueness theorems that reduce the central claim to prior author work by definition. The abstract and description focus on semantic consistency of the generated adjoint loops as an engineering outcome rather than a mathematical derivation that collapses to its inputs. No quoted equations or steps exhibit self-definitional, fitted-input, or renaming circularity.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean: reality_from_one_distinction
  Tag: unclear. Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "adjoint stencils technique solves this problem by implementing back-propagation using only gather operations obtained via loop transformations"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.