A Differentiable Programming System to Bridge Machine Learning and Scientific Computing
Pith reviewed 2026-05-24 20:06 UTC · model grok-4.3
The pith
A system computes gradients through arbitrary programs that include control flow, recursion, and mutation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The system is able to take gradients of general program structures. It supports almost all language constructs (control flow, recursion, mutation, etc.) and compiles high-performance code without requiring any user intervention or refactoring to stage computations. This enables an expressive programming model for deep learning, but more importantly, it enables straightforward incorporation of a large ecosystem of libraries into models, along with support for mixed-mode, complex, and checkpointed differentiation.
What carries the argument
The automatic differentiation engine that traverses and transforms general program structures, including those with control flow and mutation, while emitting optimized machine code.
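The pith names the engine but does not show it. The minimal tape-style reverse-mode sketch below, in Python, is a rough illustration of why control flow, recursion and container mutation pose no special difficulty once differentiation follows the program as it actually runs. It is an assumption-laden stand-in, not Zygote's method. Zygote transforms Julia's SSA-form intermediate representation source-to-source and emits compiled adjoint code. The hypothetical `Var` class and `grad` helper below instead record a graph at run time and handle scalars only.

```python
class Var:
    """Scalar node in a reverse-mode computation graph (hypothetical helper)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent Var, local partial derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __rmul__ = __mul__


def backward(out):
    """Order the recorded graph topologically, then push adjoints from output to inputs."""
    order, seen = [], set()

    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(out)
    out.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad


def grad(f, x0):
    """Derivative of a scalar function f at x0 (hypothetical helper)."""
    x = Var(x0)
    backward(f(x))
    return x.grad


def power(x, n):
    # Recursion: x**n built as x * x**(n - 1).
    if n == 0:
        return Var(1.0)
    return x * power(x, n - 1)


def f(x):
    # Data-dependent branch: which path runs depends on the value of x.
    if x.value > 0:
        return power(x, 3)
    return x * -2.0


def g(x):
    # Mutation: an element of a Python list is overwritten in place.
    buf = [x, x]
    buf[0] = buf[0] * buf[1]
    return buf[0] + buf[1]  # x**2 + x


print(grad(f, 2.0))   # 12.0, i.e. 3 * 2**2
print(grad(f, -1.0))  # -2.0
print(grad(g, 2.0))   # 5.0, i.e. 2 * 2 + 1
```

Recording a tape buys generality at the cost of runtime bookkeeping on every operation; the source-to-source design described in the paper aims instead to generate the adjoint code ahead of time.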
If this is right
- Machine learning models can directly include and differentiate through existing scientific computing libraries without code changes.
- Advanced differentiation modes such as mixed-mode and checkpointed variants become available inside the same framework.
- High-performance differentiated code is generated automatically for programs that contain recursion and mutation.
- An expressive model for deep learning is obtained by treating arbitrary programs as differentiable objects.
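Checkpointed differentiation trades memory for recomputation. The Python sketch below shows the general idea on a hypothetical scalar recurrence x_{t+1} = x_t + 0.1 sin(x_t) with a hand-written local derivative. It does not store all n intermediate states for the reverse sweep. Instead it keeps every k-th state and recomputes each segment from its checkpoint. This illustrates the technique only and is not Zygote's implementation. The names `step`, `dstep`, `grad_full` and `grad_checkpointed` are illustrative.

```python
import math


def step(x):
    # One time step of a hypothetical scalar simulation.
    return x + 0.1 * math.sin(x)


def dstep(x):
    # Local derivative of `step` with respect to its input.
    return 1.0 + 0.1 * math.cos(x)


def grad_full(x0, n):
    # Plain reverse mode: store every intermediate state (O(n) memory).
    xs = [x0]
    for _ in range(n):
        xs.append(step(xs[-1]))
    adj = 1.0
    for t in reversed(range(n)):
        adj *= dstep(xs[t])
    return xs[-1], adj  # (x_n, d x_n / d x_0)


def grad_checkpointed(x0, n, k):
    # Keep only every k-th state (O(n/k) memory); recompute each segment
    # from its checkpoint during the reverse sweep (extra forward work).
    checkpoints = {0: x0}
    x = x0
    for t in range(n):
        x = step(x)
        if (t + 1) % k == 0:
            checkpoints[t + 1] = x
    adj = 1.0
    for start in reversed(range(0, n, k)):
        end = min(start + k, n)
        seg = [checkpoints[start]]          # states x_start .. x_{end-1}
        for _ in range(start, end - 1):
            seg.append(step(seg[-1]))
        for xt in reversed(seg):
            adj *= dstep(xt)
    return x, adj


print(grad_full(0.5, 100) == grad_checkpointed(0.5, 100, 7))  # True
```

Because the recomputed states are bit-identical and the adjoint product is taken in the same order, both versions agree exactly. With n = 100 and k = 10, the checkpointed version holds 11 stored states plus at most 10 recomputed ones, versus 101 for the plain version, at the cost of roughly one extra forward pass.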
Where Pith is reading between the lines
- Hybrid models could embed differential equation solvers or physics simulators as trainable layers whose internal gradients are supplied automatically.
- The same mechanism might extend to domains with heavy control flow, such as learned optimizers or search algorithms.
- Numerical stability of the generated gradients could be tested on long-running simulations that were previously off-limits to differentiation.
Load-bearing premise
The implementation correctly and efficiently handles arbitrary combinations of control flow, recursion, and mutation while preserving numerical correctness and producing optimized machine code.
What would settle it
Apply the system to a program that nests recursion inside mutated arrays and conditional branches, then compare its computed gradient against an independent finite-difference check; any systematic discrepancy would falsify the claim.
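The proposed check can be sketched in Python under several assumptions. Forward-mode dual numbers, via a hypothetical `Dual` class, stand in for the system under test, which in the paper is Zygote, a reverse-mode, source-to-source differentiator for Julia. The test programs `smooth` and `program` are invented for illustration, and the step `h = 1e-6` and the relative tolerance are arbitrary choices. The evaluation point is picked away from branch switch points. A finite difference that straddles a branch flip would disagree with the AD gradient for reasons unrelated to AD correctness.

```python
class Dual:
    """Forward-mode dual number val + der*eps, with eps**2 == 0 (hypothetical)."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    @staticmethod
    def lift(o):
        return o if isinstance(o, Dual) else Dual(o)

    def __add__(self, o):
        o = Dual.lift(o)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __mul__(self, o):
        o = Dual.lift(o)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__

    def __gt__(self, o):
        return self.val > Dual.lift(o).val


def smooth(arr, i):
    # Recursion over the array that mutates it in place, with a
    # value-dependent branch at every level.
    if i == 0:
        return
    if arr[i - 1] > arr[i]:
        arr[i] = arr[i] + arr[i - 1] * arr[i - 1]
    else:
        arr[i] = arr[i] * arr[i - 1]
    smooth(arr, i - 1)


def program(x):
    # Runs unchanged on plain floats and on Dual numbers.
    arr = [x, x * 2.0, x + 0.5, x * x]
    smooth(arr, len(arr) - 1)
    total = arr[0]
    for v in arr[1:]:
        total = total + v
    return total


def ad_derivative(f, x):
    # Seed the input's derivative with 1 and read it off the output.
    return f(Dual(x, 1.0)).der


def fd_derivative(f, x, h=1e-6):
    # Central finite difference: truncation error O(h**2) plus rounding.
    return (f(x + h) - f(x - h)) / (2.0 * h)


x0 = 0.7  # chosen so that no branch condition flips within +/- h
ad, fd = ad_derivative(program, x0), fd_derivative(program, x0)
print(abs(ad - fd) <= 1e-6 * abs(ad))  # True
```

A real falsification run would replace `ad_derivative` with the system being tested. It would also sweep many inputs and discard any point where a branch condition changes within ±h.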
Original abstract
Scientific computing is increasingly incorporating the advancements in machine learning and the ability to work with large amounts of data. At the same time, machine learning models are becoming increasingly sophisticated and exhibit many features often seen in scientific computing, stressing the capabilities of machine learning frameworks. Just as the disciplines of scientific computing and machine learning have shared common underlying infrastructure in the form of numerical linear algebra, we now have the opportunity to further share new computational infrastructure, and thus ideas, in the form of Differentiable Programming. We describe Zygote, a Differentiable Programming system that is able to take gradients of general program structures. We implement this system in the Julia programming language. Our system supports almost all language constructs (control flow, recursion, mutation, etc.) and compiles high-performance code without requiring any user intervention or refactoring to stage computations. This enables an expressive programming model for deep learning, but more importantly, it enables us to incorporate a large ecosystem of libraries in our models in a straightforward way. We discuss our approach to automatic differentiation, including its support for advanced techniques such as mixed-mode, complex and checkpointed differentiation, and present several examples of differentiating programs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Zygote, a source-to-source automatic differentiation system implemented in Julia. It claims to support differentiation of general program structures including control flow, recursion, mutation and other constructs without requiring user refactoring or staging, while generating high-performance code. The work discusses advanced techniques such as mixed-mode, complex and checkpointed differentiation and presents examples to bridge machine learning and scientific computing.
Significance. If the implementation correctly handles arbitrary nestings of the supported constructs while preserving numerical correctness, the system would enable direct incorporation of existing Julia scientific libraries into differentiable models, providing shared infrastructure between the fields beyond numerical linear algebra.
major comments (2)
- [Abstract] The central claim that the system 'supports almost all language constructs (control flow, recursion, mutation, etc.)' and produces correct gradients for general programs without user intervention is load-bearing but unsupported by any concrete test cases, error analysis, or examples exercising interactions among recursion, mutation and data-dependent control flow.
- [Discussion of mixed-mode and checkpointed differentiation] The description of these techniques does not address aliasing or control-flow path sensitivity when mutation and recursion are combined, leaving the correctness of adjoints for arbitrary combinations unverified.
minor comments (2)
- The manuscript lacks performance benchmarks, comparisons to other AD frameworks, or listings of generated adjoint code to substantiate the high-performance claim.
- No quantitative error analysis or verification suite is reported to confirm numerical correctness across the supported language features.
Simulated Author's Rebuttal
We thank the referee for their comments, which help clarify the presentation of our claims regarding Zygote's capabilities. We respond to each major comment below and will revise the manuscript accordingly.
Point-by-point responses
- Referee: [Abstract] The central claim that the system 'supports almost all language constructs (control flow, recursion, mutation, etc.)' and produces correct gradients for general programs without user intervention is load-bearing but unsupported by any concrete test cases, error analysis, or examples exercising interactions among recursion, mutation and data-dependent control flow.
  Authors: The referee is correct that the manuscript does not provide a single test case or error analysis that exercises the interaction of recursion, mutation, and data-dependent control flow simultaneously. While individual features are demonstrated, the combined case is not explicitly shown. We will add an example in the revised manuscript that combines these elements to support the abstract's claim. Revision: yes.
- Referee: [Discussion of mixed-mode and checkpointed differentiation] The description of these techniques does not address aliasing or control-flow path sensitivity when mutation and recursion are combined, leaving the correctness of adjoints for arbitrary combinations unverified.
  Authors: We agree that the discussion of mixed-mode and checkpointed differentiation does not explicitly treat aliasing or control-flow path sensitivity in the context of combined mutation and recursion. The current text focuses on the techniques in isolation. In the revision, we will expand this section to discuss these issues and note the verification status for arbitrary combinations. Revision: yes.
Circularity Check
No circularity; paper describes software system implementation without derivation chain or fitted predictions
The manuscript presents Zygote as a Julia-based differentiable programming system supporting control flow, recursion, and mutation via source-to-source AD. No equations, first-principles derivations, or numerical predictions appear in the abstract or described content. Claims concern system capabilities and examples of differentiation rather than results obtained by fitting parameters to data or reducing via self-referential definitions. Self-citations, if present, are not load-bearing for any asserted uniqueness theorem or ansatz. The paper is therefore self-contained as an engineering description; correctness would be assessed by code inspection or benchmarks external to any internal chain.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Automatic differentiation can be applied to arbitrary combinations of control flow, recursion, and mutation while preserving correctness and generating efficient code.
invented entities (1)
- Zygote differentiable programming system (no independent evidence)
Forward citations
Cited by 8 Pith papers
- The Neural Compiler: Program-to-Network Translation for Hybrid Scientific Machine Learning
  The Neural Compiler converts symbolic programs into exact differentiable PyTorch modules for hybrid scientific machine learning, enabling precise encoding of known physics with few trainable parameters.
- Decision-Focused Federated Learning Under Heterogeneous Objectives and Constraints
  Derives heterogeneity bounds separating objective-shift and feasible-set-shift effects in decision-focused federated learning and shows federation benefits when statistical gains exceed client-specific penalties.
- VertAX: a differentiable vertex model for learning epithelial tissue mechanics
  VertAX supplies a differentiable JAX implementation of vertex models for confluent epithelia that enables forward simulation, mechanical parameter inference, and inverse design of tissue-scale behaviors.
- Universal Differential Equations for Scientific Machine Learning
  Universal Differential Equations unify scientific models with machine learning by embedding flexible approximators into differential equations, enabling applications from biological mechanism discovery to high-dimensi...
- Decision-Focused Federated Learning Under Heterogeneous Objectives and Constraints
  New bounds on SPO+ loss heterogeneity in federated predict-then-optimize with varying objectives and constraints indicate federation benefits when statistical gains exceed heterogeneity costs, with robustness in stron...
- Physics-informed reservoir characterization from bulk and extreme pressure events with a differentiable simulator
  A physics-informed ML method embeds a differentiable flow simulator into neural network training to infer permeability from sparse pressure data, halving inference error versus data-driven baselines across scenarios a...
- Learning Non-Markovian Noise via Ensemble Optimal Control
  Machine learning trains an ensemble optimal control scheme to pick optimal measurement times for non-Markovian quantum noise parameters, reaching near Cramér-Rao bound precision.
- Neural Computers
  Neural Computers are introduced as a new machine form where computation, memory, and I/O are unified in a learned runtime state, with initial video-model experiments showing acquisition of basic interface primitives f...