
arxiv: 1907.07587 · v2 · pith:DVJDZMK2new · submitted 2019-07-17 · 💻 cs.PL · cs.LG

A Differentiable Programming System to Bridge Machine Learning and Scientific Computing

Pith reviewed 2026-05-24 20:06 UTC · model grok-4.3

classification 💻 cs.PL cs.LG
keywords differentiable programming · automatic differentiation · machine learning · scientific computing · control flow · recursion · mutation · program gradients

The pith

A system computes gradients through arbitrary programs that include control flow, recursion, and mutation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a differentiable programming system that derives gradients for general program structures written in a high-level language. It handles nearly every language feature such as branches, loops, recursive calls, and in-place changes, then produces optimized code without any manual staging or rewriting by the user. A reader would care because this removes the usual barrier between machine learning models and existing scientific libraries, letting gradients flow directly through complex simulations and solvers. If the claim holds, the same infrastructure that already supports numerical linear algebra can now be shared for differentiation as well.

Core claim

The system is able to take gradients of general program structures. It supports almost all language constructs (control flow, recursion, mutation, etc.) and compiles high-performance code without requiring any user intervention or refactoring to stage computations. This enables an expressive programming model for deep learning, but more importantly, it enables straightforward incorporation of a large ecosystem of libraries into models, along with support for mixed-mode, complex, and checkpointed differentiation.

What carries the argument

The automatic differentiation engine that traverses and transforms general program structures, including those with control flow and mutation, while emitting optimized machine code.

If this is right

  • Machine learning models can directly include and differentiate through existing scientific computing libraries without code changes.
  • Advanced differentiation modes such as mixed-mode and checkpointed variants become available inside the same framework.
  • High-performance differentiated code is generated automatically for programs that contain recursion and mutation.
  • An expressive model for deep learning is obtained by treating arbitrary programs as differentiable objects.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Hybrid models could embed differential equation solvers or physics simulators as trainable layers whose internal gradients are supplied automatically.
  • The same mechanism might extend to domains with heavy control flow, such as learned optimizers or search algorithms.
  • Numerical stability of the generated gradients could be tested on long-running simulations that were previously off-limits to differentiation.

Load-bearing premise

The implementation correctly and efficiently handles arbitrary combinations of control flow, recursion, and mutation while preserving numerical correctness and producing optimized machine code.

What would settle it

Apply the system to a program that nests recursion inside mutated arrays and conditional branches, then compare its computed gradient against an independent finite-difference check; any systematic discrepancy would falsify the claim.
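The check described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's system: a tiny forward-mode dual-number class stands in for the differentiation engine under test (Zygote itself is a reverse-mode, source-to-source tool in Julia), and `Dual`, `program`, `ad_gradient`, and `fd_gradient` are hypothetical names invented here. The test program deliberately mixes recursion, in-place mutation of a buffer, and a data-dependent branch, and the evaluation point is chosen away from any branch boundary so that finite differences are meaningful.

```python
class Dual:
    """Minimal forward-mode dual number: a value plus its derivative."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule for the derivative component
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def value(x):
    """Return the plain number behind x, whether x is a Dual or a float."""
    return x.val if isinstance(x, Dual) else x


def program(x, depth=4):
    """Hypothetical test program: recursion + in-place mutation + branching."""
    buf = [x, x * x]  # buffer that is mutated inside the recursion

    def recurse(level):
        if level == 0:
            return
        # Data-dependent branch on the current buffer contents
        if value(buf[0]) > 1.0:
            buf[0] = buf[0] * 0.5 + buf[1]
        else:
            buf[0] = buf[0] * buf[0] + 1.0
        buf[1] = buf[1] + buf[0]  # in-place update
        recurse(level - 1)

    recurse(depth)
    return buf[0] + buf[1]


def ad_gradient(f, x):
    """Derivative of f at x via dual numbers (stand-in for the AD system)."""
    return f(Dual(x, 1.0)).der


def fd_gradient(f, x, h=1e-6):
    """Independent central finite-difference estimate."""
    return (f(x + h) - f(x - h)) / (2 * h)


x0 = 0.7  # chosen so no branch flips within +/- h
g_ad = ad_gradient(program, x0)
g_fd = fd_gradient(program, x0)
rel_err = abs(g_ad - g_fd) / max(1.0, abs(g_ad))
print(g_ad, g_fd, rel_err)
```

A small relative error on one input is only weak evidence. A real verification suite would sweep many inputs, include points near branch boundaries, and compare against a second differentiation method as well as finite differences.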

Figures

Figures reproduced from arXiv: 1907.07587 by Alan Edelman, Chris Rackauckas, Elliot Saba, Keno Fischer, Mike Innes, Viral B Shah, Will Tebbutt.

Figure 1
The differential operator J is able to implement the chain rule through a local, syntactic recursive transformation.
    julia> f(x) = x^2 + 3x + 1
    julia> gradient(f, 1/3)
    (3.6666666666666665,)
    julia> using Measurements;
    julia> gradient(f, 1/3 ± 0.01)
    (3.6666666666666665 ± 0.02,)
Figure 3
Using a neural network surrogate to solve inverse problems. Model-based reinforcement learning has advantages over model-agnostic methods, given that an effective agent must approximate the dynamics of its environment [4]. However, model-based approaches have been hindered by the inability to incorporate realistic environmental models into deep learning models. Previous work has had success re-implementing …
Figure 5
After 100 iterations
Figure 7
An ADAM optimizer is used to tune parameters of a variational quantum circuit to find the ground state of a 4-site anti-ferromagnetic Heisenberg chain Hamiltonian. The necessary gradients are obtained by automatic differentiation of a Yao.jl quantum simulator.
Figure 9
Neural SDE Training. For the SDE solution X(t), the blue line shows X1(t) while the orange line shows X2(t). The green points show the fitting data for X1 while the purple points show the fitting data for X2. The ribbons show the 95 percentile bounds of the stochastic solutions. The analytical formula for the adjoint of the strong solution of a SDE is difficult to efficiently calculate due to the lack of …
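The idea in the Figure 1 caption, an operator J that pairs each function with a pullback and composes pullbacks by the chain rule, can be illustrated with a hand-written sketch. This is an assumption-laden toy in Python, not the paper's implementation: the real system derives these pullbacks automatically from program syntax, whereas here `J_square`, `J_affine`, `J_f`, and `gradient` are hypothetical names written out by hand for the example function f(x) = x^2 + 3x + 1.

```python
def J_square(x):
    """Return x^2 together with its pullback dy -> 2*x*dy."""
    return x * x, lambda dy: 2 * x * dy


def J_affine(x):
    """Return 3x + 1 together with its pullback dy -> 3*dy."""
    return 3 * x + 1, lambda dy: 3 * dy


def J_f(x):
    """Pullback pair for f(x) = x^2 + 3x + 1, composed from the pieces above."""
    a, back_a = J_square(x)
    b, back_b = J_affine(x)
    # x feeds both terms, so the sensitivities from each term are summed
    return a + b, lambda dy: back_a(dy) + back_b(dy)


def gradient(J, x):
    """Seed the pullback with 1.0 and return the result as a 1-tuple."""
    _, back = J(x)
    return (back(1.0),)


grad = gradient(J_f, 1 / 3)
print(grad)  # approximately (3.667,), i.e. f'(1/3) = 2/3 + 3
```

The tuple return shape mirrors the `gradient(f, 1/3)` call shown in the caption, which returns one entry per argument.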
Original abstract

Scientific computing is increasingly incorporating the advancements in machine learning and the ability to work with large amounts of data. At the same time, machine learning models are becoming increasingly sophisticated and exhibit many features often seen in scientific computing, stressing the capabilities of machine learning frameworks. Just as the disciplines of scientific computing and machine learning have shared common underlying infrastructure in the form of numerical linear algebra, we now have the opportunity to further share new computational infrastructure, and thus ideas, in the form of Differentiable Programming. We describe Zygote, a Differentiable Programming system that is able to take gradients of general program structures. We implement this system in the Julia programming language. Our system supports almost all language constructs (control flow, recursion, mutation, etc.) and compiles high-performance code without requiring any user intervention or refactoring to stage computations. This enables an expressive programming model for deep learning, but more importantly, it enables us to incorporate a large ecosystem of libraries in our models in a straightforward way. We discuss our approach to automatic differentiation, including its support for advanced techniques such as mixed-mode, complex and checkpointed differentiation, and present several examples of differentiating programs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Zygote, a source-to-source automatic differentiation system implemented in Julia. It claims to support differentiation of general program structures including control flow, recursion, mutation and other constructs without requiring user refactoring or staging, while generating high-performance code. The work discusses advanced techniques such as mixed-mode, complex and checkpointed differentiation and presents examples to bridge machine learning and scientific computing.

Significance. If the implementation correctly handles arbitrary nestings of the supported constructs while preserving numerical correctness, the system would enable direct incorporation of existing Julia scientific libraries into differentiable models, providing shared infrastructure between the fields beyond numerical linear algebra.

major comments (2)
  1. [Abstract] The central claim that the system 'supports almost all language constructs (control flow, recursion, mutation, etc.)' and produces correct gradients for general programs without user intervention is load-bearing but unsupported by any concrete test cases, error analysis, or examples exercising interactions among recursion, mutation and data-dependent control flow.
  2. [Discussion of mixed-mode and checkpointed differentiation] The description of these techniques does not address aliasing or control-flow path sensitivity when mutation and recursion are combined, leaving the correctness of adjoints for arbitrary combinations unverified.
minor comments (2)
  1. The manuscript lacks performance benchmarks, comparisons to other AD frameworks, or listings of generated adjoint code to substantiate the high-performance claim.
  2. No quantitative error analysis or verification suite is reported to confirm numerical correctness across the supported language features.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their comments, which help clarify the presentation of our claims regarding Zygote's capabilities. We respond to each major comment below and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract] The central claim that the system 'supports almost all language constructs (control flow, recursion, mutation, etc.)' and produces correct gradients for general programs without user intervention is load-bearing but unsupported by any concrete test cases, error analysis, or examples exercising interactions among recursion, mutation and data-dependent control flow.

    Authors: The referee is correct that the manuscript does not provide a single test case or error analysis that exercises the interaction of recursion, mutation, and data-dependent control flow simultaneously. While individual features are demonstrated, the combined case is not explicitly shown. We will add an example in the revised manuscript that combines these elements to support the abstract's claim. revision: yes

  2. Referee: [Discussion of mixed-mode and checkpointed differentiation] The description of these techniques does not address aliasing or control-flow path sensitivity when mutation and recursion are combined, leaving the correctness of adjoints for arbitrary combinations unverified.

    Authors: We agree that the discussion of mixed-mode and checkpointed differentiation does not explicitly treat aliasing or control-flow path sensitivity in the context of combined mutation and recursion. The current text focuses on the techniques in isolation. In revision, we will expand this section to discuss these issues and note the verification status for arbitrary combinations. revision: yes

Circularity Check

0 steps flagged

No circularity; paper describes software system implementation without derivation chain or fitted predictions

Full rationale

The manuscript presents Zygote as a Julia-based differentiable programming system supporting control flow, recursion, and mutation via source-to-source AD. No equations, first-principles derivations, or numerical predictions appear in the abstract or described content. Claims concern system capabilities and examples of differentiation rather than results obtained by fitting parameters to data or reducing via self-referential definitions. Self-citations, if present, are not load-bearing for any asserted uniqueness theorem or ansatz. The paper is therefore self-contained as an engineering description; correctness would be assessed by code inspection or benchmarks external to any internal chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the correctness and completeness of the automatic differentiation implementation for Julia's full language semantics; no free parameters or invented physical entities are introduced.

axioms (1)
  • Domain assumption: Automatic differentiation can be applied to arbitrary combinations of control flow, recursion, and mutation while preserving correctness and generating efficient code.
    This premise is invoked when the abstract asserts support for almost all language constructs without user intervention.
invented entities (1)
  • Zygote differentiable programming system (no independent evidence)
    Purpose: to enable gradient computation on general Julia programs.
    The paper introduces this new software artifact as the vehicle for the claimed capabilities.



Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. The Neural Compiler: Program-to-Network Translation for Hybrid Scientific Machine Learning

    cs.LG 2026-05 unverdicted novelty 7.0

    The Neural Compiler converts symbolic programs into exact differentiable PyTorch modules for hybrid scientific machine learning, enabling precise encoding of known physics with few trainable parameters.

  2. Decision-Focused Federated Learning Under Heterogeneous Objectives and Constraints

    math.OC 2026-04 unverdicted novelty 7.0

    Derives heterogeneity bounds separating objective-shift and feasible-set-shift effects in decision-focused federated learning and shows federation benefits when statistical gains exceed client-specific penalties.

  3. VertAX: a differentiable vertex model for learning epithelial tissue mechanics

    cs.LG 2026-04 unverdicted novelty 7.0

    VertAX supplies a differentiable JAX implementation of vertex models for confluent epithelia that enables forward simulation, mechanical parameter inference, and inverse design of tissue-scale behaviors.

  4. Universal Differential Equations for Scientific Machine Learning

    cs.LG 2020-01 unverdicted novelty 7.0

    Universal Differential Equations unify scientific models with machine learning by embedding flexible approximators into differential equations, enabling applications from biological mechanism discovery to high-dimensi...

  5. Decision-Focused Federated Learning Under Heterogeneous Objectives and Constraints

    math.OC 2026-04 conditional novelty 6.0

    New bounds on SPO+ loss heterogeneity in federated predict-then-optimize with varying objectives and constraints indicate federation benefits when statistical gains exceed heterogeneity costs, with robustness in stron...

  6. Physics-informed reservoir characterization from bulk and extreme pressure events with a differentiable simulator

    cs.LG 2026-04 unverdicted novelty 6.0

    A physics-informed ML method embeds a differentiable flow simulator into neural network training to infer permeability from sparse pressure data, halving inference error versus data-driven baselines across scenarios a...

  7. Learning Non-Markovian Noise via Ensemble Optimal Control

    quant-ph 2026-04 unverdicted novelty 5.0

    Machine learning trains an ensemble optimal control scheme to pick optimal measurement times for non-Markovian quantum noise parameters, reaching near Cramér-Rao bound precision.

  8. Neural Computers

    cs.LG 2026-04 unverdicted novelty 5.0

    Neural Computers are introduced as a new machine form where computation, memory, and I/O are unified in a learned runtime state, with initial video-model experiments showing acquisition of basic interface primitives f...

Reference graph

Works this paper leans on

50 extracted references · 50 canonical work pages · cited by 7 Pith papers · 6 internal anchors

  1. [1]

    MPI - A Message Passing Interface Standard. MPI Forum, 2015

  2. [2]

    M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al. Tensorflow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 265–283, 2016

  3. [3]

    E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users’ guide, volume 9. SIAM, 1999

  4. [4]

    C. G. Atkeson and J. C. Santamaria. A comparison of direct and model-based reinforcement learning. In Proceedings of International Conference on Robotics and Automation, volume 4, pages 3557–3564. IEEE, 1997

  5. [5]

    Y. Bar-Sinai, S. Hoyer, J. Hickey, and M. P. Brenner. Data-driven discretization: machine learning for coarse graining of partial differential equations. arXiv e-prints, page arXiv:1808.04930, Aug 2018

  6. [6]

    A. G. Baydin, B. A. Pearlmutter, A. A. Radul, and J. M. Siskind. Automatic differentiation in machine learning: a survey. Journal of Machine Learning Research, 18:1–43, 2018

  7. [7]

    J. Bezanson, A. Edelman, S. Karpinski, and V. B. Shah. Julia: A fresh approach to numerical computing. SIAM Review, 59(1):65–98, 2017

  8. [8]

    C. Bischof, P. Khademi, A. Mauer, and A. Carle. ADIFOR 2.0: Automatic differentiation of Fortran 77 programs. IEEE Computational Science and Engineering, 3(3):18–32, 1996

  9. [9]

    S. Byrne. Miletus: Writing financial contracts in Julia, 2019

  10. [10]

    T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud. Neural ordinary differential equations. In Advances in Neural Information Processing Systems, pages 6571–6583, 2018

  11. [11]

    M. F. Cusumano-Towner, F. A. Saad, A. K. Lew, and V. K. Mansinghka. Gen: A general-purpose probabilistic programming system with programmable inference. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2019, pages 221–236, New York, NY, USA, 2019. ACM

  12. [12]

    F. de Avila Belbute-Peres, K. Smith, K. Allen, J. Tenenbaum, and J. Z. Kolter. End-to-end differentiable physics for learning and control. In Advances in Neural Information Processing Systems, pages 7178–7189, 2018

  13. [13]

    J. Degrave, M. Hermans, J. Dambre, and F. wyffels. A differentiable physics engine for deep learning in robotics. Frontiers in Neurorobotics, 13, Mar 2019

  14. [14]

    J. J. Dongarra, J. D. Cruz, S. Hammarling, and I. S. Duff. Algorithm 679: A set of level 3 basic linear algebra subprograms: model implementation and test programs. ACM Transactions on Mathematical Software (TOMS), 16(1):18–28, 1990

  15. [15]

    E. Dupont, A. Doucet, and Y. Whye Teh. Augmented Neural ODEs. arXiv e-prints, page arXiv:1904.01681, Apr 2019

  16. [16]

    K. Fischer and E. Saba. Automatic full compilation of Julia programs and ML models to cloud TPUs. CoRR, abs/1810.09868, 2018

  17. [17]

    K. Fischer and E. Saba. XLA.jl: Compiling Julia to XLA. https://github.com/JuliaTPU/XLA.jl, 2018

  18. [18]

    D. Gandhi, M. Innes, E. Saba, K. Fischer, and V. Shah. Julia E Flux: Modernizando o Aprendizado de Máquina, 2019

  19. [19]

    P. Gao, A. Honkela, M. Rattray, and N. D. Lawrence. Gaussian process modelling of latent chemical species: applications to inferring transcription factor activities. Bioinformatics, 24(16):i70–i75, 08 2008

  20. [20]

    H. Ge, K. Xu, and Z. Ghahramani. Turing: Composable inference for probabilistic programming. In International Conference on Artificial Intelligence and Statistics, AISTATS 2018, 9-11 April 2018, Playa Blanca, Lanzarote, Canary Islands, Spain, pages 1682–1690, 2018

  21. [21]

    M. Giordano. Uncertainty propagation with functionally correlated quantities. ArXiv e-prints, Oct. 2016

  22. [22]

    E. Gobet and R. Munos. Sensitivity analysis using Itô–Malliavin calculus and martingales, and application to stochastic optimal control. SIAM Journal on Control and Optimization, 43(5):1676–1713, 2005

  23. [23]

    W. Grathwohl, R. T. Q. Chen, J. Bettencourt, I. Sutskever, and D. K. Duvenaud. FFJORD: free-form continuous dynamics for scalable reversible generative models. CoRR, abs/1810.01367, 2018

  24. [24]

    D. Hartman and L. K. Mestha. A deep learning framework for model reduction of dynamical systems. In 2017 IEEE Conference on Control Technology and Applications (CCTA), pages 1917–1922, Aug 2017

  25. [25]

    K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016

  26. [26]

    S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Comput., 9(8):1735–1780, Nov. 1997

  27. [27]

    Y. Hu, S. Boker, M. Neale, and K. Klump. Coupled latent differential equation with moderators: Simulation and application. Psychological methods, 19, 05 2013

  28. [28]

    Z.-y. Huang and J.-a. Yan. Malliavin calculus. In Introduction to Infinite Dimensional Stochastic Analysis, pages 59–112. Springer, 2000

  29. [29]

    M. Innes. Don’t unroll adjoint: Differentiating SSA-form programs. CoRR, abs/1810.07951, 2018

  30. [30]

    M. Innes, N. M. Joy, and T. Karmali. Reinforcement learning vs. differentiable programming. https://fluxml.ai/2019/03/05/dp-vs-rl.html, 2019

  31. [31]

    M. Innes, E. Saba, K. Fischer, D. Gandhi, M. C. Rudilosso, N. M. Joy, T. Karmali, A. Pal, and V. Shah. Fashionable modelling with Flux. CoRR, abs/1811.01457, 2018

  32. [32]

    M. Johnson, R. Frostig, D. Maclaurin, and C. Leary. JAX: Autograd and XLA. https://github.com/google/jax, 2018

  33. [33]

    S. P. Jones and J.-M. Eber. How to write a financial contract. 2003

  34. [34]

    S. P. Jones, J.-M. Eber, and J. Seward. Composing contracts: an adventure in financial engineering. ACM SIGPLAN Notices, 35(9):280–292, 2000

  35. [35]

    T. Kurth, S. Treichler, J. Romero, M. Mudigonda, N. Luehr, E. Phillips, A. Mahesh, M. Matheson, J. Deslippe, M. Fatica, et al. Exascale deep learning for climate analytics. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, page 51. IEEE Press, 2018

  36. [36]

    T.-M. Li, M. Aittala, F. Durand, and J. Lehtinen. Differentiable Monte Carlo ray tracing through edge sampling. In SIGGRAPH Asia 2018 Technical Papers, page 222. ACM, 2018

  37. [37]

    K. Ordaz-Hernandez, X. Fischer, and F. Bennis. Model reduction technique for mechanical behaviour modelling: Efficiency criteria and validity domain assessment. Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science, 222(3):493–505, 2008

  38. [38]

    B. A. Pearlmutter and J. M. Siskind. Reverse-mode AD in a functional framework: Lambda the ultimate backpropagator. ACM Transactions on Programming Languages and Systems (TOPLAS), 30(2):7, 2008

  39. [39]

    PyTorch Team. PyTorch, a year in... pytorch.org/blog/a-year-in, 2018. Accessed: 2018-09-22

  40. [40]

    PyTorch Team. The road to 1.0: production ready PyTorch. https://pytorch.org/blog/a-year-in/. Accessed: 2018-09-22

  42. [42]

    C. Rackauckas, M. Innes, Y. Ma, J. Bettencourt, L. White, and V. Dixit. DiffEqFlux.jl - A Julia library for neural differential equations. CoRR, abs/1902.02376, 2019

  43. [43]

    C. Rackauckas, Y. Ma, V. Dixit, X. Guo, M. Innes, J. Revels, J. Nyberg, and V. Ivaturi. A Comparison of Automatic Differentiation and Continuous Sensitivity Analysis for Derivatives of Differential Equation Solutions. arXiv e-prints, page arXiv:1812.01892, Dec 2018

  44. [44]

    C. Rackauckas and Q. Nie. DifferentialEquations.jl – a performant and feature-rich ecosystem for solving differential equations in Julia. 5(1), 2017. Exported from https://app.dimensions.ai on 2019/05/05

  45. [45]

    C. V. Rackauckas and Q. Nie. Adaptive methods for stochastic differential equations via natural embeddings and rejection sampling with memory. Discrete and Continuous Dynamical Systems. Series B, 22(7):2731–2761, 2017

  46. [46]

    H. M. R. Ugalde, J.-C. Carmona, V. M. Alvarado, and J. Reyes-Reyes. Neural network design and model reduction approach for black box nonlinear system identification with reduced number of parameters. Neurocomputing, 101:170–180, 2013

  47. [47]

    F. Wang, X. Wu, G. Essertel, J. Decker, and T. Rompf. Demystifying differentiable programming: Shift/reset the penultimate backpropagator. arXiv preprint arXiv:1803.10228, 2018

  48. [48]

    H. Zhang. The Malliavin Calculus. PhD thesis, 2004

  49. [49]

    X. zhe Luo, J. guo Liu, P. Zhang, and L. Wang. Yao.jl: Extensible, efficient quantum algorithm design for humans. In preparation, 2019

  50. [50]

    M. Álvarez, D. Luengo, and N. D. Lawrence. Latent force models. In D. van Dyk and M. Welling, editors, Proceedings of the Twelth International Conference on Artificial Intelligence and Statistics, volume 5 of Proceedings of Machine Learning Research, pages 9–16, Hilton Clearwater Beach Resort, Clearwater Beach, Florida USA, 16–18 Apr 2009. PMLR