A Differentiable Programming System to Bridge Machine Learning and Scientific Computing

Alan Edelman; Chris Rackauckas; Elliot Saba; Keno Fischer; Mike Innes; Viral B Shah; Will Tebbutt

arXiv: 1907.07587 · v2 · pith:DVJDZMK2 · submitted 2019-07-17 · 💻 cs.PL · cs.LG

Pith reviewed 2026-05-24 20:06 UTC · model grok-4.3

classification 💻 cs.PL cs.LG

keywords: differentiable programming · automatic differentiation · machine learning · scientific computing · control flow · recursion · mutation · program gradients

The pith

A system computes gradients through arbitrary programs that include control flow, recursion, and mutation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a differentiable programming system that derives gradients for general program structures written in a high-level language. It handles nearly every language feature, such as branches, loops, recursive calls, and in-place changes, and then produces optimized code without any manual staging or rewriting by the user. A reader would care because this removes the usual barrier between machine learning models and existing scientific libraries, letting gradients flow directly through complex simulations and solvers. If the claim holds, scientific computing and machine learning, which already share numerical linear algebra as common infrastructure, can now share differentiation infrastructure as well.

Core claim

The system is able to take gradients of general program structures. It supports almost all language constructs (control flow, recursion, mutation, etc.) and compiles high-performance code without requiring any user intervention or refactoring to stage computations. This enables an expressive programming model for deep learning, but more importantly, it enables straightforward incorporation of a large ecosystem of libraries into models, along with support for mixed-mode, complex, and checkpointed differentiation.

What carries the argument

The automatic differentiation engine that traverses and transforms general program structures, including those with control flow and mutation, while emitting optimized machine code.

If this is right

Machine learning models can directly include and differentiate through existing scientific computing libraries without code changes.
Advanced differentiation modes such as mixed-mode and checkpointed variants become available inside the same framework.
High-performance differentiated code is generated automatically for programs that contain recursion and mutation.
An expressive model for deep learning is obtained by treating arbitrary programs as differentiable objects.
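Checkpointed differentiation trades memory for recomputation during the reverse pass. The Python below is a rough sketch of that idea only and is not Zygote's interface. It differentiates a loop of a hypothetical scalar update `step(x) = sin(x) + 0.5*x`. The forward pass stores only every k-th state, and the backward pass recomputes the states inside each segment from its checkpoint before applying the chain rule. The names `step`, `run`, `grad_full_tape` and `grad_checkpointed` are illustrative assumptions.

```python
import math

def step(x):
    # hypothetical update rule standing in for one iteration of a simulation
    return math.sin(x) + 0.5 * x

def step_derivative(x):
    # local derivative d(step)/dx, used by the reverse pass
    return math.cos(x) + 0.5

def run(x0, n):
    # the program being differentiated: n applications of step
    x = x0
    for _ in range(n):
        x = step(x)
    return x

def grad_full_tape(x0, n):
    """Reverse mode that stores every intermediate state (n values)."""
    states = []
    x = x0
    for _ in range(n):
        states.append(x)
        x = step(x)
    adjoint = 1.0
    for s in reversed(states):
        adjoint *= step_derivative(s)
    return adjoint

def grad_checkpointed(x0, n, k):
    """Reverse mode that stores only every k-th state and recomputes the rest."""
    checkpoints = {}
    x = x0
    for i in range(n):
        if i % k == 0:
            checkpoints[i] = x  # keep the state entering step i
        x = step(x)
    adjoint = 1.0
    for start in sorted(checkpoints, reverse=True):
        end = min(start + k, n)
        # recompute the states entering steps start .. end-1 from the checkpoint
        states = [checkpoints[start]]
        for _ in range(start, end - 1):
            states.append(step(states[-1]))
        # apply the chain rule backwards through this segment
        for s in reversed(states):
            adjoint *= step_derivative(s)
    return adjoint, len(checkpoints)

g_full = grad_full_tape(0.3, 12)
g_ckpt, stored = grad_checkpointed(0.3, 12, 4)
print(g_full, g_ckpt, stored)  # 3 checkpoints are stored instead of 12 states
```

With `n` steps and a checkpoint every `k` steps, this keeps about `n/k` checkpoints plus one segment of at most `k` states at a time. The cost is roughly one extra forward sweep. The recomputed states are bit-identical to the originals, so the result should equal the full-tape gradient exactly.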

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Hybrid models could embed differential equation solvers or physics simulators as trainable layers whose internal gradients are supplied automatically.
The same mechanism might extend to domains with heavy control flow, such as learned optimizers or search algorithms.
Numerical stability of the generated gradients could be tested on long-running simulations that were previously off-limits to differentiation.

Load-bearing premise

The implementation correctly and efficiently handles arbitrary combinations of control flow, recursion, and mutation while preserving numerical correctness and producing optimized machine code.

What would settle it

Apply the system to a program that nests recursion inside mutated arrays and conditional branches, then compare its computed gradient against an independent finite-difference check; any systematic discrepancy would falsify the claim.
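The check proposed above can be sketched in a few lines. This is an illustration under stated assumptions, not Zygote. It swaps in forward-mode dual numbers, a simpler AD technique than Zygote's reverse mode. It also uses a hypothetical test program, `program`, whose helper `fill` recurses, mutates a list in place and branches on a data-dependent value. The AD derivative is then compared with a central finite difference at a few points chosen away from branch switches. All names here (`Dual`, `fill`, `program`, `ad_derivative`, `fd_derivative`) are illustrative.

```python
import math

class Dual:
    """Forward-mode dual number: a value plus its derivative w.r.t. one input."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def _lift(self, other):
        return other if isinstance(other, Dual) else Dual(float(other))

    def __add__(self, other):
        o = self._lift(other)
        return Dual(self.val + o.val, self.der + o.der)
    __radd__ = __add__

    def __mul__(self, other):
        o = self._lift(other)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)
    __rmul__ = __mul__

    def __gt__(self, other):
        # branch decisions look only at the primal value
        return self.val > self._lift(other).val

def sin(x):
    if isinstance(x, Dual):
        return Dual(math.sin(x.val), math.cos(x.val) * x.der)
    return math.sin(x)

def fill(buf, i, x):
    """Hypothetical recursive helper that mutates buf in place."""
    if i == len(buf):
        return
    if buf[i - 1] > 0.5:  # data-dependent branch
        buf[i] = buf[i - 1] * x
    else:
        buf[i] = sin(buf[i - 1]) + x
    fill(buf, i + 1, x)

def program(x):
    buf = [x] + [0.0] * 5  # mutable array seeded with the input
    fill(buf, 1, x)
    total = 0.0
    for v in buf:
        total = total + v
    return total

def ad_derivative(f, x):
    return f(Dual(x, 1.0)).der

def fd_derivative(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

for x0 in (0.2, 0.7, 1.3):
    print(x0, ad_derivative(program, x0), fd_derivative(program, x0))
```

Finite differences are only meaningful where no branch flips within `±h` of the test point. A real test suite for the system would still need to choose its test points with that caveat in mind.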

Figures

Figures reproduced from arXiv: 1907.07587 by Alan Edelman, Chris Rackauckas, Elliot Saba, Keno Fischer, Mike Innes, Viral B Shah, Will Tebbutt.

**Figure 1.** The differential operator J is able to implement the chain rule through a local, syntactic recursive transformation.

    julia> f(x) = x^2 + 3x + 1
    julia> gradient(f, 1/3)
    (3.6666666666666665,)
    julia> using Measurements;
    julia> gradient(f, 1/3 ± 0.01)
    (3.6666666666666665 ± 0.02,)
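To make the idea of a local chain-rule transformation concrete, the sketch below hand-writes what such a transformation might produce for the caption's `f(x) = x^2 + 3x + 1`. Each primitive call returns its value together with a pullback closure, and the pullbacks are then run in reverse order. The helper names (`J_add`, `J_mul`, `J_pow`, `J_f`, `gradient`) are hypothetical. This mirrors the pullback style the caption attributes to J, but it is a hand-written Python illustration, not Zygote's generated code.

```python
def J_add(a, b):
    # primal value of a + b and its pullback
    return a + b, lambda dy: (dy, dy)

def J_mul(a, b):
    # primal value of a * b and its pullback (product rule)
    return a * b, lambda dy: (dy * b, dy * a)

def J_pow(a, n):
    # primal value of a ** n; n is treated as a non-differentiable constant
    return a ** n, lambda dy: (dy * n * a ** (n - 1),)

def J_f(x):
    """Hand-written transform of f(x) = x^2 + 3x + 1."""
    t1, back1 = J_pow(x, 2)
    t2, back2 = J_mul(3, x)
    t3, back3 = J_add(t1, t2)
    y, back4 = J_add(t3, 1)

    def back(dy):
        # run the pullbacks in reverse order, accumulating contributions to x
        dt3, _ = back4(dy)
        dt1, dt2 = back3(dt3)
        _, dx_from_mul = back2(dt2)
        (dx_from_pow,) = back1(dt1)
        return (dx_from_mul + dx_from_pow,)

    return y, back

def gradient(f_J, x):
    # hypothetical helper: seed the output sensitivity with 1.0
    _, back = f_J(x)
    return back(1.0)

print(gradient(J_f, 1/3))  # (3.6666666666666665,)
```

The result matches the REPL output in the caption, since f'(x) = 2x + 3. Each rule only needs to know about its own primitive; composition happens by chaining the closures.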

**Figure 3.** Using a neural network surrogate to solve inverse problems.

Model-based reinforcement learning has advantages over model-agnostic methods, given that an effective agent must approximate the dynamics of its environment [4]. However, model-based approaches have been hindered by the inability to incorporate realistic environmental models into deep learning models. Previous work has had success re-implementing …

**Figure 5.** After 100 iterations.

**Figure 7.** An ADAM optimizer is used to tune parameters of a variational quantum circuit to find the ground state of a 4-site anti-ferromagnetic Heisenberg chain Hamiltonian. The necessary gradients are obtained by automatic differentiation of a Yao.jl quantum simulator.

**Figure 9.** Neural SDE Training. For the SDE solution X(t), the blue line shows X1(t) while the orange line shows X2(t). The green points show the fitting data for X1, while the purple points show the fitting data for X2. The ribbons show the 95 percentile bounds of the stochastic solutions.

The analytical formula for the adjoint of the strong solution of an SDE is difficult to calculate efficiently due to the lack of …

Original abstract

Scientific computing is increasingly incorporating the advancements in machine learning and the ability to work with large amounts of data. At the same time, machine learning models are becoming increasingly sophisticated and exhibit many features often seen in scientific computing, stressing the capabilities of machine learning frameworks. Just as the disciplines of scientific computing and machine learning have shared common underlying infrastructure in the form of numerical linear algebra, we now have the opportunity to further share new computational infrastructure, and thus ideas, in the form of Differentiable Programming. We describe Zygote, a Differentiable Programming system that is able to take gradients of general program structures. We implement this system in the Julia programming language. Our system supports almost all language constructs (control flow, recursion, mutation, etc.) and compiles high-performance code without requiring any user intervention or refactoring to stage computations. This enables an expressive programming model for deep learning, but more importantly, it enables us to incorporate a large ecosystem of libraries in our models in a straightforward way. We discuss our approach to automatic differentiation, including its support for advanced techniques such as mixed-mode, complex and checkpointed differentiation, and present several examples of differentiating programs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Zygote adds broad Julia language support to source-to-source AD, which is useful, but the paper leaves the hardest correctness cases for mixed constructs lightly tested.

The new piece is Zygote itself: a source-to-source AD system written for Julia that claims to handle control flow, recursion, mutation, and most other constructs without forcing users to stage or rewrite code. That combination lets existing Julia scientific libraries be dropped straight into gradient-based models, which is the practical payoff the authors highlight. The paper walks through the authors' approach to mixed-mode and checkpointed differentiation and gives several examples of differentiated programs. This shows they have thought about real usage patterns rather than just toy cases. The Julia ecosystem angle is handled cleanly, and the implementation is presented as production-ready rather than as a research prototype.

The soft spot is exactly the one the stress test flags. The abstract and description assert that arbitrary nestings of the supported features will produce correct gradients. Yet the examples do not include a clear case that combines recursion, in-place mutation, and data-dependent control flow in one function. AD rules for mutation are sensitive to aliasing and control-flow path sensitivity, so an untested interaction can fail silently even when each feature works on its own. The paper does not supply benchmarks, generated code listings, or independent test results that would let a reader verify the claim for themselves. This is not fatal, but it is a gap that makes the strongest claims harder to assess from the text alone.

The work is aimed at people already using Julia for scientific computing who want to add differentiability, and at AD researchers interested in language-level support. A reader who cares about practical differentiable programming will find the implementation details worth their time. It is worth sending to peer review because the system exists, the design choices are explained, and the core idea is testable; referees can ask for the missing test cases and performance numbers.
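The mutation concern can be made concrete with a small hand-written reverse pass. In the hypothetical function `f` below, `x[0]` is overwritten in place by `x[0] * x[1]` and then squared, so f(a, b) = (ab)^2. A reverse pass that reads the already-overwritten `x[0]` when applying the product rule gets the derivative with respect to `b` wrong. A pass that saves the pre-mutation value, as a tape-based or copy-before-write scheme would, matches a finite-difference check. This illustrates the general overwrite problem underlying the aliasing concern. It does not describe how Zygote itself handles mutation, and all function names are illustrative.

```python
def f(a, b):
    x = [a, b]
    x[0] = x[0] * x[1]  # in-place overwrite of the first element
    return x[0] ** 2

def grad_naive(a, b):
    x = [a, b]
    x[0] = x[0] * x[1]
    x0_bar = 2 * x[0]  # d(x0^2)/dx0
    # BUG: x[0] now holds a*b, not the original a
    a_bar = x0_bar * x[1]
    b_bar = x0_bar * x[0]
    return a_bar, b_bar

def grad_saved(a, b):
    x = [a, b]
    saved_x0 = x[0]  # record the value before it is overwritten
    x[0] = x[0] * x[1]
    x0_bar = 2 * x[0]
    a_bar = x0_bar * x[1]
    b_bar = x0_bar * saved_x0
    return a_bar, b_bar

def grad_fd(a, b, h=1e-6):
    # central finite differences as an independent check
    return ((f(a + h, b) - f(a - h, b)) / (2 * h),
            (f(a, b + h) - f(a, b - h)) / (2 * h))

print(grad_naive(1.5, 2.0))  # (12.0, 18.0): wrong in b
print(grad_saved(1.5, 2.0))  # (12.0, 9.0): matches 2ab^2 and 2a^2b
print(grad_fd(1.5, 2.0))
```

Each step is correct in isolation; the error only appears once mutation interacts with a rule that reads its inputs later. That is why combined test cases matter.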

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Zygote, a source-to-source automatic differentiation system implemented in Julia. It claims to support differentiation of general program structures including control flow, recursion, mutation and other constructs without requiring user refactoring or staging, while generating high-performance code. The work discusses advanced techniques such as mixed-mode, complex and checkpointed differentiation and presents examples to bridge machine learning and scientific computing.

Significance. If the implementation correctly handles arbitrary nestings of the supported constructs while preserving numerical correctness, the system would enable direct incorporation of existing Julia scientific libraries into differentiable models, providing shared infrastructure between the fields beyond numerical linear algebra.

major comments (2)

[Abstract] The central claim that the system 'supports almost all language constructs (control flow, recursion, mutation, etc.)' and produces correct gradients for general programs without user intervention is load-bearing but unsupported by any concrete test cases, error analysis, or examples exercising interactions among recursion, mutation and data-dependent control flow.
[Discussion of mixed-mode and checkpointed differentiation] The description of these techniques does not address aliasing or control-flow path sensitivity when mutation and recursion are combined, leaving the correctness of adjoints for arbitrary combinations unverified.

minor comments (2)

The manuscript lacks performance benchmarks, comparisons to other AD frameworks, or listings of generated adjoint code to substantiate the high-performance claim.
No quantitative error analysis or verification suite is reported to confirm numerical correctness across the supported language features.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their comments, which help clarify the presentation of our claims regarding Zygote's capabilities. We respond to each major comment below and will revise the manuscript accordingly.

Referee: [Abstract] The central claim that the system 'supports almost all language constructs (control flow, recursion, mutation, etc.)' and produces correct gradients for general programs without user intervention is load-bearing but unsupported by any concrete test cases, error analysis, or examples exercising interactions among recursion, mutation and data-dependent control flow.

Authors: The referee is correct that the manuscript does not provide a single test case or error analysis that exercises the interaction of recursion, mutation, and data-dependent control flow simultaneously. While individual features are demonstrated, the combined case is not explicitly shown. We will add an example in the revised manuscript that combines these elements to support the abstract's claim. Revision: yes.

Referee: [Discussion of mixed-mode and checkpointed differentiation] The description of these techniques does not address aliasing or control-flow path sensitivity when mutation and recursion are combined, leaving the correctness of adjoints for arbitrary combinations unverified.

Authors: We agree that the discussion of mixed-mode and checkpointed differentiation does not explicitly treat aliasing or control-flow path sensitivity in the context of combined mutation and recursion. The current text focuses on the techniques in isolation. In revision, we will expand this section to discuss these issues and note the verification status for arbitrary combinations. Revision: yes.

Circularity Check

0 steps flagged

No circularity; paper describes software system implementation without derivation chain or fitted predictions

The manuscript presents Zygote as a Julia-based differentiable programming system supporting control flow, recursion, and mutation via source-to-source AD. No equations, first-principles derivations, or numerical predictions appear in the abstract or described content. Claims concern system capabilities and examples of differentiation rather than results obtained by fitting parameters to data or reducing via self-referential definitions. Self-citations, if present, are not load-bearing for any asserted uniqueness theorem or ansatz. The paper is therefore self-contained as an engineering description; correctness would be assessed by code inspection or benchmarks external to any internal chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the correctness and completeness of the automatic differentiation implementation for Julia's full language semantics; no free parameters or invented physical entities are introduced.

axioms (1)

Domain assumption: Automatic differentiation can be applied to arbitrary combinations of control flow, recursion, and mutation while preserving correctness and generating efficient code.
This premise is invoked when the abstract asserts support for almost all language constructs without user intervention.

invented entities (1)

Zygote differentiable programming system (no independent evidence)
Purpose: to enable gradient computation on general Julia programs.
The paper introduces this new software artifact as the vehicle for the claimed capabilities.

pith-pipeline@v0.9.0 · 5748 in / 1349 out tokens · 25623 ms · 2026-05-24T20:06:05.027856+00:00 · methodology

Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

The Neural Compiler: Program-to-Network Translation for Hybrid Scientific Machine Learning
cs.LG 2026-05 unverdicted novelty 7.0

The Neural Compiler converts symbolic programs into exact differentiable PyTorch modules for hybrid scientific machine learning, enabling precise encoding of known physics with few trainable parameters.
Decision-Focused Federated Learning Under Heterogeneous Objectives and Constraints
math.OC 2026-04 unverdicted novelty 7.0

Derives heterogeneity bounds separating objective-shift and feasible-set-shift effects in decision-focused federated learning and shows federation benefits when statistical gains exceed client-specific penalties.
VertAX: a differentiable vertex model for learning epithelial tissue mechanics
cs.LG 2026-04 unverdicted novelty 7.0

VertAX supplies a differentiable JAX implementation of vertex models for confluent epithelia that enables forward simulation, mechanical parameter inference, and inverse design of tissue-scale behaviors.
Universal Differential Equations for Scientific Machine Learning
cs.LG 2020-01 unverdicted novelty 7.0

Universal Differential Equations unify scientific models with machine learning by embedding flexible approximators into differential equations, enabling applications from biological mechanism discovery to high-dimensi...
Decision-Focused Federated Learning Under Heterogeneous Objectives and Constraints
math.OC 2026-04 conditional novelty 6.0

New bounds on SPO+ loss heterogeneity in federated predict-then-optimize with varying objectives and constraints indicate federation benefits when statistical gains exceed heterogeneity costs, with robustness in stron...
Physics-informed reservoir characterization from bulk and extreme pressure events with a differentiable simulator
cs.LG 2026-04 unverdicted novelty 6.0

A physics-informed ML method embeds a differentiable flow simulator into neural network training to infer permeability from sparse pressure data, halving inference error versus data-driven baselines across scenarios a...
Learning Non-Markovian Noise via Ensemble Optimal Control
quant-ph 2026-04 unverdicted novelty 5.0

Machine learning trains an ensemble optimal control scheme to pick optimal measurement times for non-Markovian quantum noise parameters, reaching near Cramér-Rao bound precision.
Neural Computers
cs.LG 2026-04 unverdicted novelty 5.0

Neural Computers are introduced as a new machine form where computation, memory, and I/O are unified in a learned runtime state, with initial video-model experiments showing acquisition of basic interface primitives f...

Reference graph

Works this paper leans on

50 extracted references · 50 canonical work pages · cited by 7 Pith papers · 6 internal anchors

[1] MPI - A Message Passing Interface Standard. MPI Forum, 2015.

[2] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al. TensorFlow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 265–283, 2016.

[3] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users' guide, volume 9. SIAM, 1999.

[4] C. G. Atkeson and J. C. Santamaria. A comparison of direct and model-based reinforcement learning. In Proceedings of International Conference on Robotics and Automation, volume 4, pages 3557–3564. IEEE, 1997.

[5] Y. Bar-Sinai, S. Hoyer, J. Hickey, and M. P. Brenner. Data-driven discretization: machine learning for coarse graining of partial differential equations. arXiv e-prints, page arXiv:1808.04930, Aug 2018.

[6] A. G. Baydin, B. A. Pearlmutter, A. A. Radul, and J. M. Siskind. Automatic differentiation in machine learning: a survey. Journal of Machine Learning Research, 18:1–43, 2018.

[7] J. Bezanson, A. Edelman, S. Karpinski, and V. B. Shah. Julia: A fresh approach to numerical computing. SIAM Review, 59(1):65–98, 2017.

[8] C. Bischof, P. Khademi, A. Mauer, and A. Carle. ADIFOR 2.0: Automatic differentiation of Fortran 77 programs. IEEE Computational Science and Engineering, 3(3):18–32, 1996.

[9] S. Byrne. Miletus: Writing financial contracts in Julia, 2019.

[10] T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud. Neural ordinary differential equations. In Advances in Neural Information Processing Systems, pages 6571–6583, 2018.

[11] M. F. Cusumano-Towner, F. A. Saad, A. K. Lew, and V. K. Mansinghka. Gen: A general-purpose probabilistic programming system with programmable inference. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2019, pages 221–236, New York, NY, USA, 2019. ACM.

[12] F. de Avila Belbute-Peres, K. Smith, K. Allen, J. Tenenbaum, and J. Z. Kolter. End-to-end differentiable physics for learning and control. In Advances in Neural Information Processing Systems, pages 7178–7189, 2018.

[13] J. Degrave, M. Hermans, J. Dambre, and F. wyffels. A differentiable physics engine for deep learning in robotics. Frontiers in Neurorobotics, 13, Mar 2019.

[14] J. J. Dongarra, J. D. Cruz, S. Hammarling, and I. S. Duff. Algorithm 679: A set of level 3 basic linear algebra subprograms: model implementation and test programs. ACM Transactions on Mathematical Software (TOMS), 16(1):18–28, 1990.

[15] E. Dupont, A. Doucet, and Y. Whye Teh. Augmented Neural ODEs. arXiv e-prints, page arXiv:1904.01681, Apr 2019.

[16] K. Fischer and E. Saba. Automatic full compilation of Julia programs and ML models to cloud TPUs. CoRR, abs/1810.09868, 2018.

[17] K. Fischer and E. Saba. XLA.jl: Compiling Julia to XLA. https://github.com/JuliaTPU/XLA.jl, 2018.

[18] D. Gandhi, M. Innes, E. Saba, K. Fischer, and V. Shah. Julia E Flux: Modernizando o Aprendizado de Máquina, 2019.

[19] P. Gao, A. Honkela, M. Rattray, and N. D. Lawrence. Gaussian process modelling of latent chemical species: applications to inferring transcription factor activities. Bioinformatics, 24(16):i70–i75, 08 2008.

[20] H. Ge, K. Xu, and Z. Ghahramani. Turing: Composable inference for probabilistic programming. In International Conference on Artificial Intelligence and Statistics, AISTATS 2018, 9-11 April 2018, Playa Blanca, Lanzarote, Canary Islands, Spain, pages 1682–1690, 2018.

[21] M. Giordano. Uncertainty propagation with functionally correlated quantities. ArXiv e-prints, Oct. 2016.

[22] E. Gobet and R. Munos. Sensitivity analysis using Itô–Malliavin calculus and martingales, and application to stochastic optimal control. SIAM Journal on Control and Optimization, 43(5):1676–1713, 2005.

[23] W. Grathwohl, R. T. Q. Chen, J. Bettencourt, I. Sutskever, and D. K. Duvenaud. FFJORD: free-form continuous dynamics for scalable reversible generative models. CoRR, abs/1810.01367, 2018.

[24] D. Hartman and L. K. Mestha. A deep learning framework for model reduction of dynamical systems. In 2017 IEEE Conference on Control Technology and Applications (CCTA), pages 1917–1922, Aug 2017.

[25] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.

[26] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Comput., 9(8):1735–1780, Nov. 1997.

[27] Y. Hu, S. Boker, M. Neale, and K. Klump. Coupled latent differential equation with moderators: Simulation and application. Psychological methods, 19, 05 2013.

[28] Z.-y. Huang and J.-a. Yan. Malliavin calculus. In Introduction to Infinite Dimensional Stochastic Analysis, pages 59–112. Springer, 2000.

[29] M. Innes. Don't unroll adjoint: Differentiating SSA-form programs. CoRR, abs/1810.07951, 2018.

[30] M. Innes, N. M. Joy, and T. Karmali. Reinforcement learning vs. differentiable programming. https://fluxml.ai/2019/03/05/dp-vs-rl.html, 2019.

[31] M. Innes, E. Saba, K. Fischer, D. Gandhi, M. C. Rudilosso, N. M. Joy, T. Karmali, A. Pal, and V. Shah. Fashionable modelling with Flux. CoRR, abs/1811.01457, 2018.

[32] M. Johnson, R. Frostig, D. Maclaurin, and C. Leary. JAX: Autograd and XLA. https://github.com/google/jax, 2018.

[33] S. P. Jones and J.-M. Eber. How to write a financial contract. 2003.

[34] S. P. Jones, J.-M. Eber, and J. Seward. Composing contracts: an adventure in financial engineering. ACM SIGPLAN Notices, 35(9):280–292, 2000.

[35] T. Kurth, S. Treichler, J. Romero, M. Mudigonda, N. Luehr, E. Phillips, A. Mahesh, M. Matheson, J. Deslippe, M. Fatica, et al. Exascale deep learning for climate analytics. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, page 51. IEEE Press, 2018.

[36] T.-M. Li, M. Aittala, F. Durand, and J. Lehtinen. Differentiable Monte Carlo ray tracing through edge sampling. In SIGGRAPH Asia 2018 Technical Papers, page 222. ACM, 2018.

[37] K. Ordaz-Hernandez, X. Fischer, and F. Bennis. Model reduction technique for mechanical behaviour modelling: Efficiency criteria and validity domain assessment. Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science, 222(3):493–505, 2008.

[38] B. A. Pearlmutter and J. M. Siskind. Reverse-mode AD in a functional framework: Lambda the ultimate backpropagator. ACM Transactions on Programming Languages and Systems (TOPLAS), 30(2):7, 2008.

[39] PyTorch Team. PyTorch, a year in... pytorch.org/blog/a-year-in, 2018. Accessed: 2018-09-22.

[40] PyTorch Team. The road to 1.0: production ready PyTorch. https://pytorch.org/blog/a-year-in/. Accessed: 2018-09-22.

[42] C. Rackauckas, M. Innes, Y. Ma, J. Bettencourt, L. White, and V. Dixit. DiffEqFlux.jl - A Julia library for neural differential equations. CoRR, abs/1902.02376, 2019.

[43] C. Rackauckas, Y. Ma, V. Dixit, X. Guo, M. Innes, J. Revels, J. Nyberg, and V. Ivaturi. A Comparison of Automatic Differentiation and Continuous Sensitivity Analysis for Derivatives of Differential Equation Solutions. arXiv e-prints, page arXiv:1812.01892, Dec 2018.

[44] C. Rackauckas and Q. Nie. DifferentialEquations.jl – a performant and feature-rich ecosystem for solving differential equations in Julia. 5(1), 2017. Exported from https://app.dimensions.ai on 2019/05/05.

[45] C. V. Rackauckas and Q. Nie. Adaptive methods for stochastic differential equations via natural embeddings and rejection sampling with memory. Discrete and continuous dynamical systems. Series B, 22(7):2731–2761, 2017.

[46] H. M. R. Ugalde, J.-C. Carmona, V. M. Alvarado, and J. Reyes-Reyes. Neural network design and model reduction approach for black box nonlinear system identification with reduced number of parameters. Neurocomputing, 101:170–180, 2013.

[47] F. Wang, X. Wu, G. Essertel, J. Decker, and T. Rompf. Demystifying differentiable programming: Shift/reset the penultimate backpropagator. arXiv preprint arXiv:1803.10228, 2018.

[48] H. Zhang. The Malliavin Calculus. PhD thesis, 2004.

[49] X.-Z. Luo, J.-G. Liu, P. Zhang, and L. Wang. Yao.jl: Extensible, efficient quantum algorithm design for humans. In preparation, 2019.

[50] M. Álvarez, D. Luengo, and N. D. Lawrence. Latent force models. In D. van Dyk and M. Welling, editors, Proceedings of the Twelth International Conference on Artificial Intelligence and Statistics, volume 5 of Proceedings of Machine Learning Research, pages 9–16, Hilton Clearwater Beach Resort, Clearwater Beach, Florida USA, 16–18 Apr 2009. PMLR.

[1] [1]

MPI Forum, 2015

MPI - A Message Passing Interface Standard. MPI Forum, 2015

work page 2015

[2] [2]

Abadi, P

M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al. Tensorﬂow: A system for large-scale machine learning. In 12th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 16), pages 265–283, 2016

work page 2016

[3] [3]

Anderson, Z

E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users’ guide, volume 9. SIAM, 1999

work page 1999

[4] [4]

C. G. Atkeson and J. C. Santamaria. A comparison of direct and model-based reinforcement learning. In Proceedings of International Conference on Robotics and Automation, volume 4, pages 3557–3564. IEEE, 1997

work page 1997

[5] [5]

Bar-Sinai, S

Y . Bar-Sinai, S. Hoyer, J. Hickey, and M. P. Brenner. Data-driven discretization: machine learning for coarse graining of partial differential equations. arXiv e-prints, page arXiv:1808.04930, Aug 2018

work page arXiv 2018

[6] [6]

A. G. Baydin, B. A. Pearlmutter, A. A. Radul, and J. M. Siskind. Automatic differentiation in machine learning: a survey. Journal of Marchine Learning Research, 18:1–43, 2018

work page 2018

[7] [7]

Bezanson, A

J. Bezanson, A. Edelman, S. Karpinski, and V . B. Shah. Julia: A fresh approach to numerical computing. SIAM Review, 59(1):65–98, 2017

work page 2017

[8] [8]

Bischof, P

C. Bischof, P. Khademi, A. Mauer, and A. Carle. ADIFOR 2.0: Automatic differentiation of Fortran 77 programs. IEEE Computational Science and Engineering, 3(3):18–32, 1996

work page 1996

[9] [9]

S. Byrne. Miletus: Writing ﬁnancial contracts in julia, 2019

work page 2019

[10] [10]

T. Q. Chen, Y . Rubanova, J. Bettencourt, and D. K. Duvenaud. Neural ordinary differential equations. In Advances in Neural Information Processing Systems, pages 6571–6583, 2018

work page 2018

[11] [11]

M. F. Cusumano-Towner, F. A. Saad, A. K. Lew, and V . K. Mansinghka. Gen: A general-purpose probabilistic programming system with programmable inference. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2019, pages 221–236, New York, NY , USA, 2019. ACM

work page 2019

[12] [12]

de Avila Belbute-Peres, K

F. de Avila Belbute-Peres, K. Smith, K. Allen, J. Tenenbaum, and J. Z. Kolter. End-to-end differentiable physics for learning and control. In Advances in Neural Information Processing Systems, pages 7178–7189, 2018

work page 2018

[13] [13]

Degrave, M

J. Degrave, M. Hermans, J. Dambre, and F. wyffels. A differentiable physics engine for deep learning in robotics. Frontiers in Neurorobotics, 13, Mar 2019

work page 2019

[14] [14]

J. J. Dongarra, J. D. Cruz, S. Hammarling, and I. S. Duff. Algorithm 679: A set of level 3 basic linear algebra subprograms: model implementation and test programs. ACM Transactions on Mathematical Software (TOMS), 16(1):18–28, 1990

work page 1990

[15] [15]

Dupont, A

E. Dupont, A. Doucet, and Y . Whye Teh. Augmented Neural ODEs.arXiv e-prints, page arXiv:1904.01681, Apr 2019

work page arXiv 1904

[16] [16]

Automatic Full Compilation of Julia Programs and ML Models to Cloud TPUs

K. Fischer and E. Saba. Automatic full compilation of Julia programs and ML models to cloud TPUs. CoRR, abs/1810.09868, 2018

[17] K. Fischer and E. Saba. XLA.jl: Compiling Julia to XLA. https://github.com/JuliaTPU/XLA.jl, 2018.

[18] D. Gandhi, M. Innes, E. Saba, K. Fischer, and V. Shah. Julia E Flux: Modernizando o Aprendizado de Máquina [Julia and Flux: Modernizing Machine Learning], 2019.

[19] P. Gao, A. Honkela, M. Rattray, and N. D. Lawrence. Gaussian process modelling of latent chemical species: applications to inferring transcription factor activities. Bioinformatics, 24(16):i70–i75, Aug 2008.

[20] H. Ge, K. Xu, and Z. Ghahramani. Turing: Composable inference for probabilistic programming. In International Conference on Artificial Intelligence and Statistics, AISTATS 2018, 9–11 April 2018, Playa Blanca, Lanzarote, Canary Islands, Spain, pages 1682–1690, 2018.

[21] M. Giordano. Uncertainty propagation with functionally correlated quantities. ArXiv e-prints, Oct. 2016.

[22] E. Gobet and R. Munos. Sensitivity analysis using Itô–Malliavin calculus and martingales, and application to stochastic optimal control. SIAM Journal on Control and Optimization, 43(5):1676–1713, 2005.

[23] W. Grathwohl, R. T. Q. Chen, J. Bettencourt, I. Sutskever, and D. K. Duvenaud. FFJORD: Free-form continuous dynamics for scalable reversible generative models. CoRR, abs/1810.01367, 2018.

[24] D. Hartman and L. K. Mestha. A deep learning framework for model reduction of dynamical systems. In 2017 IEEE Conference on Control Technology and Applications (CCTA), pages 1917–1922, Aug 2017.

[25] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.

[26] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Comput., 9(8):1735–1780, Nov. 1997.

[27] Y. Hu, S. Boker, M. Neale, and K. Klump. Coupled latent differential equation with moderators: Simulation and application. Psychological Methods, 19, May 2013.

[28] Z.-y. Huang and J.-a. Yan. Malliavin calculus. In Introduction to Infinite Dimensional Stochastic Analysis, pages 59–112. Springer, 2000.

[29] M. Innes. Don’t unroll adjoint: Differentiating SSA-form programs. CoRR, abs/1810.07951, 2018.

[30] M. Innes, N. M. Joy, and T. Karmali. Reinforcement learning vs. differentiable programming. https://fluxml.ai/2019/03/05/dp-vs-rl.html, 2019.

[31] M. Innes, E. Saba, K. Fischer, D. Gandhi, M. C. Rudilosso, N. M. Joy, T. Karmali, A. Pal, and V. Shah. Fashionable modelling with Flux. CoRR, abs/1811.01457, 2018.

[32] M. Johnson, R. Frostig, D. Maclaurin, and C. Leary. JAX: Autograd and XLA. https://github.com/google/jax, 2018.

[33] S. P. Jones and J.-M. Eber. How to write a financial contract. 2003.

[34] S. P. Jones, J.-M. Eber, and J. Seward. Composing contracts: an adventure in financial engineering. ACM SIGPLAN Notices, 35(9):280–292, 2000.

[35] T. Kurth, S. Treichler, J. Romero, M. Mudigonda, N. Luehr, E. Phillips, A. Mahesh, M. Matheson, J. Deslippe, M. Fatica, et al. Exascale deep learning for climate analytics. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, page 51. IEEE Press, 2018.

[36] T.-M. Li, M. Aittala, F. Durand, and J. Lehtinen. Differentiable Monte Carlo ray tracing through edge sampling. In SIGGRAPH Asia 2018 Technical Papers, page 222. ACM, 2018.

[37] K. Ordaz-Hernandez, X. Fischer, and F. Bennis. Model reduction technique for mechanical behaviour modelling: Efficiency criteria and validity domain assessment. Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science, 222(3):493–505, 2008.

[38] B. A. Pearlmutter and J. M. Siskind. Reverse-mode AD in a functional framework: Lambda the ultimate backpropagator. ACM Transactions on Programming Languages and Systems (TOPLAS), 30(2):7, 2008.

[39] PyTorch Team. PyTorch, a year in... pytorch.org/blog/a-year-in, 2018. Accessed: 2018-09-22.

[40] PyTorch Team. The road to 1.0: production ready PyTorch. https://pytorch.org/blog/a-year-in/. Accessed: 2018-09-22.

[42] C. Rackauckas, M. Innes, Y. Ma, J. Bettencourt, L. White, and V. Dixit. DiffEqFlux.jl - A Julia library for neural differential equations. CoRR, abs/1902.02376, 2019.

[43] C. Rackauckas, Y. Ma, V. Dixit, X. Guo, M. Innes, J. Revels, J. Nyberg, and V. Ivaturi. A Comparison of Automatic Differentiation and Continuous Sensitivity Analysis for Derivatives of Differential Equation Solutions. arXiv e-prints, page arXiv:1812.01892, Dec 2018.

[44] C. Rackauckas and Q. Nie. DifferentialEquations.jl – a performant and feature-rich ecosystem for solving differential equations in Julia. 5(1), 2017.

[45] C. V. Rackauckas and Q. Nie. Adaptive methods for stochastic differential equations via natural embeddings and rejection sampling with memory. Discrete and Continuous Dynamical Systems, Series B, 22(7):2731–2761, 2017.

[46] H. M. R. Ugalde, J.-C. Carmona, V. M. Alvarado, and J. Reyes-Reyes. Neural network design and model reduction approach for black box nonlinear system identification with reduced number of parameters. Neurocomputing, 101:170–180, 2013.

[47] F. Wang, X. Wu, G. Essertel, J. Decker, and T. Rompf. Demystifying differentiable programming: Shift/reset the penultimate backpropagator. arXiv preprint arXiv:1803.10228, 2018.

[48] H. Zhang. The Malliavin Calculus. PhD thesis, 2004.

[49] X.-z. Luo, J.-g. Liu, P. Zhang, and L. Wang. Yao.jl: Extensible, efficient quantum algorithm design for humans. In preparation, 2019.

[50] M. Álvarez, D. Luengo, and N. D. Lawrence. Latent force models. In D. van Dyk and M. Welling, editors, Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, volume 5 of Proceedings of Machine Learning Research, pages 9–16, Hilton Clearwater Beach Resort, Clearwater Beach, Florida USA, 16–18 Apr 2009. PMLR.
