pith. machine review for the scientific record.

arXiv: 1806.07366 · v5 · submitted 2018-06-19 · 💻 cs.LG · cs.AI · stat.ML

Recognition: 3 Lean theorem links

Neural Ordinary Differential Equations

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 12:56 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · stat.ML
keywords neural ordinary differential equations · continuous-depth networks · residual networks · normalizing flows · ODE solvers · generative models · backpropagation through ODEs
0 comments

The pith

Deep neural networks can replace discrete layers with continuous dynamics defined by ordinary differential equations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a family of models where the hidden state evolves continuously according to an ODE whose right-hand side is given by a neural network. Instead of stacking a fixed number of layers, the network output is the result of integrating this dynamical system with a black-box numerical solver. This design keeps memory usage constant with respect to depth, lets the solver choose how much computation to spend on each input, and allows trading numerical accuracy for speed. The same approach yields continuous normalizing flows that can be trained by maximum likelihood without ordering or partitioning the data dimensions. Training works by backpropagating through the ODE solver without inspecting its internal steps.

Core claim

A neural network parameterizes the derivative of the hidden state, dh/dt = f(h(t), t, θ), and the model output is obtained by integrating the ODE from an initial time to a final time using any standard solver. This continuous-depth formulation replaces the explicit sequence of discrete layers while preserving end-to-end differentiability.
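As a concrete reading of this claim, the sketch below integrates a small, randomly initialized MLP vector field with an off-the-shelf adaptive solver. It is a minimal illustration under assumed settings (SciPy's RK45, toy dimensions, arbitrary tolerances), not the authors' implementation.

```python
# Minimal sketch: hidden state evolving as dh/dt = f(h(t), t, theta),
# with the "network output" read off as h(t1) from a black-box ODE solver.
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)
dim, hidden = 4, 16

# Illustrative parameters theta: a tiny two-layer MLP defines the vector field.
W1, b1 = rng.normal(0.0, 0.5, (hidden, dim + 1)), np.zeros(hidden)
W2, b2 = rng.normal(0.0, 0.5, (dim, hidden)), np.zeros(dim)

def f(t, h):
    """Vector field f(h(t), t, theta); time enters as an extra input feature."""
    x = np.concatenate([h, [t]])
    return W2 @ np.tanh(W1 @ x + b1) + b2

h0 = rng.normal(size=dim)                        # input = initial state h(t0)
sol = solve_ivp(f, (0.0, 1.0), h0, method="RK45", rtol=1e-6, atol=1e-8)
h1 = sol.y[:, -1]                                # output = integrated state h(t1)
print("h(t1) =", h1)
print("function evaluations (the 'depth' chosen by the solver):", sol.nfev)
```

The solver's function-evaluation count plays the role of depth here, which is the sense in which the evaluation strategy adapts to each input and tighter tolerances buy accuracy at the cost of more computation.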

What carries the argument

The neural ODE itself: a neural network defines the vector field for the hidden-state derivative, and a black-box ODE solver integrates it to produce the network output.

Load-bearing premise

That a neural network can define a vector field whose integrated trajectory yields useful representations, and that standard ODE solvers remain numerically stable and supply usable gradients during training.
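A crude way to probe the second half of this premise is to check that a scalar loss on the integrated state responds smoothly to a parameter perturbation. The finite-difference check below is only a sanity test under an invented toy setup (autonomous vector field, quadratic loss); it stands in for the paper's adjoint method, which recovers the same gradient by solving an augmented ODE backwards in time at constant memory cost.

```python
# Sanity check: does a loss on h(t1) vary smoothly with one weight of the vector field?
# Finite differences stand in for the adjoint method; they are not how the paper trains.
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(1)
dim, hidden = 3, 8
W1 = rng.normal(0.0, 0.5, (hidden, dim))
W2 = rng.normal(0.0, 0.5, (dim, hidden))
h0 = rng.normal(size=dim)

def loss(W1):
    f = lambda t, h: W2 @ np.tanh(W1 @ h)        # toy autonomous vector field f(h; theta)
    h1 = solve_ivp(f, (0.0, 1.0), h0, rtol=1e-8, atol=1e-10).y[:, -1]
    return 0.5 * np.sum(h1 ** 2)                 # illustrative scalar loss on the output

eps = 1e-5
dW = np.zeros_like(W1)
dW[0, 0] = eps
grad_fd = (loss(W1 + dW) - loss(W1 - dW)) / (2 * eps)   # d(loss)/d(W1[0, 0])
print("finite-difference gradient w.r.t. W1[0, 0]:", grad_fd)
```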

What would settle it

A neural ODE model trained on a standard supervised classification benchmark fails to reach competitive accuracy, or diverges due to solver instability, while a discrete residual network of comparable capacity succeeds.

read the original abstract

We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a black-box differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can train by maximum likelihood, without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows end-to-end training of ODEs within larger models.
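The continuous normalizing flow mentioned here rests on what the paper calls the instantaneous change of variables; paraphrased below (a restatement of the paper's Theorem 1, not a new result), with z(t) following the learned dynamics:

```latex
% Instantaneous change of variables (paraphrase of the paper's Theorem 1):
% if dz/dt = f(z(t), t), then the log-density of the state evolves as
\frac{\partial \log p(\mathbf{z}(t))}{\partial t}
    = -\operatorname{tr}\!\left(\frac{\partial f}{\partial \mathbf{z}(t)}\right),
% so the change in log-density along a trajectory is obtained by integrating the
% trace of the Jacobian, with no ordering or partitioning of the data dimensions.
```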

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 1 minor

Summary. The manuscript introduces Neural Ordinary Differential Equations (Neural ODEs), a new family of deep neural network models in which the derivative of the hidden state is parameterized by a neural network and the output is obtained by integrating the resulting ODE with a black-box solver. The central claims are that these continuous-depth models incur constant memory cost, adapt their evaluation depth to each input, and permit explicit trade-offs between numerical precision and speed. The authors derive a scalable adjoint-based backpropagation method that does not require access to the solver internals, and they demonstrate the approach on continuous-depth residual networks, latent ODE models, and continuous normalizing flows that can be trained by maximum likelihood without data partitioning or ordering.

Significance. If the results hold, the work is significant because it supplies a mathematically grounded continuous-depth alternative to discrete-layer networks that inherits standard existence/uniqueness guarantees for ODEs and the adjoint sensitivity method for gradients. The explicit construction of continuous normalizing flows and the constant-memory training procedure constitute concrete, falsifiable advances that open new modeling possibilities for dynamical systems and generative models.

minor comments (1)
  1. [Section 4] Section 4 and the associated experimental tables: the precise numerical tolerances, solver types (e.g., dopri5 vs. rk4), and step-size controls used for each reported run are not stated uniformly; adding a short reproducibility paragraph or supplementary table would strengthen the constant-memory and adaptive-depth claims.

Simulated Authors' Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of the manuscript and the accurate summary of our contributions. We are pleased that the significance of the continuous-depth formulation, constant-memory training via the adjoint method, and the construction of continuous normalizing flows was recognized. The minor comment on reporting solver types, tolerances, and step-size controls is well taken; a reproducibility table collecting these settings for each run would address it directly.

Circularity Check

0 steps flagged

No significant circularity in the Neural ODE derivation

full rationale

The paper defines the model directly as dh/dt = f_θ(h(t), t) with f_θ a neural network, integrated by a black-box ODE solver. The adjoint backpropagation formula is derived explicitly in Section 3 from the chain rule on the integral without presupposing fitted parameters or target quantities. No self-definitional reductions, fitted inputs renamed as predictions, or load-bearing self-citations that encode the result are present. The construction rests on standard ODE existence results and is independently verifiable through the continuous ResNet, latent ODE, and normalizing flow experiments.
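For readers checking this step, the adjoint quantities the rationale refers to take roughly the following form in the paper's notation (a paraphrase of Section 3, with L a scalar loss on the state z(t)):

```latex
% Adjoint sensitivity method (paraphrase of the paper's Section 3):
% the adjoint state a(t) = dL/dz(t) obeys its own ODE backwards in time,
\frac{d\mathbf{a}(t)}{dt}
    = -\,\mathbf{a}(t)^{\top}\frac{\partial f(\mathbf{z}(t), t, \theta)}{\partial \mathbf{z}},
% and the parameter gradient is accumulated along the same backward pass,
\frac{dL}{d\theta}
    = -\int_{t_1}^{t_0} \mathbf{a}(t)^{\top}
      \frac{\partial f(\mathbf{z}(t), t, \theta)}{\partial \theta}\, dt .
% Both are computed by a second call to the black-box solver, so no intermediate
% states from the forward pass need to be stored.
```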

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard existence and uniqueness results for ODEs and on the differentiability of the black-box solver. No new entities are postulated. The neural network parameters that define the vector field are ordinary trainable weights, not free parameters introduced ad hoc.

axioms (2)
  • standard math Solutions to the initial-value problem exist and are unique for the Lipschitz-continuous vector fields produced by the neural network.
    Invoked implicitly when the ODE solver is called; standard Picard-Lindelöf theorem, restated after this list.
  • domain assumption The black-box ODE solver is differentiable with respect to its initial condition and parameters.
    Required for the adjoint method to produce correct gradients; stated in the training section.
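For completeness, the first axiom is the standard Picard-Lindelöf guarantee, which in this setting reads roughly as follows (the Lipschitz bound is an assumption on the network, typically justified by finite weights and Lipschitz nonlinearities such as tanh):

```latex
% Picard-Lindelof existence and uniqueness, as invoked by the first axiom:
% if f is continuous in t and Lipschitz in h, i.e. for some constant K,
\lVert f(h_1, t, \theta) - f(h_2, t, \theta) \rVert \le K \,\lVert h_1 - h_2 \rVert,
% then the initial-value problem
\frac{dh}{dt} = f(h(t), t, \theta), \qquad h(t_0) = h_0,
% has a unique solution on an interval around t_0.
```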

pith-pipeline@v0.9.0 · 5432 in / 1403 out tokens · 31679 ms · 2026-05-15T12:56:04.870070+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • Foundation.Hamiltonian energy_conservation echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a black-box differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input

  • Foundation.DAlembert.Inevitability bilinear_family_forced echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    We also construct continuous normalizing flows, a generative model that can train by maximum likelihood, without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver

  • Foundation.Hamiltonian H_EnergyConservation refines

    REFINES: relation between the paper passage and the cited Recognition theorem.

    the adjoint sensitivity method (Pontryagin et al., 1962). This approach computes gradients by solving a second, augmented ODE backwards in time

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 19 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Language Modeling with Hyperspherical Flows

    cs.LG 2026-05 unverdicted novelty 8.0

    S-FLM rotates vectors on a hypersphere using a learned velocity field to generate language sequences, improving continuous flow models on large-vocabulary reasoning and closing the gap to masked diffusion at standard ...

  2. Learning Lindblad Dynamics of a Superconducting Quantum Processor

    quant-ph 2026-05 unverdicted novelty 7.0

    LIMINAL fits nested Lindblad models to tomographic data and uses likelihood-ratio tests to identify minimal dynamics for a five-qubit superconducting processor, supporting three-local Hamiltonian terms and two-local d...

  3. CF-VLA: Efficient Coarse-to-Fine Action Generation for Vision-Language-Action Policies

    cs.CV 2026-04 unverdicted novelty 7.0

    CF-VLA uses a coarse initialization over endpoint velocity followed by single-step refinement to achieve strong performance with low inference steps on CALVIN, LIBERO, and real-robot tasks.

  4. Scalable Generative Sampling and Multilevel Estimation for Lattice Field Theories Near Criticality

    hep-lat 2026-04 unverdicted novelty 7.0

    A hierarchical generative model for critical lattice scalar field theories achieves orders-of-magnitude lower autocorrelation times than HMC while enabling exact multilevel Monte Carlo.

  5. Differentiable free energy surface: a variational approach to directly observing rare events using generative deep-learning models

    physics.comp-ph 2026-04 unverdicted novelty 7.0

    VaFES constructs a latent space from reversible collective variables and variationally optimizes a tractable-density generative model to produce a continuous free energy surface from which rare events are directly sampled.

  6. Continuum Robot Modeling with Action Conditioned Flow Matching

    cs.RO 2026-05 unverdicted novelty 6.0

    A conditional point-cloud flow matching model maps motor actuation to 3D geometry of tendon-driven continuum robots and outperforms prior self-modeling methods on simulated and real 2- and 3-module hardware.

  7. Accelerating the Simulation of Ordinary Differential Equations Through Physics-Preserving Neural Networks

    math.NA 2026-05 unverdicted novelty 6.0

    A neural network maps ODE states to a slow-evolving latent space with dynamics derived from the original equations via the chain rule, enabling accelerated simulations with fewer function calls.

  8. Exploring the Boundaries of Differentiable Radiation Transport and Detector Simulation

    physics.ins-det 2026-05 unverdicted novelty 6.0

    Targeted halting of gradient flow at unstable material boundaries enables stable derivatives for optimizing detector designs in radiation transport simulations.

  9. Constructing Inverse Potentials from Scattering Phase Shifts using Physics-Informed Neural Networks: Application to Neutron-Alpha Scattering

    nucl-th 2026-05 unverdicted novelty 6.0

    A PINN with a hard Gaussian envelope reconstructs a smooth attractive potential for neutron-alpha P-wave scattering that yields resonance parameters matching expected values.

  10. Information bottleneck for learning the phase space of dynamics from high-dimensional experimental data

    physics.data-an 2026-04 unverdicted novelty 6.0

    DySIB recovers a two-dimensional representation matching the phase space of a physical pendulum from high-dimensional video data by maximizing predictive mutual information in latent space.

  11. Tokenised Flow Matching for Hierarchical Simulation Based Inference

    cs.LG 2026-04 unverdicted novelty 6.0

    TFMPE combines likelihood factorisation with tokenised flow matching to enable efficient hierarchical SBI from single-site simulations, producing well-calibrated posteriors at lower computational cost on a new benchma...

  12. Generative Path-Law Jump-Diffusion: Sequential MMD-Gradient Flows and Generalisation Bounds in Marcus-Signature RKHS

    stat.ML 2026-04 unverdicted novelty 6.0

    The paper proposes the ANJD flow and AVNSG operator to generate càdlàg trajectories via sequential MMD-gradient descent in Marcus-signature RKHS with generalisation bounds.

  13. Anticipatory Reinforcement Learning: From Generative Path-Laws to Distributional Value Functions

    cs.LG 2026-04 unverdicted novelty 6.0

    ARL lifts states into signature-augmented manifolds and employs self-consistent proxies of future path-laws to enable deterministic expected-return evaluation while preserving contraction mappings in jump-diffusion en...

  14. Monte Carlo Event Generation with Continuous Normalizing Flows

    hep-ph 2026-04 conditional novelty 6.0

    Continuous normalizing flows improve unweighting efficiency in Monte Carlo event generation for high-jet-multiplicity collider processes by factors up to 184, with wall-time gains of about ten when combined with coupl...

  15. FluxMC: Rapid and High-Fidelity Inference for Space-Based Gravitational-Wave Observations

    astro-ph.IM 2026-04 unverdicted novelty 6.0

    FluxMC integrates flow matching with parallel tempering MCMC to converge in under five hours on high-fidelity IMRPhenomHM waveforms for massive black hole binaries, where standard methods fail after hundreds of hours ...

  16. mimic-video: Video-Action Models for Generalizable Robot Control Beyond VLAs

    cs.RO 2025-12 unverdicted novelty 6.0

    mimic-video combines internet video pretraining with a flow-matching decoder to achieve state-of-the-art robotic manipulation performance with 10x better sample efficiency than vision-language-action models.

  17. Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges

    cs.LG 2021-04 accept novelty 6.0

    Geometric deep learning provides a unified mathematical framework based on grids, groups, graphs, geodesics, and gauges to explain and extend neural network architectures by incorporating physical regularities.

  18. Physics-Modeled Neural Networks

    cs.LG 2026-05 unverdicted novelty 5.0

    DynPMNNs replace static activations with time-evolving ODEs based on the FitzHugh-Nagumo model, achieve competitive regression performance on California Housing data with fewer parameters than Neural ODEs or CfCs, and...

  19. Beyond Silicon: Materials, Mechanisms, and Methods for Physical Neural Computing

    cs.NE 2026-04 unverdicted novelty 5.0

    Physical neural computing platforms using diverse materials offer complementary strengths for efficient on-device AI, with no single substrate excelling in all dimensions.

Reference graph

Works this paper leans on

60 extracted references · 60 canonical work pages · cited by 19 Pith papers · 16 internal anchors

  1. [1]

    Computationally efficient convolved multiple output Gaussian processes

    Mauricio A Álvarez and Neil D Lawrence. Computationally efficient convolved multiple output Gaussian processes. Journal of Machine Learning Research, 12(May):1459--1500, 2011

  2. [2]

    OptNet: Differentiable optimization as a layer in neural networks

    Brandon Amos and J Zico Kolter. OptNet: Differentiable optimization as a layer in neural networks. In International Conference on Machine Learning, pages 136--145, 2017

  3. [3]

    A general-purpose software framework for dynamic optimization

    Joel Andersson. A general-purpose software framework for dynamic optimization. PhD thesis, 2013

  4. [4]

    CasADi -- A software framework for nonlinear optimization and optimal control

    Joel A E Andersson, Joris Gillis, Greg Horn, James B Rawlings, and Moritz Diehl. CasADi -- A software framework for nonlinear optimization and optimal control. Mathematical Programming Computation, In Press, 2018

  5. [5]

    Automatic differentiation in machine learning: a survey

    Atilim Gunes Baydin, Barak A Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind. Automatic differentiation in machine learning: a survey. Journal of Machine Learning Research, 18(153):1--153, 2018

  6. [6]

    Sylvester Normalizing Flows for Variational Inference

    Rianne van den Berg, Leonard Hasenclever, Jakub M Tomczak, and Max Welling. Sylvester normalizing flows for variational inference. arXiv preprint arXiv:1803.05649, 2018

  7. [7]

    The Stan Math Library: Reverse-Mode Automatic Differentiation in C++

    Bob Carpenter, Matthew D Hoffman, Marcus Brubaker, Daniel Lee, Peter Li, and Michael Betancourt. The Stan math library: Reverse-mode automatic differentiation in C++. arXiv preprint arXiv:1509.07164, 2015

  8. [8]

    Reversible Architectures for Arbitrarily Deep Residual Neural Networks

    Bo Chang, Lili Meng, Eldad Haber, Lars Ruthotto, David Begert, and Elliot Holtham. Reversible architectures for arbitrarily deep residual neural networks. arXiv preprint arXiv:1709.03698, 2017

  9. [9]

    Multi-level residual networks from dynamical systems view

    Bo Chang, Lili Meng, Eldad Haber, Frederick Tung, and David Begert. Multi-level residual networks from dynamical systems view. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=SyJS-OgR-

  10. [10]

    Recurrent neural networks for multivariate time series with missing values

    Zhengping Che, Sanjay Purushotham, Kyunghyun Cho, David Sontag, and Yan Liu. Recurrent neural networks for multivariate time series with missing values. Scientific Reports, 8(1):6085, 2018. URL https://doi.org/10.1038/s41598-018-24271-9

  11. [11]

    Doctor AI: Predicting clinical events via recurrent neural networks

    Edward Choi, Mohammad Taha Bahadori, Andy Schuetz, Walter F. Stewart, and Jimeng Sun. Doctor AI: Predicting clinical events via recurrent neural networks. In Proceedings of the 1st Machine Learning for Healthcare Conference, volume 56 of Proceedings of Machine Learning Research, pages 301--318. PMLR, 18--19 Aug 2016. URL http://proceedings.mlr.press/v56/...

  12. [12]

    Theory of ordinary differential equations

    Earl A Coddington and Norman Levinson. Theory of ordinary differential equations. Tata McGraw-Hill Education, 1955

  13. [13]

    NICE: Non-linear Independent Components Estimation

    Laurent Dinh, David Krueger, and Yoshua Bengio. NICE: Non-linear independent components estimation. arXiv preprint arXiv:1410.8516, 2014

  14. [14]

    Recurrent marked temporal point processes: Embedding event history to vector

    Nan Du, Hanjun Dai, Rakshit Trivedi, Utkarsh Upadhyay, Manuel Gomez-Rodriguez, and Le Song. Recurrent marked temporal point processes: Embedding event history to vector. In International Conference on Knowledge Discovery and Data Mining, pages 1555--1564. ACM, 2016

  15. [15]

    Automated derivation of the adjoint of high-level transient finite element programs

    Patrick Farrell, David Ham, Simon Funke, and Marie Rognes. Automated derivation of the adjoint of high-level transient finite element programs. SIAM Journal on Scientific Computing, 2013

  16. [16]

    Spatially adaptive computation time for residual networks

    Michael Figurnov, Maxwell D Collins, Yukun Zhu, Li Zhang, Jonathan Huang, Dmitry Vetrov, and Ruslan Salakhutdinov. Spatially adaptive computation time for residual networks. arXiv preprint, 2017

  17. [17]

    Learning to Detect Sepsis with a Multitask Gaussian Process RNN Classifier

    J. Futoma, S. Hariharan, and K. Heller. Learning to Detect Sepsis with a Multitask Gaussian Process RNN Classifier. ArXiv e-prints, 2017

  18. [18]

    The reversible residual network: Backpropagation without storing activations

    Aidan N Gomez, Mengye Ren, Raquel Urtasun, and Roger B Grosse. The reversible residual network: Backpropagation without storing activations. In Advances in Neural Information Processing Systems, pages 2211--2221, 2017

  19. [19]

    Adaptive Computation Time for Recurrent Neural Networks

    Alex Graves. Adaptive computation time for recurrent neural networks. arXiv preprint arXiv:1603.08983, 2016

  20. [20]

    HyperNetworks

    David Ha, Andrew Dai, and Quoc V Le. Hypernetworks. arXiv preprint arXiv:1609.09106, 2016

  21. [21]

    Stable architectures for deep neural networks

    Eldad Haber and Lars Ruthotto. Stable architectures for deep neural networks. Inverse Problems, 34(1):014004, 2017

  22. [22]

    Solving Ordinary Differential Equations I -- Nonstiff Problems

    E. Hairer, S.P. Nørsett, and G. Wanner. Solving Ordinary Differential Equations I -- Nonstiff Problems. Springer, 1987

  23. [23]

    Deep residual learning for image recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770--778, 2016a

  24. [24]

    Identity mappings in deep residual networks

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In European conference on computer vision, pages 630--645. Springer, 2016b

  25. [25]

    Neural networks for machine learning lecture 6a overview of mini-batch gradient descent, 2012

    Geoffrey Hinton, Nitish Srivastava, and Kevin Swersky. Neural networks for machine learning lecture 6a overview of mini-batch gradient descent, 2012

  26. [26]

    Variable Computation in Recurrent Neural Networks

    Yacine Jernite, Edouard Grave, Armand Joulin, and Tomas Mikolov. Variable computation in recurrent neural networks. arXiv preprint arXiv:1611.06188, 2016

  27. [27]

    Adam: A Method for Stochastic Optimization

    Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014

  28. [28]

    Auto-Encoding Variational Bayes

    Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. International Conference on Learning Representations, 2014

  29. [29]

    Improved variational inference with inverse autoregressive flow

    Diederik P Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, and Max Welling. Improved variational inference with inverse autoregressive flow. In Advances in Neural Information Processing Systems, pages 4743--4751, 2016

  30. [30]

    Beitrag zur näherungsweisen Integration totaler Differentialgleichungen

    W. Kutta. Beitrag zur näherungsweisen Integration totaler Differentialgleichungen. Zeitschrift für Mathematik und Physik, 46:435--453, 1901

  31. [31]

    A theoretical framework for back-propagation

    Yann LeCun, D Touresky, G Hinton, and T Sejnowski. A theoretical framework for back-propagation. In Proceedings of the 1988 connectionist models summer school, volume 1, pages 21--28. CMU, Pittsburgh, Pa: Morgan Kaufmann, 1988

  32. [32]

    Gradient-based learning applied to document recognition

    Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278--2324, 1998

  33. [33]

    Time-Dependent Representation for Neural Event Sequence Prediction

    Yang Li. Time-dependent representation for neural event sequence prediction. arXiv preprint arXiv:1708.00065, 2017

  34. [34]

    Directly modeling missing data in sequences with RNNs: Improved classification of clinical time series

    Zachary C Lipton, David Kale, and Randall Wetzel. Directly modeling missing data in sequences with RNNs: Improved classification of clinical time series. In Proceedings of the 1st Machine Learning for Healthcare Conference, volume 56 of Proceedings of Machine Learning Research, pages 253--270. PMLR, 18--19 Aug 2016. URL http://proceedings.mlr.press/v56/L...

  35. [35]

    PDE-Net: Learning PDEs from Data

    Z. Long, Y. Lu, X. Ma, and B. Dong. PDE-Net: Learning PDEs from Data. ArXiv e-prints, 2017

  36. [36]

    Beyond finite layer neural networks: Bridging deep architectures and numerical differential equations

    Yiping Lu, Aoxiao Zhong, Quanzheng Li, and Bin Dong. Beyond finite layer neural networks: Bridging deep architectures and numerical differential equations. arXiv preprint arXiv:1710.10121, 2017

  37. [37]

    Autograd: Reverse-mode differentiation of native Python

    Dougal Maclaurin, David Duvenaud, and Ryan P Adams. Autograd: Reverse-mode differentiation of native Python. In ICML workshop on Automatic Machine Learning, 2015

  38. [38]

    The neural Hawkes process: A neurally self-modulating multivariate point process

    Hongyuan Mei and Jason M Eisner. The neural Hawkes process: A neurally self-modulating multivariate point process. In Advances in Neural Information Processing Systems, pages 6757--6767, 2017

  39. [39]

    Fast derivatives of likelihood functionals for ODE based models using adjoint-state method

    Valdemar Melicher, Tom Haber, and Wim Vanroose. Fast derivatives of likelihood functionals for ODE based models using adjoint-state method. Computational Statistics, 32(4):1621--1643, 2017

  40. [40]

    Intensitätsschwankungen im Fernsprechverkehr

    Conny Palm. Intensitätsschwankungen im Fernsprechverkehr. Ericsson Technics, 1943

  41. [41]

    Automatic differentiation in PyTorch

    Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. 2017

  42. [42]

    Gradient calculations for dynamic recurrent neural networks: A survey

    Barak A Pearlmutter. Gradient calculations for dynamic recurrent neural networks: A survey. IEEE Transactions on Neural Networks, 6(5):1212--1228, 1995

  43. [43]

    The mathematical theory of optimal processes

    Lev Semenovich Pontryagin, EF Mishchenko, VG Boltyanskii, and RV Gamkrelidze. The mathematical theory of optimal processes. 1962

  44. [44]

    Hidden physics models: Machine learning of nonlinear partial differential equations

    M. Raissi and G. E. Karniadakis. Hidden physics models: Machine learning of nonlinear partial differential equations. Journal of Computational Physics, pages 125--141, 2018

  45. [45]

    Multistep Neural Networks for Data-driven Discovery of Nonlinear Dynamical Systems

    Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Multistep neural networks for data-driven discovery of nonlinear dynamical systems. arXiv preprint arXiv:1801.01236, 2018a

  46. [46]

    Numerical Gaussian processes for time-dependent and nonlinear partial differential equations

    Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Numerical Gaussian processes for time-dependent and nonlinear partial differential equations. SIAM Journal on Scientific Computing, 40(1):A172--A198, 2018b

  47. [47]

    Stochastic backpropagation and approximate inference in deep generative models

    Danilo J Rezende, Shakir Mohamed, and Daan Wierstra. Stochastic backpropagation and approximate inference in deep generative models. In Proceedings of the 31st International Conference on Machine Learning, pages 1278--1286, 2014

  48. [48]

    Variational Inference with Normalizing Flows

    Danilo Jimenez Rezende and Shakir Mohamed. Variational inference with normalizing flows. arXiv preprint arXiv:1505.05770, 2015

  49. [49]

    Über die numerische Auflösung von Differentialgleichungen

    C. Runge. Über die numerische Auflösung von Differentialgleichungen. Mathematische Annalen, 46:167--178, 1895

  50. [50]

    Deep Neural Networks Motivated by Partial Differential Equations

    Lars Ruthotto and Eldad Haber. Deep neural networks motivated by partial differential equations. arXiv preprint arXiv:1804.04272, 2018

  51. [51]

    Black-box Variational Inference for Stochastic Differential Equations

    T. Ryder, A. Golightly, A. S. McGough, and D. Prangle. Black-box Variational Inference for Stochastic Differential Equations. ArXiv e-prints, 2018

  52. [52]

    Probabilistic ODE solvers with Runge-Kutta means

    Michael Schober, David Duvenaud, and Philipp Hennig. Probabilistic ODE solvers with Runge-Kutta means. In Advances in Neural Information Processing Systems 25, 2014

  53. [53]

    Reliable Decision Support using Counterfactual Models

    Peter Schulam and Suchi Saria. What-if reasoning with counterfactual Gaussian processes. arXiv preprint arXiv:1703.10651, 2017

  54. [54]

    Scalable joint models for reliable uncertainty-aware event prediction

    Hossein Soleimani, James Hensman, and Suchi Saria. Scalable joint models for reliable uncertainty-aware event prediction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017a

  55. [55]

    Treatment-Response Models for Counterfactual Reasoning with Continuous-time, Continuous-valued Interventions

    Hossein Soleimani, Adarsh Subbaswamy, and Suchi Saria. Treatment-response models for counterfactual reasoning with continuous-time, continuous-valued interventions. arXiv preprint arXiv:1704.02038, 2017b

  56. [56]

    Stable fluids

    Jos Stam. Stable fluids. In Proceedings of the 26th annual conference on Computer graphics and interactive techniques, pages 121--128. ACM Press/Addison-Wesley Publishing Co., 1999

  57. [57]

    Optimization and uncertainty analysis of ODE models using second order adjoint sensitivity analysis

    Paul Stapor, Fabian Froehlich, and Jan Hasenauer. Optimization and uncertainty analysis of ODE models using second order adjoint sensitivity analysis. bioRxiv, page 272005, 2018

  58. [58]

    Improving Variational Auto-Encoders using Householder Flow

    Jakub M Tomczak and Max Welling. Improving variational auto-encoders using Householder flow. arXiv preprint arXiv:1611.09630, 2016

  59. [59]

    Latent-space Physics: Towards Learning the Temporal Evolution of Fluid Flow

    Steffen Wiewel, Moritz Becher, and Nils Thuerey. Latent-space physics: Towards learning the temporal evolution of fluid flow. arXiv preprint arXiv:1802.10123, 2018

  60. [60]

    Fatode: a library for forward, adjoint, and tangent linear integration of ODEs

    Hong Zhang and Adrian Sandu. Fatode: a library for forward, adjoint, and tangent linear integration of ODEs. SIAM Journal on Scientific Computing, 36(5):C504--C523, 2014