pith. machine review for the scientific record.

arXiv: 1806.07366 · v5 · submitted 2018-06-19 · 💻 cs.LG · cs.AI · stat.ML

Recognition: 3 Lean theorem links

Neural Ordinary Differential Equations

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 12:56 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · stat.ML
keywords neural ordinary differential equations · continuous-depth networks · residual networks · normalizing flows · ODE solvers · generative models · backpropagation through ODEs
0 comments

The pith

Deep neural networks can replace discrete layers with continuous dynamics defined by ordinary differential equations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a family of models where the hidden state evolves continuously according to an ODE whose right-hand side is given by a neural network. Instead of stacking a fixed number of layers, the network output is the result of integrating this dynamical system with a black-box numerical solver. This design keeps memory usage constant with respect to depth, lets the solver choose how much computation to spend on each input, and allows trading numerical accuracy for speed. The same approach yields continuous normalizing flows that can be trained by maximum likelihood without ordering or partitioning the data dimensions. Training works by backpropagating through the ODE solver without inspecting its internal steps.

Core claim

A neural network parameterizes the derivative of the hidden state, dh/dt = f(h(t), t, θ), and the model output is obtained by integrating the ODE from an initial time to a final time using any standard solver. This continuous-depth formulation replaces the explicit sequence of discrete layers while preserving end-to-end differentiability.
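As a concrete reading of this claim, the sketch below integrates a small, randomly initialized MLP vector field with an off-the-shelf adaptive solver. It is a minimal illustration under assumed settings (SciPy's RK45, toy dimensions, arbitrary tolerances), not the authors' implementation.

```python
# Minimal sketch: hidden state evolving as dh/dt = f(h(t), t, theta),
# with the "network output" read off as h(t1) from a black-box ODE solver.
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)
dim, hidden = 4, 16

# Illustrative parameters theta: a tiny two-layer MLP defines the vector field.
W1, b1 = rng.normal(0.0, 0.5, (hidden, dim + 1)), np.zeros(hidden)
W2, b2 = rng.normal(0.0, 0.5, (dim, hidden)), np.zeros(dim)

def f(t, h):
    """Vector field f(h(t), t, theta); time enters as an extra input feature."""
    x = np.concatenate([h, [t]])
    return W2 @ np.tanh(W1 @ x + b1) + b2

h0 = rng.normal(size=dim)                        # input = initial state h(t0)
sol = solve_ivp(f, (0.0, 1.0), h0, method="RK45", rtol=1e-6, atol=1e-8)
h1 = sol.y[:, -1]                                # output = integrated state h(t1)
print("h(t1) =", h1)
print("function evaluations (the 'depth' chosen by the solver):", sol.nfev)
```

The solver's function-evaluation count plays the role of depth here, which is the sense in which the evaluation strategy adapts to each input and tighter tolerances buy accuracy at the cost of more computation.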

What carries the argument

The neural ODE itself: a neural network defines the vector field for the hidden-state derivative, and a black-box ODE solver integrates it to produce the network output.

Load-bearing premise

That a neural network can define a vector field whose integrated trajectory yields useful representations, and that standard ODE solvers remain numerically stable and supply usable gradients during training.
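A crude way to probe the second half of this premise is to check that a scalar loss on the integrated state responds smoothly to a parameter perturbation. The finite-difference check below is only a sanity test under an invented toy setup (autonomous vector field, quadratic loss); it stands in for the paper's adjoint method, which recovers the same gradient by solving an augmented ODE backwards in time at constant memory cost.

```python
# Sanity check: does a loss on h(t1) vary smoothly with one weight of the vector field?
# Finite differences stand in for the adjoint method; they are not how the paper trains.
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(1)
dim, hidden = 3, 8
W1 = rng.normal(0.0, 0.5, (hidden, dim))
W2 = rng.normal(0.0, 0.5, (dim, hidden))
h0 = rng.normal(size=dim)

def loss(W1):
    f = lambda t, h: W2 @ np.tanh(W1 @ h)        # toy autonomous vector field f(h; theta)
    h1 = solve_ivp(f, (0.0, 1.0), h0, rtol=1e-8, atol=1e-10).y[:, -1]
    return 0.5 * np.sum(h1 ** 2)                 # illustrative scalar loss on the output

eps = 1e-5
dW = np.zeros_like(W1)
dW[0, 0] = eps
grad_fd = (loss(W1 + dW) - loss(W1 - dW)) / (2 * eps)   # d(loss)/d(W1[0, 0])
print("finite-difference gradient w.r.t. W1[0, 0]:", grad_fd)
```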

What would settle it

A neural ODE model trained on a standard supervised classification benchmark fails to reach competitive accuracy, or diverges due to solver instability, while a discrete residual network of comparable capacity succeeds.

read the original abstract

We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a black-box differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can train by maximum likelihood, without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows end-to-end training of ODEs within larger models.
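The continuous normalizing flow mentioned here rests on what the paper calls the instantaneous change of variables; paraphrased below (a restatement of the paper's Theorem 1, not a new result), with z(t) following the learned dynamics:

```latex
% Instantaneous change of variables (paraphrase of the paper's Theorem 1):
% if dz/dt = f(z(t), t), then the log-density of the state evolves as
\frac{\partial \log p(\mathbf{z}(t))}{\partial t}
    = -\operatorname{tr}\!\left(\frac{\partial f}{\partial \mathbf{z}(t)}\right),
% so the change in log-density along a trajectory is obtained by integrating the
% trace of the Jacobian, with no ordering or partitioning of the data dimensions.
```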

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 1 minor

Summary. The manuscript introduces Neural Ordinary Differential Equations (Neural ODEs), a new family of deep neural network models in which the derivative of the hidden state is parameterized by a neural network and the output is obtained by integrating the resulting ODE with a black-box solver. The central claims are that these continuous-depth models incur constant memory cost, adapt their evaluation depth to each input, and permit explicit trade-offs between numerical precision and speed. The authors derive a scalable adjoint-based backpropagation method that does not require access to the solver internals, and they demonstrate the approach on continuous-depth residual networks, latent ODE models, and continuous normalizing flows that can be trained by maximum likelihood without data partitioning or ordering.

Significance. If the results hold, the work is significant because it supplies a mathematically grounded continuous-depth alternative to discrete-layer networks that inherits standard existence/uniqueness guarantees for ODEs and the adjoint sensitivity method for gradients. The explicit construction of continuous normalizing flows and the constant-memory training procedure constitute concrete, falsifiable advances that open new modeling possibilities for dynamical systems and generative models.

minor comments (1)
  1. [Section 4] Section 4 and the associated experimental tables: the precise numerical tolerances, solver types (e.g., dopri5 vs. rk4), and step-size controls used for each reported run are not stated uniformly; adding a short reproducibility paragraph or supplementary table would strengthen the constant-memory and adaptive-depth claims.

Simulated Authors' Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of the manuscript and the accurate summary of our contributions. We are pleased that the significance of the continuous-depth formulation, constant-memory training via the adjoint method, and the construction of continuous normalizing flows was recognized. The minor comment on reporting solver types, tolerances, and step-size controls is well taken; a reproducibility table collecting these settings for each run would address it directly.

Circularity Check

0 steps flagged

No significant circularity in the Neural ODE derivation

full rationale

The paper defines the model directly as dh/dt = f_θ(h(t), t) with f_θ a neural network, integrated by a black-box ODE solver. The adjoint backpropagation formula is derived explicitly in Section 3 from the chain rule on the integral without presupposing fitted parameters or target quantities. No self-definitional reductions, fitted inputs renamed as predictions, or load-bearing self-citations that encode the result are present. The construction rests on standard ODE existence results and is independently verifiable through the continuous ResNet, latent ODE, and normalizing flow experiments.
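For readers checking this step, the adjoint quantities the rationale refers to take roughly the following form in the paper's notation (a paraphrase of Section 3, with L a scalar loss on the state z(t)):

```latex
% Adjoint sensitivity method (paraphrase of the paper's Section 3):
% the adjoint state a(t) = dL/dz(t) obeys its own ODE backwards in time,
\frac{d\mathbf{a}(t)}{dt}
    = -\,\mathbf{a}(t)^{\top}\frac{\partial f(\mathbf{z}(t), t, \theta)}{\partial \mathbf{z}},
% and the parameter gradient is accumulated along the same backward pass,
\frac{dL}{d\theta}
    = -\int_{t_1}^{t_0} \mathbf{a}(t)^{\top}
      \frac{\partial f(\mathbf{z}(t), t, \theta)}{\partial \theta}\, dt .
% Both are computed by a second call to the black-box solver, so no intermediate
% states from the forward pass need to be stored.
```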

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard existence and uniqueness results for ODEs and on the differentiability of the black-box solver. No new entities are postulated. The neural network parameters that define the vector field are ordinary trainable weights, not free parameters introduced ad hoc.

axioms (2)
  • standard math Solutions to the initial-value problem exist and are unique for the Lipschitz-continuous vector fields produced by the neural network.
    Invoked implicitly when the ODE solver is called; standard Picard-Lindelöf theorem, restated after this list.
  • domain assumption The black-box ODE solver is differentiable with respect to its initial condition and parameters.
    Required for the adjoint method to produce correct gradients; stated in the training section.
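For completeness, the first axiom is the standard Picard-Lindelöf guarantee, which in this setting reads roughly as follows (the Lipschitz bound is an assumption on the network, typically justified by finite weights and Lipschitz nonlinearities such as tanh):

```latex
% Picard-Lindelof existence and uniqueness, as invoked by the first axiom:
% if f is continuous in t and Lipschitz in h, i.e. for some constant K,
\lVert f(h_1, t, \theta) - f(h_2, t, \theta) \rVert \le K \,\lVert h_1 - h_2 \rVert,
% then the initial-value problem
\frac{dh}{dt} = f(h(t), t, \theta), \qquad h(t_0) = h_0,
% has a unique solution on an interval around t_0.
```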

pith-pipeline@v0.9.0 · 5432 in / 1403 out tokens · 31679 ms · 2026-05-15T12:56:04.870070+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • Foundation.Hamiltonian energy_conservation echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a black-box differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input

  • Foundation.DAlembert.Inevitability bilinear_family_forced echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    We also construct continuous normalizing flows, a generative model that can train by maximum likelihood, without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver

  • Foundation.Hamiltonian H_EnergyConservation refines

    REFINES: relation between the paper passage and the cited Recognition theorem.

    the adjoint sensitivity method (Pontryagin et al., 1962). This approach computes gradients by solving a second, augmented ODE backwards in time

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 19 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Language Modeling with Hyperspherical Flows

    cs.LG 2026-05 unverdicted novelty 8.0

    S-FLM rotates vectors on a hypersphere using a learned velocity field to generate language sequences, improving continuous flow models on large-vocabulary reasoning and closing the gap to masked diffusion at standard ...

  2. Learning Lindblad Dynamics of a Superconducting Quantum Processor

    quant-ph 2026-05 unverdicted novelty 7.0

    LIMINAL fits nested Lindblad models to tomographic data and uses likelihood-ratio tests to identify minimal dynamics for a five-qubit superconducting processor, supporting three-local Hamiltonian terms and two-local d...

  3. CF-VLA: Efficient Coarse-to-Fine Action Generation for Vision-Language-Action Policies

    cs.CV 2026-04 unverdicted novelty 7.0

    CF-VLA uses a coarse initialization over endpoint velocity followed by single-step refinement to achieve strong performance with low inference steps on CALVIN, LIBERO, and real-robot tasks.

  4. Scalable Generative Sampling and Multilevel Estimation for Lattice Field Theories Near Criticality

    hep-lat 2026-04 unverdicted novelty 7.0

    A hierarchical generative model for critical lattice scalar field theories achieves orders-of-magnitude lower autocorrelation times than HMC while enabling exact multilevel Monte Carlo.

  5. Differentiable free energy surface: a variational approach to directly observing rare events using generative deep-learning models

    physics.comp-ph 2026-04 unverdicted novelty 7.0

    VaFES constructs a latent space from reversible collective variables and variationally optimizes a tractable-density generative model to produce a continuous free energy surface from which rare events are directly sampled.

  6. Continuum Robot Modeling with Action Conditioned Flow Matching

    cs.RO 2026-05 unverdicted novelty 6.0

    A conditional point-cloud flow matching model maps motor actuation to 3D geometry of tendon-driven continuum robots and outperforms prior self-modeling methods on simulated and real 2- and 3-module hardware.

  7. Accelerating the Simulation of Ordinary Differential Equations Through Physics-Preserving Neural Networks

    math.NA 2026-05 unverdicted novelty 6.0

    A neural network maps ODE states to a slow-evolving latent space with dynamics derived from the original equations via the chain rule, enabling accelerated simulations with fewer function calls.

  8. Exploring the Boundaries of Differentiable Radiation Transport and Detector Simulation

    physics.ins-det 2026-05 unverdicted novelty 6.0

    Targeted halting of gradient flow at unstable material boundaries enables stable derivatives for optimizing detector designs in radiation transport simulations.

  9. Constructing Inverse Potentials from Scattering Phase Shifts using Physics-Informed Neural Networks: Application to Neutron-Alpha Scattering

    nucl-th 2026-05 unverdicted novelty 6.0

    A PINN with a hard Gaussian envelope reconstructs a smooth attractive potential for neutron-alpha P-wave scattering that yields resonance parameters matching expected values.

  10. Information bottleneck for learning the phase space of dynamics from high-dimensional experimental data

    physics.data-an 2026-04 unverdicted novelty 6.0

    DySIB recovers a two-dimensional representation matching the phase space of a physical pendulum from high-dimensional video data by maximizing predictive mutual information in latent space.

  11. Tokenised Flow Matching for Hierarchical Simulation Based Inference

    cs.LG 2026-04 unverdicted novelty 6.0

    TFMPE combines likelihood factorisation with tokenised flow matching to enable efficient hierarchical SBI from single-site simulations, producing well-calibrated posteriors at lower computational cost on a new benchma...

  12. Generative Path-Law Jump-Diffusion: Sequential MMD-Gradient Flows and Generalisation Bounds in Marcus-Signature RKHS

    stat.ML 2026-04 unverdicted novelty 6.0

    The paper proposes the ANJD flow and AVNSG operator to generate càdlàg trajectories via sequential MMD-gradient descent in Marcus-signature RKHS with generalisation bounds.

  13. Anticipatory Reinforcement Learning: From Generative Path-Laws to Distributional Value Functions

    cs.LG 2026-04 unverdicted novelty 6.0

    ARL lifts states into signature-augmented manifolds and employs self-consistent proxies of future path-laws to enable deterministic expected-return evaluation while preserving contraction mappings in jump-diffusion en...

  14. Monte Carlo Event Generation with Continuous Normalizing Flows

    hep-ph 2026-04 conditional novelty 6.0

    Continuous normalizing flows improve unweighting efficiency in Monte Carlo event generation for high-jet-multiplicity collider processes by factors up to 184, with wall-time gains of about ten when combined with coupl...

  15. FluxMC: Rapid and High-Fidelity Inference for Space-Based Gravitational-Wave Observations

    astro-ph.IM 2026-04 unverdicted novelty 6.0

    FluxMC integrates flow matching with parallel tempering MCMC to converge in under five hours on high-fidelity IMRPhenomHM waveforms for massive black hole binaries, where standard methods fail after hundreds of hours ...

  16. mimic-video: Video-Action Models for Generalizable Robot Control Beyond VLAs

    cs.RO 2025-12 unverdicted novelty 6.0

    mimic-video combines internet video pretraining with a flow-matching decoder to achieve state-of-the-art robotic manipulation performance with 10x better sample efficiency than vision-language-action models.

  17. Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges

    cs.LG 2021-04 accept novelty 6.0

    Geometric deep learning provides a unified mathematical framework based on grids, groups, graphs, geodesics, and gauges to explain and extend neural network architectures by incorporating physical regularities.

  18. Physics-Modeled Neural Networks

    cs.LG 2026-05 unverdicted novelty 5.0

    DynPMNNs replace static activations with time-evolving ODEs based on the FitzHugh-Nagumo model, achieve competitive regression performance on California Housing data with fewer parameters than Neural ODEs or CfCs, and...

  19. Beyond Silicon: Materials, Mechanisms, and Methods for Physical Neural Computing

    cs.NE 2026-04 unverdicted novelty 5.0

    Physical neural computing platforms using diverse materials offer complementary strengths for efficient on-device AI, with no single substrate excelling in all dimensions.

Reference graph

Works this paper leans on

60 extracted references · 60 canonical work pages · cited by 19 Pith papers · 16 internal anchors

  1. [1]

    Computationally efficient convolved multiple output Gaussian processes

    Mauricio A Álvarez and Neil D Lawrence. Computationally efficient convolved multiple output Gaussian processes. Journal of Machine Learning Research, 12(May):1459--1500, 2011

  2. [2]

    OptNet: Differentiable optimization as a layer in neural networks

    Brandon Amos and J Zico Kolter. OptNet: Differentiable optimization as a layer in neural networks. In International Conference on Machine Learning, pages 136--145, 2017

  3. [3]

    A general-purpose software framework for dynamic optimization

    Joel Andersson. A general-purpose software framework for dynamic optimization. PhD thesis, 2013

  4. [4]

    CasADi -- A software framework for nonlinear optimization and optimal control

    Joel A E Andersson, Joris Gillis, Greg Horn, James B Rawlings, and Moritz Diehl. CasADi -- A software framework for nonlinear optimization and optimal control. Mathematical Programming Computation, In Press, 2018

  5. [5]

    Automatic differentiation in machine learning: a survey

    Atilim Gunes Baydin, Barak A Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind. Automatic differentiation in machine learning: a survey. Journal of Machine Learning Research, 18(153):1--153, 2018

  6. [6]

    Sylvester Normalizing Flows for Variational Inference

    Rianne van den Berg, Leonard Hasenclever, Jakub M Tomczak, and Max Welling. Sylvester normalizing flows for variational inference. arXiv preprint arXiv:1803.05649, 2018

  7. [7]

    The Stan Math Library: Reverse-Mode Automatic Differentiation in C++

    Bob Carpenter, Matthew D Hoffman, Marcus Brubaker, Daniel Lee, Peter Li, and Michael Betancourt. The Stan math library: Reverse-mode automatic differentiation in C++. arXiv preprint arXiv:1509.07164, 2015

  8. [8]

    Reversible Architectures for Arbitrarily Deep Residual Neural Networks

    Bo Chang, Lili Meng, Eldad Haber, Lars Ruthotto, David Begert, and Elliot Holtham. Reversible architectures for arbitrarily deep residual neural networks. arXiv preprint arXiv:1709.03698, 2017

  9. [9]

    Multi-level residual networks from dynamical systems view

    Bo Chang, Lili Meng, Eldad Haber, Frederick Tung, and David Begert. Multi-level residual networks from dynamical systems view. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=SyJS-OgR-

  10. [10]

    Recurrent neural networks for multivariate time series with missing values

    Zhengping Che, Sanjay Purushotham, Kyunghyun Cho, David Sontag, and Yan Liu. Recurrent neural networks for multivariate time series with missing values. Scientific Reports, 8(1):6085, 2018. URL https://doi.org/10.1038/s41598-018-24271-9

  11. [11]

    Doctor AI: Predicting clinical events via recurrent neural networks

    Edward Choi, Mohammad Taha Bahadori, Andy Schuetz, Walter F. Stewart, and Jimeng Sun. Doctor AI: Predicting clinical events via recurrent neural networks. In Proceedings of the 1st Machine Learning for Healthcare Conference, volume 56 of Proceedings of Machine Learning Research, pages 301--318. PMLR, 18--19 Aug 2016. URL http://proceedings.mlr.press/v56/...

  12. [12]

    Theory of ordinary differential equations

    Earl A Coddington and Norman Levinson. Theory of ordinary differential equations. Tata McGraw-Hill Education, 1955

  13. [13]

    NICE: Non-linear Independent Components Estimation

    Laurent Dinh, David Krueger, and Yoshua Bengio. NICE: Non-linear independent components estimation. arXiv preprint arXiv:1410.8516, 2014

  14. [14]

    Recurrent marked temporal point processes: Embedding event history to vector

    Nan Du, Hanjun Dai, Rakshit Trivedi, Utkarsh Upadhyay, Manuel Gomez-Rodriguez, and Le Song. Recurrent marked temporal point processes: Embedding event history to vector. In International Conference on Knowledge Discovery and Data Mining, pages 1555--1564. ACM, 2016

  15. [15]

    Automated derivation of the adjoint of high-level transient finite element programs

    Patrick Farrell, David Ham, Simon Funke, and Marie Rognes. Automated derivation of the adjoint of high-level transient finite element programs. SIAM Journal on Scientific Computing, 2013

  16. [16]

    Spatially adaptive computation time for residual networks

    Michael Figurnov, Maxwell D Collins, Yukun Zhu, Li Zhang, Jonathan Huang, Dmitry Vetrov, and Ruslan Salakhutdinov. Spatially adaptive computation time for residual networks. arXiv preprint, 2017

  17. [17]

    Learning to Detect Sepsis with a Multitask Gaussian Process RNN Classifier

    J. Futoma, S. Hariharan, and K. Heller. Learning to Detect Sepsis with a Multitask Gaussian Process RNN Classifier. ArXiv e-prints, 2017

  18. [18]

    The reversible residual network: Backpropagation without storing activations

    Aidan N Gomez, Mengye Ren, Raquel Urtasun, and Roger B Grosse. The reversible residual network: Backpropagation without storing activations. In Advances in Neural Information Processing Systems, pages 2211--2221, 2017

  19. [19]

    Adaptive Computation Time for Recurrent Neural Networks

    Alex Graves. Adaptive computation time for recurrent neural networks. arXiv preprint arXiv:1603.08983, 2016

  20. [20]

    HyperNetworks

    David Ha, Andrew Dai, and Quoc V Le. Hypernetworks. arXiv preprint arXiv:1609.09106, 2016

  21. [21]

    Stable architectures for deep neural networks

    Eldad Haber and Lars Ruthotto. Stable architectures for deep neural networks. Inverse Problems, 34(1):014004, 2017

  22. [22]

    Solving Ordinary Differential Equations I -- Nonstiff Problems

    E. Hairer, S.P. Nørsett, and G. Wanner. Solving Ordinary Differential Equations I -- Nonstiff Problems. Springer, 1987

  23. [23]

    Deep residual learning for image recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770--778, 2016a

  24. [24]

    Identity mappings in deep residual networks

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In European conference on computer vision, pages 630--645. Springer, 2016b

  25. [25]

    Neural networks for machine learning lecture 6a overview of mini-batch gradient descent, 2012

    Geoffrey Hinton, Nitish Srivastava, and Kevin Swersky. Neural networks for machine learning lecture 6a overview of mini-batch gradient descent, 2012

  26. [26]

    Variable Computation in Recurrent Neural Networks

    Yacine Jernite, Edouard Grave, Armand Joulin, and Tomas Mikolov. Variable computation in recurrent neural networks. arXiv preprint arXiv:1611.06188, 2016

  27. [27]

    Adam: A Method for Stochastic Optimization

    Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014

  28. [28]

    Auto-Encoding Variational Bayes

    Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. International Conference on Learning Representations, 2014

  29. [29]

    Improved variational inference with inverse autoregressive flow

    Diederik P Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, and Max Welling. Improved variational inference with inverse autoregressive flow. In Advances in Neural Information Processing Systems, pages 4743--4751, 2016

  30. [30]

    Beitrag zur näherungsweisen Integration totaler Differentialgleichungen

    W. Kutta. Beitrag zur näherungsweisen Integration totaler Differentialgleichungen. Zeitschrift für Mathematik und Physik, 46:435--453, 1901

  31. [31]

    A theoretical framework for back-propagation

    Yann LeCun, D Touresky, G Hinton, and T Sejnowski. A theoretical framework for back-propagation. In Proceedings of the 1988 connectionist models summer school, volume 1, pages 21--28. CMU, Pittsburgh, Pa: Morgan Kaufmann, 1988

  32. [32]

    Gradient-based learning applied to document recognition

    Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278--2324, 1998

  33. [33]

    Time-Dependent Representation for Neural Event Sequence Prediction

    Yang Li. Time-dependent representation for neural event sequence prediction. arXiv preprint arXiv:1708.00065, 2017

  34. [34]

    Directly modeling missing data in sequences with RNNs: Improved classification of clinical time series

    Zachary C Lipton, David Kale, and Randall Wetzel. Directly modeling missing data in sequences with RNNs: Improved classification of clinical time series. In Proceedings of the 1st Machine Learning for Healthcare Conference, volume 56 of Proceedings of Machine Learning Research, pages 253--270. PMLR, 18--19 Aug 2016. URL http://proceedings.mlr.press/v56/L...

  35. [35]

    PDE-Net: Learning PDEs from Data

    Z. Long, Y. Lu, X. Ma, and B. Dong. PDE-Net: Learning PDEs from Data. ArXiv e-prints, 2017

  36. [36]

    Beyond finite layer neural networks: Bridging deep architectures and numerical differential equations

    Yiping Lu, Aoxiao Zhong, Quanzheng Li, and Bin Dong. Beyond finite layer neural networks: Bridging deep architectures and numerical differential equations. arXiv preprint arXiv:1710.10121, 2017

  37. [37]

    Autograd: Reverse-mode differentiation of native Python

    Dougal Maclaurin, David Duvenaud, and Ryan P Adams. Autograd: Reverse-mode differentiation of native Python. In ICML workshop on Automatic Machine Learning, 2015

  38. [38]

    The neural Hawkes process: A neurally self-modulating multivariate point process

    Hongyuan Mei and Jason M Eisner. The neural Hawkes process: A neurally self-modulating multivariate point process. In Advances in Neural Information Processing Systems, pages 6757--6767, 2017

  39. [39]

    Fast derivatives of likelihood functionals for ODE based models using adjoint-state method

    Valdemar Melicher, Tom Haber, and Wim Vanroose. Fast derivatives of likelihood functionals for ODE based models using adjoint-state method. Computational Statistics, 32(4):1621--1643, 2017

  40. [40]

    Intensitätsschwankungen im Fernsprechverkehr

    Conny Palm. Intensitätsschwankungen im Fernsprechverkehr. Ericsson Technics, 1943

  41. [41]

    Automatic differentiation in PyTorch

    Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. 2017

  42. [42]

    Gradient calculations for dynamic recurrent neural networks: A survey

    Barak A Pearlmutter. Gradient calculations for dynamic recurrent neural networks: A survey. IEEE Transactions on Neural Networks, 6(5):1212--1228, 1995

  43. [43]

    The mathematical theory of optimal processes

    Lev Semenovich Pontryagin, EF Mishchenko, VG Boltyanskii, and RV Gamkrelidze. The mathematical theory of optimal processes. 1962

  44. [44]

    Hidden physics models: Machine learning of nonlinear partial differential equations

    M. Raissi and G. E. Karniadakis. Hidden physics models: Machine learning of nonlinear partial differential equations. Journal of Computational Physics, pages 125--141, 2018

  45. [45]

    Multistep Neural Networks for Data-driven Discovery of Nonlinear Dynamical Systems

    Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Multistep neural networks for data-driven discovery of nonlinear dynamical systems. arXiv preprint arXiv:1801.01236, 2018a

  46. [46]

    Numerical Gaussian processes for time-dependent and nonlinear partial differential equations

    Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Numerical Gaussian processes for time-dependent and nonlinear partial differential equations. SIAM Journal on Scientific Computing, 40(1):A172--A198, 2018b

  47. [47]

    Stochastic backpropagation and approximate inference in deep generative models

    Danilo J Rezende, Shakir Mohamed, and Daan Wierstra. Stochastic backpropagation and approximate inference in deep generative models. In Proceedings of the 31st International Conference on Machine Learning, pages 1278--1286, 2014

  48. [48]

    Variational Inference with Normalizing Flows

    Danilo Jimenez Rezende and Shakir Mohamed. Variational inference with normalizing flows. arXiv preprint arXiv:1505.05770, 2015

  49. [49]

    Über die numerische Auflösung von Differentialgleichungen

    C. Runge. Über die numerische Auflösung von Differentialgleichungen. Mathematische Annalen, 46:167--178, 1895

  50. [50]

    Deep Neural Networks Motivated by Partial Differential Equations

    Lars Ruthotto and Eldad Haber. Deep neural networks motivated by partial differential equations. arXiv preprint arXiv:1804.04272, 2018

  51. [51]

    Black-box Variational Inference for Stochastic Differential Equations

    T. Ryder, A. Golightly, A. S. McGough, and D. Prangle. Black-box Variational Inference for Stochastic Differential Equations. ArXiv e-prints, 2018

  52. [52]

    Probabilistic ODE solvers with Runge-Kutta means

    Michael Schober, David Duvenaud, and Philipp Hennig. Probabilistic ODE solvers with Runge-Kutta means. In Advances in Neural Information Processing Systems 25, 2014

  53. [53]

    Reliable Decision Support using Counterfactual Models

    Peter Schulam and Suchi Saria. What-if reasoning with counterfactual Gaussian processes. arXiv preprint arXiv:1703.10651, 2017

  54. [54]

    Scalable joint models for reliable uncertainty-aware event prediction

    Hossein Soleimani, James Hensman, and Suchi Saria. Scalable joint models for reliable uncertainty-aware event prediction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017a

  55. [55]

    Treatment-Response Models for Counterfactual Reasoning with Continuous-time, Continuous-valued Interventions

    Hossein Soleimani, Adarsh Subbaswamy, and Suchi Saria. Treatment-response models for counterfactual reasoning with continuous-time, continuous-valued interventions. arXiv preprint arXiv:1704.02038, 2017b

  56. [56]

    Stable fluids

    Jos Stam. Stable fluids. In Proceedings of the 26th annual conference on Computer graphics and interactive techniques, pages 121--128. ACM Press/Addison-Wesley Publishing Co., 1999

  57. [57]

    Optimization and uncertainty analysis of ODE models using second order adjoint sensitivity analysis

    Paul Stapor, Fabian Froehlich, and Jan Hasenauer. Optimization and uncertainty analysis of ODE models using second order adjoint sensitivity analysis. bioRxiv, page 272005, 2018

  58. [58]

    Improving Variational Auto-Encoders using Householder Flow

    Jakub M Tomczak and Max Welling. Improving variational auto-encoders using Householder flow. arXiv preprint arXiv:1611.09630, 2016

  59. [59]

    Latent-space Physics: Towards Learning the Temporal Evolution of Fluid Flow

    Steffen Wiewel, Moritz Becher, and Nils Thuerey. Latent-space physics: Towards learning the temporal evolution of fluid flow. arXiv preprint arXiv:1802.10123, 2018

  60. [60]

    Fatode: a library for forward, adjoint, and tangent linear integration of ODEs

    Hong Zhang and Adrian Sandu. Fatode: a library for forward, adjoint, and tangent linear integration of ODEs. SIAM Journal on Scientific Computing, 36(5):C504--C523, 2014