Automatic differentiation in machine learning: a survey

2015 · cs.SC · arXiv 1502.05767

8 Pith papers cite this work. Polarity classification is still indexing.


abstract

Derivatives, mostly in the form of gradients and Hessians, are ubiquitous in machine learning. Automatic differentiation (AD), also called algorithmic differentiation or simply "autodiff", is a family of techniques similar to but more general than backpropagation for efficiently and accurately evaluating derivatives of numeric functions expressed as computer programs. AD is a small but established field with applications in areas including computational fluid dynamics, atmospheric sciences, and engineering design optimization. Until very recently, the fields of machine learning and AD have largely been unaware of each other and, in some cases, have independently discovered each other's results. Despite its relevance, general-purpose AD has been missing from the machine learning toolbox, a situation slowly changing with its ongoing adoption under the names "dynamic computational graphs" and "differentiable programming". We survey the intersection of AD and machine learning, cover applications where AD has direct relevance, and address the main implementation techniques. By precisely defining the main differentiation techniques and their interrelationships, we aim to bring clarity to the usage of the terms "autodiff", "automatic differentiation", and "symbolic differentiation" as these are encountered more and more in machine learning settings.
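The distinction the abstract draws between AD and other differentiation techniques can be illustrated with a small sketch. The following is not taken from the survey: it is a minimal forward-mode AD example using dual numbers, and the names `Dual`, `derivative`, `finite_difference`, and the test function `f` are hypothetical, chosen only for illustration. It assumes scalar inputs and supports only addition, multiplication, and sine.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Hypothetical dual number: a value plus its derivative (tangent)."""
    val: float
    dot: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (zero derivative)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def sin(x):
    """Sine with the chain rule applied to the tangent component."""
    x = _lift(x)
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)


def derivative(f, x):
    """Forward-mode AD: seed the input tangent with 1 and read off the output tangent."""
    return f(Dual(x, 1.0)).dot


def finite_difference(f, x, h=1e-6):
    """Central-difference approximation, shown only for comparison."""
    return (f(x + h).val - f(x - h).val) / (2 * h)


def f(x):
    # An ordinary program; no symbolic expression for f is ever built.
    return x * x * sin(x) + 3 * x


if __name__ == "__main__":
    x0 = 1.5
    exact = 2 * x0 * math.sin(x0) + x0 ** 2 * math.cos(x0) + 3
    print("forward-mode AD  :", derivative(f, x0))
    print("finite difference:", finite_difference(f, x0))
    print("analytic         :", exact)
```

In this sketch the AD result agrees with the hand-derived derivative up to floating-point rounding, while the finite-difference value carries truncation error that depends on the step size `h`. Reverse mode, of which backpropagation is a special case, propagates derivatives from outputs to inputs instead and is not shown here.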

citation-role summary

background: 2 · method: 1

citation-polarity summary

background: 2 · use method: 1

representative citing papers

ADELIA: Automatic Differentiation for Efficient Laplace Inference Approximations

cs.DC · 2026-05-07 · conditional · novelty 7.0

ADELIA is the first AD-enabled INLA system that computes exact hyperparameter gradients via a structure-exploiting multi-GPU backward pass, delivering 4.2-7.9x per-gradient speedups and 5-8x better energy efficiency than finite differences on models with up to 1.9 million latent variables.

Learning by training: emergent return-point memory from cyclically tuning disordered sphere packings

physics.comp-ph · 2025-09-01 · unverdicted · novelty 6.0

Cyclic inverse design on athermal disordered sphere packings produces an emergent marginally absorbing manifold that encodes return-point memory of the training range through gradient discontinuities.

Large-eddy simulation nets (LESnets) based on physics-informed neural operator for wall-bounded turbulence

physics.flu-dyn · 2026-04-29 · unverdicted · novelty 6.0

LESnets integrates LES equations and the law of the wall into F-FNO to enable data-free, stable long-term predictions of wall-bounded turbulence at Re_tau up to 1000 on coarse grids, matching traditional LES accuracy at higher efficiency.

Efficient optimisation of multi-parameter quantum control protocols for strongly-coupled systems

quant-ph · 2026-04-21 · unverdicted · novelty 6.0

Gradient-based optimization of SUPER and FTPE pulse protocols via auto-differentiation and uniTEMPO yields higher preparation fidelities than resonant pi-pulses or standard two-photon excitation, with the advantage increasing at higher temperatures.

Physics-Informed Neural Networks for Solving Two-Flavor Neutrino Oscillations in Vacuum and Matter Environments for Atmospheric and Reactor Neutrinos

hep-ph · 2026-04-23 · unverdicted · novelty 5.0 · 2 refs

Physics-informed neural networks solve two-flavor neutrino oscillation equations in vacuum and matter with mean squared errors of order 10^{-3} to 10^{-4}, matching analytical results.

High-precision measurement of the W boson mass with the CMS experiment

hep-ex · 2024-12-18 · unverdicted · novelty 5.0

CMS measures the W boson mass as 80360.2 ± 9.9 MeV from 2016 data, consistent with the Standard Model prediction.

Heterogeneous Variational Inference for Markov Degradation Hazard Models: Discretized Mixture with Interpretable Clusters

cs.LG · 2026-04-27 · unverdicted · novelty 5.0

A discretized finite mixture model with ADVI identifies interpretable low- and high-risk clusters in Markov degradation hazard models for 280 industrial pumps, achieving 84x speedup over NUTS while enforcing stability constraints.

Exploring the Boundaries of Differentiable Radiation Transport and Detector Simulation

physics.ins-det · 2026-05-07

citing papers explorer

Showing 8 of 8 citing papers.

ADELIA: Automatic Differentiation for Efficient Laplace Inference Approximations
cs.DC · 2026-05-07 · conditional · none · ref 13

Learning by training: emergent return-point memory from cyclically tuning disordered sphere packings
physics.comp-ph · 2025-09-01 · unverdicted · none · ref 57 · internal anchor

Large-eddy simulation nets (LESnets) based on physics-informed neural operator for wall-bounded turbulence
physics.flu-dyn · 2026-04-29 · unverdicted · none · ref 96

Efficient optimisation of multi-parameter quantum control protocols for strongly-coupled systems
quant-ph · 2026-04-21 · unverdicted · none · ref 18

Physics-Informed Neural Networks for Solving Two-Flavor Neutrino Oscillations in Vacuum and Matter Environments for Atmospheric and Reactor Neutrinos
hep-ph · 2026-04-23 · unverdicted · none · ref 46 · 2 links · internal anchor

High-precision measurement of the W boson mass with the CMS experiment
hep-ex · 2024-12-18 · unverdicted · none · ref 52 · internal anchor

Heterogeneous Variational Inference for Markov Degradation Hazard Models: Discretized Mixture with Interpretable Clusters
cs.LG · 2026-04-27 · unverdicted · none · ref 21

Exploring the Boundaries of Differentiable Radiation Transport and Detector Simulation
physics.ins-det · 2026-05-07 · unreviewed · ref 13
