Dynamic Automatic Differentiation of GPU Broadcast Kernels

Bjorn De Sutter; Jarrett Revels; Juan Pablo Vielma; Tim Besard; Valentin Churavy

arxiv: 1810.08297 · v3 · pith:HZGUK77Hnew · submitted 2018-10-18 · 💻 cs.MS



keywords broadcast · approach · operations · reverse-mode · automatic · differentiation · julia · pure


We show how forward-mode automatic differentiation (AD) can be employed within larger reverse-mode computations to dynamically differentiate broadcast operations in a GPU-friendly manner. Our technique fully exploits the broadcast Jacobian's inherent sparsity structure, and unlike a pure reverse-mode approach, this "mixed-mode" approach does not require a backwards pass over the broadcasted operation's subgraph, obviating the need for several reverse-mode-specific programmability restrictions on user-authored broadcast operations. Most notably, this approach allows broadcast fusion in primal code despite the presence of data-dependent control flow. We discuss an experiment in which a Julia implementation of our technique outperformed pure reverse-mode TensorFlow and Julia implementations for differentiating through broadcast operations within an HM-LSTM cell update calculation.
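The mixed-mode idea in the abstract can be illustrated with a minimal CPU sketch in pure Python. It is not the paper's Julia/GPU implementation: the names `Dual`, `broadcast_forward`, `broadcast_backward`, and `kernel` are hypothetical, and the sketch assumes all broadcast arguments are flat lists of the same length (true broadcasting over mismatched shapes would additionally need a sum over the expanded dimensions). During the forward pass each element is evaluated with dual numbers, so the per-element partial derivatives are produced alongside the values. The backward step then only scales those stored partials by the upstream gradient and never re-traverses the kernel, which is why the data-dependent branch in `kernel` poses no difficulty.

```python
class Dual:
    """Scalar value carrying one partial derivative per broadcast argument."""

    def __init__(self, value, partials):
        self.value = value
        self.partials = tuple(partials)

    @staticmethod
    def lift(x, n):
        # Treat plain numbers as constants (all partials zero).
        return x if isinstance(x, Dual) else Dual(x, (0.0,) * n)

    def __add__(self, other):
        other = Dual.lift(other, len(self.partials))
        return Dual(self.value + other.value,
                    [a + b for a, b in zip(self.partials, other.partials)])

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other, len(self.partials))
        # Product rule applied to every partial.
        return Dual(self.value * other.value,
                    [a * other.value + self.value * b
                     for a, b in zip(self.partials, other.partials)])

    __rmul__ = __mul__

    def __gt__(self, other):
        # Comparisons look only at the primal value, so branches behave
        # exactly as they would in the undifferentiated code.
        other_value = other.value if isinstance(other, Dual) else other
        return self.value > other_value


def broadcast_forward(f, *arrays):
    """Apply f elementwise; return values and per-element partials."""
    n = len(arrays)
    values, partials = [], []
    for elems in zip(*arrays):
        # Seed argument i with a unit partial in slot i.
        seeds = [Dual(x, [1.0 if j == i else 0.0 for j in range(n)])
                 for i, x in enumerate(elems)]
        out = f(*seeds)
        values.append(out.value)
        partials.append(out.partials)
    return values, partials


def broadcast_backward(upstream, partials, n_args):
    """Scale stored partials by the upstream gradient, one list per argument."""
    return [[g * p[i] for g, p in zip(upstream, partials)]
            for i in range(n_args)]


def kernel(x, y):
    # Hypothetical user kernel with data-dependent control flow.
    if x > y:
        return x * y
    return x + 2.0 * y


xs = [3.0, 1.0]
ys = [2.0, 5.0]
values, partials = broadcast_forward(kernel, xs, ys)
grad_x, grad_y = broadcast_backward([1.0, 10.0], partials, 2)
```

Because each output element depends only on the matching input elements, the broadcast Jacobian is diagonal per argument, and storing one partial per element per argument captures it completely. In this sketch the first element takes the `x * y` branch and the second takes the `x + 2.0 * y` branch, yet both are handled by the same forward pass.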
